Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Waitall never returns
From: Ross Boylan (ross_at_[hidden])
Date: 2014-04-04 22:20:27


On 4/4/2014 6:01 PM, Ralph Castain wrote:
> It sounds like you don't have a balance between sends and recvs somewhere - i.e., some apps send messages, but the intended recipient isn't issuing a recv and waiting until the message has been received before exiting. If the recipient leaves before the isend completes, then the isend will never complete and the waitall will not return.
I'm pretty sure the sends complete because I wait on something that can
only be computed after the sends complete, and I know I have that result.

My current theory is that my modifications to Rmpi are not properly
tracking all completed messages, resulting in it thinking there are
outstanding messages (and passing a positive count to the C-level
MPI_Waitall with associated garbagey arrays). But I haven't isolated
the problem.
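
A minimal sketch of the kind of bookkeeping in question, in plain C. This is not the actual Rmpi code: the names `req_tracker`, `tracker_add`, `tracker_complete`, `tracker_compact`, `REQ_DONE` and `MAX_PENDING` are all hypothetical, and an `int` handle stands in for `MPI_Request` (with `REQ_DONE` playing the role of `MPI_REQUEST_NULL`) so the example builds without MPI. The point is that the count handed to a waitall-style call should cover only slots that still hold live handles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a real tracker would store MPI_Request handles
   and use MPI_REQUEST_NULL as the "completed" marker. */
typedef int req_handle;
#define REQ_DONE (-1)
#define MAX_PENDING 64

typedef struct {
    req_handle reqs[MAX_PENDING];
    size_t count;               /* slots in use, live or completed */
} req_tracker;

static void tracker_init(req_tracker *t) { t->count = 0; }

/* Record a newly started nonblocking send.
   Returns the slot index, or -1 if the table is full. */
static int tracker_add(req_tracker *t, req_handle h) {
    if (t->count >= MAX_PENDING) return -1;
    t->reqs[t->count] = h;
    return (int)t->count++;
}

/* Mark one slot as completed, e.g. after a successful test or wait on it. */
static void tracker_complete(req_tracker *t, size_t slot) {
    assert(slot < t->count);
    t->reqs[slot] = REQ_DONE;
}

/* Squeeze out completed slots so reqs[0..count) holds only live handles.
   Returns the number of live handles, i.e. the count that would be safe
   to pass along with the array to a waitall-style call. */
static size_t tracker_compact(req_tracker *t) {
    size_t live = 0;
    for (size_t i = 0; i < t->count; i++) {
        if (t->reqs[i] != REQ_DONE) t->reqs[live++] = t->reqs[i];
    }
    t->count = live;
    return live;
}
```

With this layout, a stale positive count can only arise if a completion is never recorded through `tracker_complete`, which is the failure mode suspected above. As far as I know the MPI standard also lets the array given to MPI_Waitall contain null handles, so resetting finished entries to `MPI_REQUEST_NULL` without compacting would be another option.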

Ross
>
>
> On Apr 4, 2014, at 5:20 PM, Ross Boylan <ross_at_[hidden]> wrote:
>
>> During shutdown of my application the processes issue a waitall, since they have done some Isends. A couple of them never return from that call.
>>
>> Could this be the result of some of the processes already being shut down (the processes with the problem were late in the shutdown sequence)? If so, what is the recommended solution? A barrier?
>>
>> The shutdown proceeds in stages, but the processes in question are not told to shut down until all the messages they have sent have been received. So there shouldn't be any outstanding messages from them.
>>
>> My reading of the manual is that Waitall with a count of 0 should return immediately, not hang. Is that correct?
>>
>> Running under R with Open MPI 1.7.4.
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users