Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] mca_pml_ob1_send blocks
From: Shaun Jackman (sjackman_at_[hidden])
Date: 2009-09-14 13:55:35


Hi Jeff,

Jeff Squyres wrote:
> On Sep 8, 2009, at 1:06 PM, Shaun Jackman wrote:
> My INBOX has been a disaster recently. Please ping me repeatedly if
> you need quicker replies (sorry! :-( ).
>
> (btw, should this really be on the devel list, not the user list?)

It's tending that way. I'll keep the thread here for now for
continuity. If I start a new thread on this topic, I'll move it to devel.

>> I can see one sort of ugly scenario unfolding in my head. Consider two
>> processes running the following pseudocode:
>>
>> req = MPI_Irecv
>> while (!done) {
>>     while (MPI_Test(req)) {
>>         req = MPI_Irecv
>>     }
>>     MPI_Send(!me)
>>     MPI_Send(!me)
>> }
>>
>
> Are the sends guaranteed to have matching receives elsewhere? If not,
> this has potential to deadlock on the whole assuming-buffering issue...

You're right that this is an erroneous program, since only one Irecv is
posted for every two Sends. Change the two MPI_Send calls to MPI_Bsend
to prevent the deadlock, and the situation I describe below still applies.
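
For reference, the buffered variant I have in mind looks something like
the following in plain MPI C. The peer rank, tag, payload, attach-buffer
size and iteration count are placeholders (the termination condition is
replaced by a fixed count); only the Irecv/Test/Bsend pattern matters here.

#include <mpi.h>
#include <stdlib.h>

#define TAG 0                          /* arbitrary tag for this sketch */

static void ping_loop(MPI_Comm comm, int peer, int iterations)
{
    /* MPI_Bsend needs explicitly attached buffer space. */
    int bufsize = 1024 * 1024 + MPI_BSEND_OVERHEAD;
    void *bsend_buf = malloc(bufsize);
    MPI_Buffer_attach(bsend_buf, bufsize);

    int in, out = 0, flag;
    MPI_Request req;
    MPI_Irecv(&in, 1, MPI_INT, peer, TAG, comm, &req);

    for (int i = 0; i < iterations; ++i) {
        /* Drain every receive that has already completed, reposting as we go. */
        MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
        while (flag) {
            MPI_Irecv(&in, 1, MPI_INT, peer, TAG, comm, &req);
            MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
        }
        /* Two buffered sends per iteration, as in the pseudocode above.
         * MPI_Bsend copies into the attached buffer and returns right away,
         * so the sender cannot block waiting for a matching receive; if the
         * attached buffer fills up, the send fails instead of deadlocking. */
        MPI_Bsend(&out, 1, MPI_INT, peer, TAG, comm);
        MPI_Bsend(&out, 1, MPI_INT, peer, TAG, comm);
    }

    /* Clean up the outstanding receive and the attach buffer. */
    MPI_Cancel(&req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    MPI_Buffer_detach(&bsend_buf, &bufsize);
    free(bsend_buf);
}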

> If you're expecting the sends to be matched by the Irecv's, this looks
> like an erroneous program to me (there will always be 2x as many sends
> outstanding as receives).
>
>> I'll describe one process here:
>> * MPI_Test checks req->req_complete, which is false, then calls
>> opal_progress (which finds two packets from the other guy).
>> * Send two packets to the other guy.
>>
>
> ...only if they're eager. The sends are *not* guaranteed to complete
> until the matching receives occur.
>
>> * MPI_Test checks req->req_complete, which is true, returns
>> immediately. No progress is made.
>> * MPI_Test checks req->req_complete, which is false, because no
>> progress has been made since the last call. Call opal_progress (which
>> finds two packets from the other guy).
>> * Send two packets to the other guy.
>>
>> * MPI_Test checks req->req_complete, which is true, returns
>> immediately. No progress is made.
>> * MPI_Test checks req->req_complete, which is false, because no
>> progress has been made since the last time. Call opal_progress (which
>> finds two packets from the other guy).
>> * Send two packets to the other guy.
>>
>> and loop.
>>
>> In each iteration through the loop, one packet is received and two
>> packets are sent. Eventually this has to end badly.
>>
>
> Bad user behavior should be punished, yes. :-)
>
> I'm not quite sure that I see the problem you're identifying -- from
> what you describe, I think it's an erroneous program.

With buffered sends, each iteration through the loop sends two packets
but receives only one, because MPI_Request_get_status does not re-check
request->req_complete after calling opal_progress.
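
In other words, the fix amounts to the check/progress/re-check pattern
below. This is a simplified sketch, not the actual Open MPI source:
headers, locking and the MPI_Status handling are omitted, and the
signature is abbreviated.

static int get_status_sketch(ompi_request_t *request, int *flag)
{
    if (request->req_complete) {          /* already complete: report it */
        *flag = 1;
        return MPI_SUCCESS;
    }
#if OMPI_ENABLE_PROGRESS_THREADS == 0
    opal_progress();                      /* drive the progress engine once */
    if (request->req_complete) {          /* re-check after making progress */
        *flag = 1;
        return MPI_SUCCESS;
    }
#endif
    *flag = 0;
    return MPI_SUCCESS;
}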

>> Following is an untested fix to request_get_status.c. It checks
>> req->req_complete and returns immediately if it is true. If not, it
>> calls opal_progress() and checks req->req_complete again. If
>> OMPI_ENABLE_PROGRESS_THREADS is defined, it checks only once and does
>> not call opal_progress(). It would look better if the body of the
>> loop were factored out into its own function.
>>
>
> Hmm. Do you mean this to be in request_get_status.c or req_test.c?
> (you mentioned MPI_TEST above, not MPI_REQUEST_GET_STATUS)

I meant this code for MPI_Request_get_status. I have just read the code
for ompi_request_default_test in req_test.c. It already does something
very similar to what I suggested: it checks request->req_complete, calls
opal_progress, and then checks request->req_complete a second time,
except that it implements the loop using a goto.

> Is this the optimization I mentioned in my previous reply (i.e., if
> req_complete is false, call opal_progress, and then check req_complete
> again?) If so, I think it would be better to do it without an if loop
> somehow (testing and branching, etc.).

Yes, and MPI_Request_get_status would then behave as MPI_Test does
currently. Would it be so crazy to implement MPI_Test as calls to
MPI_Request_get_status and MPI_Request_free? It would eliminate the
code duplication.
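
Something like this, at the MPI API level and ignoring persistent
requests and error handling (a sketch of the idea, not Open MPI's code):

#include <mpi.h>

static int test_via_get_status(MPI_Request *request, int *flag,
                               MPI_Status *status)
{
    int rc = MPI_Request_get_status(*request, flag, status);
    if (rc == MPI_SUCCESS && *flag && *request != MPI_REQUEST_NULL) {
        /* MPI_Test frees a completed request and sets it to
         * MPI_REQUEST_NULL; MPI_Request_get_status leaves the request
         * untouched, so the free has to happen here. */
        rc = MPI_Request_free(request);
    }
    return rc;
}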

Cheers,
Shaun