Subject: Re: [OMPI users] mca_pml_ob1_send blocks
From: Shaun Jackman (sjackman_at_[hidden])
Date: 2009-09-08 13:06:07


Jeff Squyres wrote:
...
>> Two questions then...
>>
>> 1. If the request has already completed, does it mean that since
>> opal_progress() is not called, no further progress is made?
>>
>
> Correct. It's a latency thing; if your request has already completed,
> we just tell you without further delay (i.e., without invoking
> opal_progress(), which may trigger lots of other things, and therefore
> increase the latency of MPI_REQUEST_GET_STATUS returning).
>
> opal_progress() is our lowest-level progression engine call. It kicks
> all kinds of registered progression callbacks from all over the code
> base.
>
>> 2. request->req_complete is tested before calling opal_progress(). Is
>> it possible that request->req_complete is now true after calling
>> opal_progress() when this function returns false in *flag?
>
> Yes. I suppose it could be an optimization to duplicate the block
> testing for request->req_complete==true below the call to
> opal_progress(). I'm guessing the only reason it wasn't done was to
> avoid code duplication. Additionally, the call to opal_progress() is
> surrounded by an #if block testing OPAL_ENABLE_PROGRESS_THREADS -- if
> we have progress threads enabled, the thought was that opal_progress()
> (and friends) would be invoked automatically (and probably
> continuously) by other threads. The progression thread code is not
> well tested -- I'd be surprised if it worked at all, because I doubt
> anyone is testing it -- but it has been in our design since the very
> beginning. This is likely another reason we don't test again for
> req_complete==true after the call to opal_progress() -- because that
> block would need to be protected by that #if, leading to further code
> complexity.

Hi Jeff,

I can see one sort of ugly scenario unfolding in my head. Consider two
processes running the following pseudocode:

req = MPI_Irecv
while (!done) {
   while (MPI_Test(req)) {   /* drain any completed receive... */
     req = MPI_Irecv         /* ...and re-post it */
   }
   MPI_Send(!me)             /* then send two messages to the other rank */
   MPI_Send(!me)
}

I'll describe one process here:
* MPI_Test checks req->req_complete, which is false, so it calls
opal_progress() (which finds two packets from the other guy).
* Send two packets to the other guy.

* MPI_Test checks req->req_complete, which is true, and returns
immediately; no progress is made. The receive is re-posted.
* MPI_Test checks req->req_complete, which is false, because no
progress has been made since the last call. It calls opal_progress()
(which finds two more packets from the other guy).
* Send two packets to the other guy.

and that last block repeats, iteration after iteration.

In each iteration through the loop, one packet is received but two
packets are sent, so unconsumed packets pile up on the receiving side
without bound. Eventually this has to end badly.
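
To make the scenario concrete, here is a minimal compilable version of
the pseudocode (my sketch only; it assumes exactly two ranks and
messages small enough to be sent eagerly, so MPI_Send returns without
waiting for a matching receive):

#include <mpi.h>

int main(int argc, char **argv)
{
        int me, buf, flag, done = 0;
        MPI_Request req;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &me);

        /* Post the initial receive from the other rank (!me). */
        MPI_Irecv(&buf, 1, MPI_INT, !me, 0, MPI_COMM_WORLD, &req);
        while (!done) {    /* 'done' never becomes true in this sketch */
                /* Drain every receive that has already completed,
                   re-posting as we go. */
                MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
                while (flag) {
                        MPI_Irecv(&buf, 1, MPI_INT, !me, 0,
                                  MPI_COMM_WORLD, &req);
                        MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
                }
                /* Two sends, regardless of how many receives completed
                   above. */
                MPI_Send(&me, 1, MPI_INT, !me, 0, MPI_COMM_WORLD);
                MPI_Send(&me, 1, MPI_INT, !me, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
}

Run it with "mpirun -np 2 ./a.out"; if the scenario above is right, the
unexpected-message queue on each side should grow steadily.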

Following is an untested fix to request_get_status.c. It checks
req->req_complete and returns immediately if it is true. If not, it
calls opal_progress() and then checks req->req_complete a second time.
When OMPI_ENABLE_PROGRESS_THREADS is enabled (nonzero), it checks only
once and does not call opal_progress(). It would look better if the
body of the loop were factored out into its own function.
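
For comparison with the fix below, my reading of the current code path
is roughly this (paraphrased, not the verbatim source); note that
req->req_complete is never re-tested after opal_progress() returns:

        if( request->req_complete ) {
                *flag = true;
                /* (generalized-request query and *status copy, exactly
                   as in the fix below) */
                return MPI_SUCCESS;
        }
#if OMPI_ENABLE_PROGRESS_THREADS == 0
        opal_progress();      /* progress once... */
#endif
        *flag = false;        /* ...but report not-complete regardless */
        return MPI_SUCCESS;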

Cheers,
Shaun

        int i;
        /* Test for completion at most twice: once up front and, when
           progress threads are disabled, once more after kicking the
           progress engine. */
        for (i = 0; i < 2; i++) {
                if( request->req_complete ) {
                        *flag = true;
                        /* If this is a generalized request, we *always* have to call
                           the query function to get the status (MPI-2:8.2), even if
                           the user passed STATUS_IGNORE. */
                        if (OMPI_REQUEST_GEN == request->req_type) {
                                ompi_grequest_invoke_query(request, &request->req_status);
                        }
                        if (MPI_STATUS_IGNORE != status) {
                                *status = request->req_status;
                        }
                        return MPI_SUCCESS;
                }
#if OMPI_ENABLE_PROGRESS_THREADS == 0
                /* Not complete yet: progress the engine (first pass
                   only), then loop around and re-test req_complete. */
                if (i == 0)
                        opal_progress();
#else
                /* A progress thread is already progressing for us; a
                   second test would tell us nothing new. */
                break;
#endif
        }
        *flag = false;
        return MPI_SUCCESS;
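
And roughly what I have in mind for that factoring (again untested; the
helper name ompi_request_check_complete is one I just made up, not
something in the tree):

static bool ompi_request_check_complete(ompi_request_t *request,
                                        int *flag, MPI_Status *status)
{
        if( !request->req_complete ) {
                return false;
        }
        *flag = true;
        /* Generalized requests *always* need the query function invoked
           to get the status (MPI-2:8.2), even if the user passed
           STATUS_IGNORE. */
        if (OMPI_REQUEST_GEN == request->req_type) {
                ompi_grequest_invoke_query(request, &request->req_status);
        }
        if (MPI_STATUS_IGNORE != status) {
                *status = request->req_status;
        }
        return true;
}

with the body of MPI_Request_get_status() then collapsing to:

        if (ompi_request_check_complete(request, flag, status)) {
                return MPI_SUCCESS;
        }
#if OMPI_ENABLE_PROGRESS_THREADS == 0
        opal_progress();
        if (ompi_request_check_complete(request, flag, status)) {
                return MPI_SUCCESS;
        }
#endif
        *flag = false;
        return MPI_SUCCESS;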