Sorry for the delay in replying...
On Sep 1, 2009, at 1:11 AM, Shaun Jackman wrote:
> > Looking at the source code of MPI_Request_get_status, it...
> > calls OPAL_CR_NOOP_PROGRESS()
> > returns true in *flag if request->req_complete
> > calls opal_progress()
> > returns false in *flag
Keep in mind that MPI_REQUEST_GET_STATUS is exactly the same as
MPI_TEST except that the MPI_Request will not be deallocated if the
request has completed.
> > What's the difference between OPAL_CR_NOOP_PROGRESS() and
> > opal_progress()? If the request has already completed, does it mean
> > that since opal_progress() is not called, no further progress is
> > made?
> OPAL_CR_NOOP_PROGRESS() seems to be related to checkpoint/restart and
> is a no-op unless fault-tolerance is being used.
> Two questions then...
> 1. If the request has already completed, does it mean that since
> opal_progress() is not called, no further progress is made?
Correct. It's a latency thing; if your request has already completed,
we just tell you without further delay (i.e., without invoking
opal_progress(), which may trigger lots of other things, and therefore
increase the latency of MPI_REQUEST_GET_STATUS returning).
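
To make the ordering concrete, here is a minimal stand-alone sketch of the control flow from the quoted outline. It is not the Open MPI source: sketch_request_t, sketch_progress(), sketch_progress_calls and sketch_get_status() are hypothetical stand-ins, the checkpoint/restart hook is reduced to a comment, and any build-time conditionals are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a request; only the completion flag is modeled. */
typedef struct {
    volatile bool req_complete;
} sketch_request_t;

/* Counts calls so the fast path can be observed; not part of Open MPI. */
static int sketch_progress_calls = 0;

/* Stand-in for opal_progress(): here it only records that it ran. */
static void sketch_progress(void)
{
    sketch_progress_calls++;
}

/* Follows the order of steps in the quoted outline (assumed, simplified). */
static void sketch_get_status(sketch_request_t *request, int *flag)
{
    /* Step 1: checkpoint/restart hook, a no-op in this sketch. */

    /* Step 2: fast path; report completion without driving progress. */
    if (request->req_complete) {
        *flag = 1;
        return;
    }

    /* Step 3: the request is still pending, so drive progress once. */
    sketch_progress();

    /* Step 4: report "not complete" without re-checking the flag. */
    *flag = 0;
}
```

Calling sketch_get_status() on a request whose req_complete is already true sets *flag to 1 and leaves sketch_progress_calls untouched; calling it on a pending request bumps the counter once and sets *flag to 0.
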
opal_progress() is our lowest-level progression engine call. It kicks
all kinds of registered progression callbacks from all over the code
> 2. request->req_complete is tested before calling opal_progress(). Is
> it possible that request->req_complete is now true after calling
> opal_progress() when this function returns false in *flag?
Yes. I suppose it could be an optimization to duplicate the block
testing for request->req_complete==true below the call to
opal_progress(). I'm guessing the only reason it wasn't done was to
avoid code duplication. Additionally, the call to opal_progress() is
surrounded by an #if block testing OPAL_ENABLE_PROGRESS_THREADS -- if
we have progress threads enabled, the thought was that opal_progress()
(and friends) would be invoked automatically (and probably
continuously) by other threads. The progression thread code is not
well tested -- I'd be surprised if it worked at all, because I doubt
anyone is testing it -- but it has been in our design since the very
beginning. This is likely another reason we don't test again for
req_complete==true after the call to opal_progress() -- because that
block would need to be protected by that #if, leading to further code