Shaun Jackman wrote:
> Jeff Squyres wrote:
>> On Aug 26, 2009, at 10:38 AM, Jeff Squyres (jsquyres) wrote:
>>> Yes, this could cause blocking. Specifically, the receiver may not
>>> advance any other senders until the matching Irecv is posted and is
>>> able to make progress.
>> I should clarify something else here -- for long messages where the
>> pipeline protocol is used, OB1 may need to be invoked repeatedly to
>> keep making progress on all the successive fragments. I.e., if a send
>> is long enough to entail many fragments, then OB1 may (read: likely
>> will) not progress *all* of them simultaneously. Hence, if you're
>> calling MPI_Test(), for example, to kick the progress engine, you may
>> have to call it a few times to get *all* the fragments processed.
>> How many fragments are processed in each call to progress can depend
>> on the speed of your hardware and network, etc.
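To make Jeff's point concrete, the polling pattern he describes looks
roughly like the sketch below (the message size and ranks are made up
for illustration; how many MPI_Test() calls are needed depends on the
fragment count and the hardware):

    #include <mpi.h>
    #include <stdlib.h>

    /* Sketch: poll MPI_Test() repeatedly to kick the progress engine so
     * that OB1 can advance the remaining fragments of a pipelined long
     * send. MSG_SIZE and the peer ranks are illustrative only. */
    #define MSG_SIZE (64 * 1024 * 1024)

    int main(int argc, char **argv)
    {
        int rank, flag = 0;
        MPI_Request req;
        char *buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        buf = malloc(MSG_SIZE);

        if (0 == rank) {
            MPI_Isend(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req);
            /* One MPI_Test() call may progress only some fragments, so
             * keep polling until the whole send has completed. */
            while (!flag) {
                MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
                /* ... overlap useful computation here ... */
            }
        } else if (1 == rank) {
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        free(buf);
        MPI_Finalize();
        return 0;
    }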
> Hi Jeff,
> Looking at the source code of MPI_Request_get_status, it:
>   1. calls OPAL_CR_NOOP_PROGRESS()
>   2. returns true in *flag if request->req_complete is true
>   3. calls opal_progress()
>   4. returns false in *flag
> What's the difference between OPAL_CR_NOOP_PROGRESS() and
> opal_progress()? If the request has already completed, does it mean
> that since opal_progress() is not called, no further progress is made?
OPAL_CR_NOOP_PROGRESS() seems to be related to checkpoint/restart and
is a no-op unless fault tolerance is enabled.
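In other words, the logic quoted above seems to boil down to something
like this (a simplified paraphrase in C-style pseudocode; not the real
ompi/mpi/c/request_get_status.c, which differs in detail):

    /* Paraphrase of the steps quoted above; not the actual
     * Open MPI source. */
    int MPI_Request_get_status(MPI_Request request, int *flag,
                               MPI_Status *status)
    {
        OPAL_CR_NOOP_PROGRESS();      /* no-op without checkpoint/restart */

        if (request->req_complete) {  /* tested BEFORE progressing */
            *flag = true;
            /* ... fill in *status ... */
            return MPI_SUCCESS;
        }

        opal_progress();              /* may complete this request ... */

        *flag = false;                /* ... but false is returned anyway */
        return MPI_SUCCESS;
    }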
Two questions, then:
1. If the request has already completed, opal_progress() is not called
at all. Does that mean no further progress is made on any other
outstanding requests?
2. request->req_complete is tested before opal_progress() is called. Is
it possible for request->req_complete to become true during the call to
opal_progress(), so that this function returns false in *flag even
though the request has in fact completed?
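If the answer to 2 is yes, a caller could see *flag == false even
though the request is actually complete by the time
MPI_Request_get_status returns. Presumably that is benign for a caller
that polls in a loop, since the next call would observe
request->req_complete, e.g. (illustrative only):

    int flag = 0;
    while (!flag) {
        MPI_Request_get_status(req, &flag, MPI_STATUS_IGNORE);
        /* ... do other useful work between polls ... */
    }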