Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] collective problems
From: Gleb Natapov (glebn_at_[hidden])
Date: 2007-11-08 03:55:50

On Wed, Nov 07, 2007 at 01:16:04PM -0500, George Bosilca wrote:
> On Nov 7, 2007, at 12:51 PM, Jeff Squyres wrote:
>>> The same callback is called in both cases. In the case that you
>>> described, the callback is called just a little bit deeper into the
>>> recursion, while in the "normal case" it will get called from the
>>> first level of the recursion. Or maybe I miss something here ...
>> Right -- it's not the callback that is the problem. It's when the
>> recursion is unwound and further up the stack you now have a stale
>> request.
> That's exactly the point that I fail to see. If the request is freed in the
> PML callback, then it should get released in both cases, and therefore lead
> to problems all the time. Which, obviously, is not true when we do not have
> this deep recursion thing going on.
> Moreover, the request management is based on the reference count. The PML
> level has one ref count and the MPI level has another one. In fact, we
> cannot release a request until we explicitly call ompi_request_free on it.
> The place where this call happens is different between the blocking and
> non-blocking calls. In the non-blocking case ompi_request_free gets called
> from the *_test (*_wait) functions, while in the blocking case it gets called
> directly from the MPI_Send function.
> Let me summarize: a request cannot reach a stale state without a call to
> ompi_request_free. This function is never called directly from the PML
> level. Therefore, the recursion depth should not have any impact on the
> state of the request !

I looked at the code one more time and it seems to me now that George is
absolutely right. The scenario I described cannot happen because we call
ompi_request_free() at the top of the stack. I somehow had the
impression that we mark internal requests as freed before calling
send(). So I'll go and implement the NOT_ON_WIRE extension when I
have time for it.
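
For reference, George's argument can be sketched as a toy model in C. This
is only an illustration under stated assumptions, not the actual Open MPI
code: every toy_* name is hypothetical, and the real request bookkeeping
is more involved than a plain counter. The sketch assumes a request starts
with two references (one for the PML level, one for the MPI level), that
the completion callback drops only the PML one, and that the last
reference goes away only at the top of the stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy request: hypothetical, NOT Open MPI's real ompi_request_t. */
typedef struct toy_request {
    int  refcount;      /* assumed: one ref for the PML level, one for the MPI level */
    bool pml_complete;  /* set by the completion callback */
} toy_request_t;

static int toy_live_requests = 0;  /* requests allocated and not yet destroyed */

static toy_request_t *toy_request_new(void)
{
    toy_request_t *req = malloc(sizeof(*req));
    if (req == NULL) {
        return NULL;
    }
    req->refcount = 2;
    req->pml_complete = false;
    toy_live_requests++;
    return req;
}

/* Drop one reference; destroy the request only when none are left. */
static void toy_release(toy_request_t *req)
{
    assert(req->refcount > 0);
    if (--req->refcount == 0) {
        toy_live_requests--;
        free(req);
    }
}

/* Stand-in for the PML completion callback: drops only the PML reference. */
static void toy_pml_complete_cb(toy_request_t *req)
{
    req->pml_complete = true;
    toy_release(req);
}

/* Stand-in for recursive progress: the callback fires `depth` levels down. */
static void toy_progress(toy_request_t *req, int depth)
{
    if (depth > 0) {
        toy_progress(req, depth - 1);
        return;
    }
    toy_pml_complete_cb(req);
}

/* Stand-in for a blocking send. Returns true if the request was still
 * valid (exactly one reference left) after the recursion unwound. */
bool toy_blocking_send(int depth)
{
    toy_request_t *req = toy_request_new();
    if (req == NULL) {
        return false;
    }
    toy_progress(req, depth);
    bool still_valid = req->pml_complete && req->refcount == 1;
    toy_release(req);  /* plays the role of ompi_request_free at the top of the stack */
    return still_valid;
}
```

In this model toy_blocking_send(0) and toy_blocking_send(50) behave the
same way: the request stays alive until the final toy_release(), so the
depth at which the callback fires does not matter.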