Open MPI Development Mailing List Archives

From: Shipman, Galen M. (gshipman_at_[hidden])
Date: 2007-11-07 23:27:47


The lengths we go to avoid progress :-)

On 11/7/07 10:19 PM, "Richard Graham" <rlgraham_at_[hidden]> wrote:

> The real problem, as you and others have pointed out, is the lack of
> predictable time slices for the progress engine to do its work when relying
> on the ULP to make calls into the library...
>
> Rich
>
>
> On 11/8/07 12:07 AM, "Brian Barrett" <brbarret_at_[hidden]> wrote:
>
>> As it stands today, the problem is that we can inject things into the
>> BTL successfully that are not injected into the NIC (due to software
>> flow control). Once a message is injected into the BTL, the PML marks
>> completion on the MPI request. If it was a blocking send that got
>> marked as complete, but the message isn't injected into the NIC/NIC
>> library, and the user doesn't re-enter the MPI library for a
>> considerable amount of time, then we have a problem.
>>
>> Personally, I'd rather just not mark MPI completion until a local
>> completion callback from the BTL. But others don't like that idea, so
>> we came up with a way for back pressure from the BTL to say "it's not
>> on the wire yet". This is more complicated than just not marking MPI
>> completion early, but why would we do something that helps real apps
>> at the expense of benchmarks? That would just be silly!
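
The alternative described above (no MPI completion until the BTL reports local completion) can be sketched roughly as follows. This is only an illustration under stated assumptions: every name here (sketch_nic_queue_t, sketch_btl_inject, sketch_btl_progress, sketch_blocking_send) is hypothetical and is not the actual Open MPI BTL/PML interface, and the software queue is reduced to a single slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names are hypothetical; this is not the real Open MPI API. */
typedef void (*sketch_completion_cb_t)(void *ctx);

/* A single-slot software queue standing in for BTL-level flow control. */
typedef struct {
    bool pending;               /* accepted by the BTL, not yet on the NIC */
    sketch_completion_cb_t cb;  /* local completion callback */
    void *cb_ctx;
} sketch_nic_queue_t;

typedef struct {
    bool mpi_complete;
} sketch_mpi_request_t;

/* Callback the PML registers: MPI completion is marked only here. */
static void sketch_mark_complete(void *ctx)
{
    ((sketch_mpi_request_t *)ctx)->mpi_complete = true;
}

/* Accept a message into the BTL. The callback is NOT fired here.
 * Returns false if the single slot is already occupied. */
bool sketch_btl_inject(sketch_nic_queue_t *q, sketch_completion_cb_t cb,
                       void *ctx)
{
    if (q->pending) {
        return false;
    }
    q->pending = true;
    q->cb = cb;
    q->cb_ctx = ctx;
    return true;
}

/* Progress: hand the pending message to the (imaginary) NIC and fire
 * the local completion callback. Returns the number of completions. */
int sketch_btl_progress(sketch_nic_queue_t *q)
{
    if (!q->pending) {
        return 0;
    }
    assert(q->cb != NULL);
    q->pending = false;
    q->cb(q->cb_ctx);
    return 1;
}

/* A blocking send stays inside the library until local completion,
 * so it cannot return while the message is only buffered. */
void sketch_blocking_send(sketch_nic_queue_t *q, sketch_mpi_request_t *req)
{
    req->mpi_complete = false;
    while (!sketch_btl_inject(q, sketch_mark_complete, req)) {
        sketch_btl_progress(q);  /* drain the slot, then retry */
    }
    while (!req->mpi_complete) {
        sketch_btl_progress(q);
    }
}
```

In this sketch, injecting a message leaves mpi_complete false until sketch_btl_progress() runs, which is the property that keeps a blocking send from returning with the message still sitting in a software queue.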
>>
>> Brian
>>
>> On Nov 7, 2007, at 7:56 PM, Richard Graham wrote:
>>
>>> Does this mean that we don't have a queue to store BTL-level
>>> descriptors that are only partially complete? Do we do an all or
>>> nothing with respect to BTL-level requests at this stage?
>>>
>>> Seems to me like we want to mark things complete at the MPI level
>>> ASAP, and that this proposal is not to do that -- is this correct?
>>>
>>> Rich
>>>
>>>
>>> On 11/7/07 11:26 PM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
>>>
>>>> On Nov 7, 2007, at 9:33 PM, Patrick Geoffray wrote:
>>>>
>>>>>> Remember that this is all in the context of Galen's proposal for
>>>>>> btl_send() to be able to return NOT_ON_WIRE -- meaning that the
>>>>>> send was successful, but it has not yet been sent (e.g., openib BTL
>>>>>> buffered it because it ran out of credits).
>>>>>
>>>>> Sorry if I miss something obvious, but why does the PML have to be
>>>>> aware of the flow control situation of the BTL? If the BTL cannot
>>>>> send something right away for any reason, it should be the
>>>>> responsibility of the BTL to buffer it and to progress on it later.
>>>>
>>>>
>>>> That's currently the way it is. But the BTL currently only has the
>>>> option to say two things:
>>>>
>>>> 1. "ok, done!" -- then the PML will think that the request is complete
>>>> 2. "doh -- error!" -- then the PML thinks that Something Bad Happened(tm)
>>>>
>>>> What we really need is for the BTL to have a third option:
>>>>
>>>> 3. "not done yet!"
>>>>
>>>> So that the PML knows that the request is not yet done, but will
>>>> allow other things to progress while we're waiting for it to complete.
>>>> Without this, the openib BTL currently replies "ok, done!", even when
>>>> it has only buffered a message (rather than actually sending it out).
>>>> This optimization works great (yeah, I know...) except for apps that
>>>> don't dip into the MPI library frequently. :-\
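
The three replies listed above can be sketched as a small return-code enum plus the PML-side handling. This is a minimal illustration, not the actual Open MPI interface: the type and function names (sketch_send_status_t, sketch_btl_send, sketch_pml_send, sketch_btl_credit_returned) are hypothetical, credits are modeled as a plain counter, and the buffered queue is simplified to a count with the caller supplying the request to complete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names are hypothetical; this is not the real Open MPI API. */
typedef enum {
    SKETCH_SEND_DONE,        /* 1. "ok, done!"                          */
    SKETCH_SEND_ERROR,       /* 2. "doh -- error!"                      */
    SKETCH_SEND_NOT_ON_WIRE  /* 3. "not done yet!" (accepted, buffered) */
} sketch_send_status_t;

typedef struct {
    int credits;   /* remaining flow-control credits */
    int buffered;  /* messages accepted but not yet sent */
} sketch_btl_t;

typedef struct {
    bool complete;
    bool failed;
} sketch_request_t;

/* BTL side: send if a credit is available, otherwise buffer and say so. */
sketch_send_status_t sketch_btl_send(sketch_btl_t *btl, const void *buf,
                                     size_t len)
{
    if (NULL == buf && len > 0) {
        return SKETCH_SEND_ERROR;
    }
    if (btl->credits > 0) {
        btl->credits--;
        return SKETCH_SEND_DONE;
    }
    btl->buffered++;
    return SKETCH_SEND_NOT_ON_WIRE;
}

/* PML side: only the "done" reply marks the request complete. */
void sketch_pml_send(sketch_btl_t *btl, sketch_request_t *req,
                     const void *buf, size_t len)
{
    req->complete = false;
    req->failed = false;
    switch (sketch_btl_send(btl, buf, len)) {
    case SKETCH_SEND_DONE:
        req->complete = true;
        break;
    case SKETCH_SEND_ERROR:
        req->failed = true;
        break;
    case SKETCH_SEND_NOT_ON_WIRE:
        /* Leave the request pending; other work may still progress. */
        break;
    }
}

/* When a credit comes back: drain one buffered message and complete its
 * request (simplification: the caller passes the matching request). */
void sketch_btl_credit_returned(sketch_btl_t *btl, sketch_request_t *req)
{
    assert(btl->buffered >= 0);
    if (btl->buffered > 0) {
        btl->buffered--;
        req->complete = true;
    } else {
        btl->credits++;
    }
}
```

With one credit available, a first send completes immediately, a second one stays pending with buffered == 1, and it completes only once sketch_btl_credit_returned() is called for it.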
>>>>
>>>> --
>>>> Jeff Squyres
>>>> Cisco Systems
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>