Does this mean that we don't have a queue to store BTL-level descriptors that
are only partially complete?  Do we do an all-or-nothing with respect to
BTL-level requests at this stage?

Seems to me like we want to mark things complete at the MPI level ASAP, and
that this proposal is not to do that -- is this correct?


>> Remember that this is all in the context of Galen's proposal for
>> btl_send() to be able to return NOT_ON_WIRE -- meaning that the send
>> was successful, but it has not yet been sent (e.g., openib BTL
>> buffered it because it ran out of credits).
> Sorry if I miss something obvious, but why does the PML have to be
> aware of the flow control situation of the BTL?  If the BTL cannot
> send something right away for any reason, it should be the
> responsibility of the BTL to buffer it and to progress on it later.

That's currently the way it is.  But the BTL currently can only say
one of two things:

1. "ok, done!" -- then the PML will think that the request is complete
2. "doh -- error!" -- then the PML thinks that Something Bad happened

What we really need is for the BTL to have a third option:

3. "not done yet!"

That way the PML knows that the request is not yet done, but can let
other things progress while we're waiting for it to complete.
Without this, the openib BTL currently replies "ok, done!" even when
it has only buffered a message (rather than actually sending it out).
This optimization works great (yeah, I know...) except for apps that
don't dip into the MPI library frequently, since a buffered message
only gets pushed out when the app calls back into MPI.  :-\
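
To make the three options concrete, here is a rough sketch in C of how
a three-way return could look from the PML's side.  Every name in it
(sketch_btl_send, SKETCH_SEND_NOT_ON_WIRE, the credit counter, and so
on) is made up for illustration; this is not the actual Open MPI BTL
interface, just a toy model of a BTL that buffers when it runs out of
credits.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes, not the real Open MPI constants. */
typedef enum {
    SKETCH_SEND_DONE,        /* 1. "ok, done!"     */
    SKETCH_SEND_ERROR,       /* 2. "doh -- error!" */
    SKETCH_SEND_NOT_ON_WIRE  /* 3. "not done yet!" */
} sketch_send_rc_t;

/* Toy BTL state (assumed): a credit counter and a count of buffered sends. */
typedef struct {
    int credits;   /* send credits currently available         */
    int buffered;  /* fragments accepted but not yet on the wire */
} sketch_btl_t;

/* Toy PML request (assumed): only tracks completion and failure. */
typedef struct {
    bool complete;
    bool failed;
} sketch_request_t;

/* Send one fragment: goes out if a credit is available, else is buffered. */
sketch_send_rc_t sketch_btl_send(sketch_btl_t *btl)
{
    if (btl == NULL) {
        return SKETCH_SEND_ERROR;
    }
    if (btl->credits > 0) {
        btl->credits--;
        return SKETCH_SEND_DONE;
    }
    btl->buffered++;                 /* out of credits: keep it for later */
    return SKETCH_SEND_NOT_ON_WIRE;
}

/* Credits came back from the peer: drain what we can.
 * Returns the number of buffered fragments that went out. */
int sketch_btl_progress(sketch_btl_t *btl, int new_credits)
{
    int sent = 0;
    btl->credits += new_credits;
    while (btl->buffered > 0 && btl->credits > 0) {
        btl->buffered--;
        btl->credits--;
        sent++;
    }
    return sent;
}

/* PML side: only mark the request complete when the BTL says "done". */
void sketch_pml_start_send(sketch_btl_t *btl, sketch_request_t *req)
{
    req->complete = false;
    req->failed = false;
    switch (sketch_btl_send(btl)) {
    case SKETCH_SEND_DONE:
        req->complete = true;
        break;
    case SKETCH_SEND_NOT_ON_WIRE:
        /* Leave the request pending; other work can progress meanwhile. */
        break;
    case SKETCH_SEND_ERROR:
    default:
        req->failed = true;
        break;
    }
}
```

In this toy version, a pending request would still need some completion
hook fired from sketch_btl_progress() to flip req->complete later; that
part is left out to keep the sketch short.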

