Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

From: Galen Shipman (gshipman_at_[hidden])
Date: 2007-06-07 11:55:11


On Jun 7, 2007, at 9:11 AM, George Bosilca wrote:

> There is something weird with this change, and the patch reflect
> it. The new argument "order" come from the PML level and might be
> MCA_BTL_NO_ORDER (which is kind of global) or BTL_OPENIB_LP_QP or
> BTL_OPENIB_HP_QP (which are definitively Open IB related). Do you
> really intend to let the PML knows about Open IB internal constants ?

No, the PML knows only one thing about the order tag, it is either
MCA_BTL_NO_ORDER or it is something that the BTL assigns.
The PML has no idea about BTL_OPENIB_LP_QP or BTL_OPENIB_HP_QP, to
the PML it is just an order tag assigned to a fragment by the BTL.

So the semantics are that after a btl_send/put/get an order tag may
be assigned by the BTL to the descriptor,
This order tag can then be specified to subsequent calls to btl_alloc
or btl_prepare. The PML has no idea what the value means, other than
he is requesting a descriptor that will be ordered w.r.t. a
previously transmitted descriptor.

> If it's the case (which seems to be true from the following snippet
> if(MCA_BTL_NO_ORDER == order) {
> frag->base.order = BTL_OPENIB_LP_QP;
> } else {
> frag->base.order = order;
> }
So I am choosing some ordering to use here because the PML told me he
doesn't care, what is wrong with this?

> ) I expect you to revise the patch in order to propose a generic
> solution or I'll trigger a vote against the patch.
This exports no knowledge of the Open IB BTL to the PML layer, the
PML doesn't know that this is a QP index, he doesn't care! The PML
simply uses this value (if it wants to) to request ordering with
subsequent fragments. We use the QP index only as a BTL optimization,
it could have been anything. So the only new knowledge that the PML
has is how to request that ordering of fragments be enforced, and the
BTL doesn't even have to provide this if it doesn't want, that is the
reason for MCA_BTL_NO_ORDER.

Please describe a use case where this is not a generic solution. Keep
in mind that MX, TCP, GM all can provide ordering guarantees if they
wish, in fact for MX you can simply always assign an order tag, say
the value is 1. MX can then guarantee ordering of all fragments sent
over the same BTL.

> I vote to be backed out of the trunk as it export way to much
> knowledge from the Open IB BTL into the PML layer.

The only other option that I have identified that doesn't push PML
level protocol into the BTL is to require that BTLs always guarantee
ordering of fragments sent/put/get over the same BTL.

>
> george.
>
> PS: With Gleb changes the problem is the same. The following
> snippet reflect exactly the same behavior as the original patch.

Gleb's changes don't change the semantic guarantees that I have
described above.

>
> frag->base.order = order;
> assert(frag->base.order != BTL_OPENIB_HP_QP);
>
> On Jun 7, 2007, at 9:49 AM, Gleb Natapov wrote:
>
>> Hi Galen,
>>
>> On Sun, May 27, 2007 at 10:19:09AM -0600, Galen Shipman wrote:
>>>
>>>> With current code this is not the case. Order tag is set during a
>>>> fragment
>>>> allocation. It seems wrong according to your description. Attached
>>>> patch fixes
>>>> this. If no specific ordering tag is provided to allocation
>>>> function order of
>>>> the fragment is set to be MCA_BTL_NO_ORDER. After call to send/put/
>>>> get order
>>>> is set to whatever QP was used for communication. If order is set
>>>> before send call
>>>> it is used to choose QP.
>>>>
>>>
>>> I do set the order tag during allocation/prepare, but the defined
>>> semantics are that the tag is only valid after send/put/get. We can
>>> set them up any where we wish in the BTL, the PML however cannot
>>> rely
>>> on anything until after the send/put/get call. So really this is an
>>> issue of semantics versus implementation. The implementation I
>>> believe does conform to the semantics as the upper layer (PML)
>>> doesn't use the tag value until after a call to send/put/get.
>>>
>>> I will look over the patch however, might make more sense to delay
>>> setting the value until the actual send/put/get call.
>>>
>> Have you had a chance to look over the patch?
>>
>> --
>> Gleb.
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel