Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] why mx_forget in mca_btl_mx_prepare_dst?
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2009-10-21 15:32:46


George Bosilca wrote:
> On Oct 21, 2009, at 13:42 , Scott Atchley wrote:
>> On Oct 21, 2009, at 1:25 PM, George Bosilca wrote:
>>> Because MX doesn't provide a real RMA protocol, we created a fake
>>> one on top of point-to-point. The two peers have to agree on a
>>> unique tag, then the receiver posts it before the sender starts the
>>> send. However, as this is integrated with the real RMA protocol,
>>> where only one side knows about the completion of the RMA operation,
>>> we still exchange the ACK at the end. Therefore, the receiver
>>> doesn't need to know when the receive is completed, as it will get
>>> an ACK from the sender. At least this was the original idea.
>>>
>>> But I can see how this might fail if the short ACK from the sender
>>> manages to pass the RMA operation on the wire. I was under the
>>> impression (based on the fact that MX respects the ordering) that the
>>> mx_send will trigger the completion only when all data is on the
>>> wire/nic memory so I supposed there is _absolutely_ no way for the
>>> ACK to bypass the last RMA fragments and to reach the receiver
>>> before the recv is really completed. If my supposition is not
>>> correct, then we should remove the mx_forget and make sure that
>>> before we mark a fragment as completed we got both completions (the
>>> one from mx_recv and the remote one).
>>
>> When is the ACK sent? After the "PUT" completion returns (via
>> mx_test(), etc) or simply after calling mx_isend() for the "PUT" but
>> before the completion?
>
> The ACK is sent by the PML layer. If I'm not mistaken, it is sent when
> the completion callback is triggered, which should happen only when
> the MX BTL detects the completion of the mx_isend (using the mx_test).
> Therefore, I think the ACK is sent in response to the completion of
> the mx_isend.

Whether the ACK is sent before or after mx_test() doesn't actually matter
for a small/medium message. Even if the send (PUT) completes in mx_test(),
the data could still be on the wire, for instance after a packet loss: for
a tiny/small/medium message (it was a medium in my crash), the MX lib
opportunistically completes the request on the sender before it's
actually acked by the receiver. Matching is in order, request completion
is not. There's no strong delivery guarantee here.
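
To illustrate the fix George suggests above (only mark a fragment as
completed once both the local receive completion and the remote ACK have
been seen), here is a rough sketch. All names in it (frag_state,
frag_note_event, the FRAG_* flags) are made up for illustration; they are
not taken from the MX BTL or from the MX API, and the real code would have
to hook this into the existing completion paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical events for one fake-RMA fragment on the receiver side. */
enum {
    FRAG_LOCAL_RECV_DONE = 1 << 0, /* the local receive request completed */
    FRAG_REMOTE_ACK_SEEN = 1 << 1  /* the ACK from the sender arrived */
};

/* Hypothetical per-fragment bookkeeping (not an Open MPI or MX type). */
struct frag_state {
    unsigned events; /* bitmask of FRAG_* events seen so far */
};

static void frag_init(struct frag_state *f)
{
    assert(f != NULL);
    f->events = 0;
}

/*
 * Record one event. Returns true only once BOTH events have been
 * recorded, whatever order they arrive in, so an ACK that overtakes
 * the data does not complete the fragment on its own.
 */
static bool frag_note_event(struct frag_state *f, unsigned event)
{
    assert(f != NULL);
    f->events |= event;
    return f->events == (FRAG_LOCAL_RECV_DONE | FRAG_REMOTE_ACK_SEEN);
}
```

With this, the ACK handler and the receive-completion handler would both
call frag_note_event(), and whichever one gets "true" back releases the
fragment. This assumes the receive request is kept (no mx_forget) so that
its completion can actually be observed.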

Brice