Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] why mx_forget in mca_btl_mx_prepare_dst?
From: Scott Atchley (atchley_at_[hidden])
Date: 2009-10-21 16:02:01

On Oct 21, 2009, at 3:32 PM, Brice Goglin wrote:

> George Bosilca wrote:
>> On Oct 21, 2009, at 13:42 , Scott Atchley wrote:
>>> On Oct 21, 2009, at 1:25 PM, George Bosilca wrote:
>>>> Because MX doesn't provide a real RMA protocol, we created a fake
>>>> one on top of point-to-point. The two peers have to agree on a
>>>> unique tag, then the receiver posts it before the sender starts the
>>>> send. However, as this is integrated with the real RMA protocol,
>>>> where only one side knows about the completion of the RMA operation,
>>>> we still exchange the ACK at the end. Therefore, the receiver
>>>> doesn't need to know when the receive is completed, as it will get
>>>> an ACK from the sender. At least this was the original idea.
>>>> But I can see how this might fail if the short ACK from the sender
>>>> manages to pass the RMA operation on the wire. I was under the
>>>> impression (based on the fact that MX respects ordering) that
>>>> mx_send triggers the completion only when all data is on the
>>>> wire/NIC memory, so I supposed there is _absolutely_ no way for the
>>>> ACK to bypass the last RMA fragments and reach the receiver
>>>> before the recv is really completed. If my supposition is not
>>>> correct, then we should remove the mx_forget and make sure that,
>>>> before we mark a fragment as completed, we have both completions
>>>> (the one from mx_recv and the remote one).
>>> When is the ACK sent? After the "PUT" completion returns (via
>>> mx_test(), etc) or simply after calling mx_isend() for the "PUT" but
>>> before the completion?
>> The ACK is sent by the PML layer. If I'm not mistaken, it is sent
>> when the completion callback is triggered, which should happen only
>> when the MX BTL detects the completion of the mx_isend (using mx_test).
>> Therefore, I think the ACK is sent in response to the completion of
>> the mx_isend.
> Before or after mx_test() doesn't actually matter if it's a
> small/medium. Even if the send (PUT) completes in mx_test(), the data
> could still be on the wire in case of packet loss or similar: if it's a
> tiny/small/medium message (it was a medium in my crash), the MX lib
> opportunistically completes the request on the sender before it's
> actually acked by the receiver. Matching is in order, request
> completion is not. There's no strong delivery guarantee here.
> Brice

Yes, I was thinking of the rendezvous case (>32 kB) only.
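
For reference, the fix George describes (drop the mx_forget and mark a
fragment completed only once both the local receive completion and the
remote ACK have been seen) could be sketched roughly as below. This is
only an illustration: frag_t and the frag_on_* handlers are hypothetical
names, not the actual mca_btl_mx structures or callbacks, and no MX calls
are made. The point is that the two events may arrive in either order, so
each handler records its own event and the fragment is finished by
whichever one arrives last.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-fragment state; NOT the real mca_btl_mx fragment. */
typedef struct {
    bool recv_done;  /* local receive completion has been observed */
    bool ack_done;   /* ACK from the sender has been observed */
    bool completed;  /* fragment has been marked completed */
} frag_t;

/* Mark the fragment completed only when both events have been seen.
 * Returns true exactly once, on the call that completes the fragment. */
static bool frag_try_complete(frag_t *f)
{
    if (f->recv_done && f->ack_done && !f->completed) {
        f->completed = true;
        return true;
    }
    return false;
}

/* Would be called when the receive posted for the "PUT" completes. */
static bool frag_on_recv_completion(frag_t *f)
{
    assert(f != NULL);
    f->recv_done = true;
    return frag_try_complete(f);
}

/* Would be called when the sender's ACK arrives, possibly before the
 * receive completion, since request completion is not ordered. */
static bool frag_on_remote_ack(frag_t *f)
{
    assert(f != NULL);
    f->ack_done = true;
    return frag_try_complete(f);
}
```

With this shape, an ACK that overtakes the data only sets ack_done; the
buffer is not handed back until the receive completion shows up as well.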