Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] why mx_forget in mca_btl_mx_prepare_dst?
From: Scott Atchley (atchley_at_[hidden])
Date: 2009-10-21 13:42:50

On Oct 21, 2009, at 1:25 PM, George Bosilca wrote:

> Brice,
> Because MX doesn't provide a real RMA protocol, we created a fake
> one on top of point-to-point. The two peers have to agree on a
> unique tag, then the receiver posts it before the sender starts the
> send. However, as this is integrated with the real RMA protocol,
> where only one side knows about the completion of the RMA operation,
> we still exchange the ACK at the end. Therefore, the receiver
> doesn't need to know when the receive is completed, as it will get
> an ACK from the sender. At least this was the original idea.
> But I can see how this might fail if the short ACK from the sender
> manages to pass the RMA operation on the wire. I was under the
> impression (based on the fact that MX respects ordering) that
> mx_send triggers the completion only when all the data is on the
> wire or in NIC memory, so I assumed there was _absolutely_ no way
> for the ACK to bypass the last RMA fragments and reach the receiver
> before the recv has really completed. If my assumption is not
> correct, then we should remove the mx_forget and make sure that,
> before we mark a fragment as completed, we have both completions
> (the one from mx_recv and the remote one).


When is the ACK sent? After the "PUT" completion returns (via
mx_test(), etc) or simply after calling mx_isend() for the "PUT" but
before the completion?

If the former, the ACK cannot pass the data. If the latter, the ACK
can easily pass the data, especially when there is a lot of contention
(and thus a lot of route dispersion).

MX only guarantees order of matching (two identical tags will match in
order), not order of completion.