Open MPI Development Mailing List Archives

From: Andrew Friedley (afriedle_at_[hidden])
Date: 2007-05-09 20:55:52

Steve Wise wrote:
> On Wed, 2007-05-09 at 16:15 -0700, Andrew Friedley wrote:
>> Steve Wise wrote:
>>> There have been a series of discussions on the ofa general list about
>>> this issue, and the conclusion to date is that it cannot be resolved in
>>> the rdma-cm or iwarp-cm code of the linux rdma stack. Mainly because
>>> sending an RDMA message involves the ULP's work queue and completion
>>> queue, so the CM cannot do this under the covers in a manner that
>>> doesn't affect the application. Thus, the applications must deal with
>>> this.
>> Why can't uDAPL deal with this? As a uDAPL user, I really don't care
>> what API uDAPL is using under the hood to move data from one place to
>> another, nor the quirks of that API. The whole point of uDAPL is to
>> form a network-agnostic abstraction layer. AFAIK, the uDAPL spec
>> doesn't enforce any such requirement on RDMA communication either. In
>> my opinion, exposing such behavior above uDAPL is incorrect and is part
>> of why uDAPL has seen limited adoption -- every single uDAPL
>> implementation behaves in different ways, making it extremely difficult
>> to write an application to work on any uDAPL implementation. Sorry if
>> this sounds harsh, but this comes from many hours of banging my head on
>> the wall due to working around these sorts of problems :)
> I understand your frustration. I think the MPA protocol is deficient in
> this respect and should have required the necessary "first FPDU" to be
> sent under the covers by the RNICs. A RTR packet if you will. To
> resolve this issue "properly", in my opinion, would involve changing the
> IETF MPA spec and also breaking all the existing iwarp HW. We can't do
> that.


> The reason it is hard or impossible to solve this in the DAPL layer is
> that any rdma operation on the QP affects the state of that QP and the
> associated CQs. In addition, if you use an RDMA send to enforce this you
> impact the other side by consuming a RECV buffer. So it's hard if not
> impossible to do this under the covers without affecting the
> application's resources.

Is there no way to do this before passing connection established events
to the uDAPL consumer? I need to go read up on the uDAPL API to really
understand why this wouldn't work.

> Also, the DAPL specification had a goal to not impose any additional
> protocol on the wire. If you add this under the covers, then you add
> such a "protocol" and break interoperability between a connection
> accessed via DAPL on one end and some other API on the other end.

So I guess there's no 'right' solution, at least at the uDAPL level.
With RDMACM/OFA verbs, there's at least the argument that you can design
the API/semantics however you please, while uDAPL is already standardized.

I hope you guys are documenting this in a way that makes this issue
extremely clear to both uDAPL and OFA verbs (is this the right naming?)
users. Maybe it's been done already, but is it possible to emit some
sort of loud warning/error when the accept()'ing side tries to send
before a receive?
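
To make the kind of check I mean concrete, here is a rough sketch in Python. It is only an illustration of the idea, not code against uDAPL, RDMA CM, or OFA verbs: `GuardedEndpoint`, `FirstSendViolation`, `on_receive_completion`, and the `transport_send` callable are all hypothetical names, and a real implementation would have to hook into whatever the library already tracks for receive completions.

```python
class FirstSendViolation(RuntimeError):
    """Raised when the accepting side sends before it has received anything."""


class GuardedEndpoint:
    """Hypothetical wrapper; not part of uDAPL, RDMA CM, or OFA verbs."""

    def __init__(self, is_acceptor, transport_send):
        self.is_acceptor = is_acceptor   # True on the accept()'ing side
        self._send = transport_send      # callable that performs the real send
        self.peer_has_sent = False       # set once a receive has completed

    def on_receive_completion(self, payload):
        # The connecting side has sent its first message, so the
        # accepting side is now allowed to send.
        self.peer_has_sent = True
        return payload

    def send(self, payload):
        # Refuse loudly instead of letting the send go out too early.
        if self.is_acceptor and not self.peer_has_sent:
            raise FirstSendViolation(
                "accepting side must not send before the connecting "
                "side's first message has arrived"
            )
        return self._send(payload)


# Usage: the early send is rejected, the later one goes through.
sent = []
acceptor = GuardedEndpoint(is_acceptor=True, transport_send=sent.append)
try:
    acceptor.send(b"too early")
except FirstSendViolation as exc:
    print("warning:", exc)
acceptor.on_receive_completion(b"hello")
acceptor.send(b"reply")
```

The check only looks at local state, so it adds nothing on the wire and consumes no work queue or completion queue entries. It cannot fix the ordering problem, only make it visible to the user.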