Open MPI Development Mailing List Archives

From: Caitlin Bestler (caitlinb_at_[hidden])
Date: 2007-05-09 18:03:08

devel-bounces_at_[hidden] wrote:
> Steve Wise wrote:
>> There have been a series of discussions on the ofa general list about
>> this issue, and the conclusion to date is that it cannot be resolved
>> in the rdma-cm or iwarp-cm code of the linux rdma stack. Mainly
>> because sending an RDMA message involves the ULP's work queue and
>> completion queue, so the CM cannot do this under the covers in a
>> manner that doesn't affect the application. Thus, the applications
>> must deal with this.
> Why can't uDAPL deal with this? As a uDAPL user, I really
> don't care what API uDAPL is using under the hood to move
> data from one place to another, nor the quirks of that API.
> The whole point of uDAPL is to form a network-agnostic
> abstraction layer. AFAIK, the uDAPL spec doesn't enforce any
> such requirement on RDMA communication either. In my
> opinion, exposing such behavior above uDAPL is incorrect and
> is part of why uDAPL has seen limited adoption -- every
> single uDAPL implementation behaves in different ways, making
> it extremely difficult to write an application to work on any
> uDAPL implementation. Sorry if this sounds harsh, but this
> comes from many hours of banging my head on the wall due to
> working around these sorts of problems :)

The simple answer is that uDAPL cannot deal with this.

The RDMAC verbs specification was overly focused on client/server
and therefore did not realize that there was any harm in requiring
that the active side did the first send. But given that DAPL could
not rewrite either the RDMAC or InfiniBand verbs, it had to come up
with the best solution that matched the verbs as they were. One of
the explicit ground rules was that DAPL MUST support all RDMA devices
that were IBTA or RDMAC compliant. Given those rules, if the active
side does not send a message, the passive side might be held off
indefinitely, and sending a message causes consumption of a receive
buffer and therefore cannot be transparent to the uDAPL consumer.
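
To illustrate the constraint, here is a toy model in Python. It is only a
sketch of the behavior described above, not uDAPL, verbs, or any real API;
the class and method names (PassiveEndpoint, deliver_from_active, send) are
hypothetical. It assumes the passive side may not send until the active
side's first message has arrived, and that each arriving message consumes
one receive buffer posted by the consumer.

```python
from collections import deque


class PassiveEndpoint:
    """Toy passive side of a connection (hypothetical names, not a real API)."""

    def __init__(self, posted_receives):
        # Receive buffers the consumer has posted on its work queue.
        self.recv_queue = deque(posted_receives)
        # Completions the consumer will observe on its completion queue.
        self.completions = []
        # Assumption: the passive side is held off until the first message arrives.
        self.may_send = False

    def deliver_from_active(self, payload):
        """Model arrival of a message sent by the active side."""
        if not self.recv_queue:
            raise RuntimeError("no receive buffer posted")
        buf = self.recv_queue.popleft()          # a consumer buffer is used up
        self.completions.append((buf, payload))  # and the consumer sees it
        self.may_send = True

    def send(self, payload):
        """Return True if the send is allowed, False if still held off."""
        return self.may_send


ep = PassiveEndpoint(["buf0", "buf1"])
held_off = not ep.send("data")      # nothing has arrived from the active side yet
ep.deliver_from_active("hello")     # the active side's first message
now_allowed = ep.send("data")
```

In this model the first message cannot be hidden from the consumer: it
removes an entry from the consumer's own receive queue and adds one to its
completion queue, which is the visibility problem described above.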

Given those constraints there is literally nothing that can be
done to work around this problem by either DAPL or OFA.