Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Device failover on ob1
From: Brian Barrett (brbarret_at_[hidden])
Date: 2009-08-02 00:55:04

While I agree that performance impact (latency in this case) is
important, I disagree that this necessarily belongs somewhere other
than ob1. For example, a solution with zero performance impact would
be to provide two versions of all the interface functions, one with
failover turned on and one with it turned off, and select the
appropriate functions at initialization time. There are others,
including careful placement of decision logic, which are likely to
result in near-zero impact. I'm not attempting to prescribe a
solution, only refuting the claim that this can't be in ob1 - I think
more data is needed before such a claim is made.
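
As a minimal sketch of the two-versions idea, assuming a single
send entry point: every name below (send_fn_t, send_plain,
send_failover, pml_ops, pml_init) is hypothetical and is not the
actual ob1 or BTL interface. The failover decision is taken once at
initialization, so the non-failover path carries no extra branch or
bookkeeping per call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical send signature, for illustration only. */
typedef int (*send_fn_t)(const void *buf, size_t len);

/* Counts descriptors the failover variant would have stored. */
static int sends_tracked = 0;

/* Variant without failover: no bookkeeping on the fast path. */
static int send_plain(const void *buf, size_t len)
{
    (void)buf;
    (void)len;
    return 0; /* stand-in for handing the data to a transport */
}

/* Variant with failover: record the send, then do the same work. */
static int send_failover(const void *buf, size_t len)
{
    sends_tracked++; /* stand-in for storing the descriptor */
    return send_plain(buf, len);
}

/* Table of interface functions, filled in once at init time. */
struct pml_ops {
    send_fn_t send;
};

static struct pml_ops ops;

/* Select the variant once; callers only ever use ops.send. */
static void pml_init(bool enable_failover)
{
    ops.send = enable_failover ? send_failover : send_plain;
}
```

After pml_init(false), a call through ops.send goes straight to
send_plain and never touches the tracking counter; after
pml_init(true), the same call site goes through send_failover.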

Mouhamed - can the openib btl try to re-establish a connection between
two peers today (with your ob1 patches, obviously)? Would this allow
us to adapt to changing routes due to switch failures (assuming that
there are other physical routes around the failed switch, of course)?



On Aug 1, 2009, at 6:21 PM, Graham, Richard L. wrote:

> What is the impact on sm, which is by far the most sensitive to
> latency? This really belongs in a place other than ob1. Ob1 is
> supposed to provide the lowest latency possible, and other pml's are
> supposed to be used for heavier weight protocols.
> On the technical side, how do you distinguish between a lost
> acknowledgement and an undelivered message? You really don't want
> to try and deliver data into user space twice, as once a receive is
> complete, who knows what the user has done with that buffer? A
> general treatment needs to be able to handle false negatives and
> attempts to deliver the data more than once.
> How are you detecting missing acknowledgements? Are you using some
> sort of timer?
> Rich
> On 7/31/09 5:49 AM, "Mouhamed Gueye" <mouhamed.gueye_at_[hidden]> wrote:
> Hi list,
> Here is an update on our work concerning device failover.
> As many of you suggested, we reoriented our work toward ob1 rather
> than dr, and we now have a working prototype on top of ob1. The
> approach is to store btl descriptors sent to peers and delete them
> when we receive proof of delivery. So far, we rely on completion
> callback functions, assuming that the message is delivered when the
> completion function is called, which is the case for openib. When a
> btl module fails, it is removed from the endpoint's btl list and the
> next one is used to retransmit stored descriptors. No extra message
> is transmitted; the change consists only of additions to the header.
> It has been mainly tested with two IB modules, in both multi-rail
> (two separate networks) and multi-path (a single large network)
> configurations.
> You can grab and test the patch here (applies on top of the trunk) :
> To compile with failover support, just pass --enable-device-failover
> to configure. You can then run a benchmark, disconnect a port and
> see the failover operate.
> The failover layer induces a small latency increase (~2%) when no
> failover occurs. To accelerate the failover process on openib, you
> can try lowering the btl_openib_ib_timeout openib parameter, for
> example to 15 instead of 20 (the default value).
> Mouhamed
> _______________________________________________
> devel mailing list
> devel_at_[hidden]

   Brian Barrett
   Open MPI developer