Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Device failover on ob1
From: Pavel Shamis (Pasha) (pashash_at_[hidden])
Date: 2009-08-03 11:49:44

> I have not, but there should be no difference. The failover code only
> gets triggered when an error occurs; while everything is functioning
> normally, the code paths are identical.
Sounds good. I have not yet had time to review the code; I will try to
do it this week.

> Rolf
> On 08/03/09 11:14, Pavel Shamis (Pasha) wrote:
>> Rolf,
>> Did you compare latency/bandwidth of the failover-enabled code vs. the trunk?
>> Pasha.
>> Rolf Vandevaart wrote:
>>> Hi folks:
>>> As some of you know, I have also been looking into implementing
>>> failover. I took a different approach: I am solving the problem
>>> within the openib BTL itself. This of course means that it only
>>> works for failing over from one openib BTL to another, but that
>>> was our area of interest. It also means that we do not need to
>>> keep track of fragments, since we get them back from the completion
>>> queue upon failure. We then extract the relevant information and
>>> repost them on the other working endpoint.
>>> My work has been progressing at
>>> This currently works only for send semantics, so you have to run
>>> with -mca btl_openib_flags 1.
>>> Rolf
>>> On 07/31/09 05:49, Mouhamed Gueye wrote:
>>>> Hi list,
>>>> Here is an update on our work concerning device failover.
>>>> As many of you suggested, we refocused our work on ob1 rather than
>>>> dr, and we now have a working prototype on top of ob1. The approach
>>>> is to store the BTL descriptors sent to peers and delete them when we
>>>> receive proof of delivery. So far, we rely on completion callback
>>>> functions, assuming that a message has been delivered when its
>>>> completion function is called, which is the case for openib. When a
>>>> BTL module fails, it is removed from the endpoint's BTL list and
>>>> the next one is used to retransmit the stored descriptors. No
>>>> extra messages are transmitted; the only change is some additions to
>>>> the header. It has mainly been tested with two IB modules, in both
>>>> multi-rail (two separate networks) and multi-path (one big
>>>> network) configurations.
>>>> You can grab and test the patch here (it applies on top of the trunk):
>>>> To compile with failover support, just pass
>>>> --enable-device-failover to configure. You can then run a
>>>> benchmark, disconnect a port, and watch the failover take effect.
>>>> The failover layer adds a small latency increase (~2%) when no
>>>> failover occurs. To speed up the failover process on openib, you can
>>>> try lowering the btl_openib_ib_timeout openib parameter to, for
>>>> example, 15 instead of its default value of 20.
>>>> Mouhamed
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]