
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Device failover on ob1
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-08-04 06:50:01


Could you get together off-list to discuss the different approaches
and see if/where there is common ground? It would be nice to see an
integrated solution - personally, I would rather not see two
orthogonal approaches unless they can be cleanly separated. It would
be much better if they could support each other in an intelligent
fashion.

On Aug 3, 2009, at 9:49 AM, Pavel Shamis (Pasha) wrote:

>> I have not, but there should be no difference. The failover code
>> only gets triggered when an error happens. Otherwise, there are no
>> differences in the code paths while everything is functioning
>> normally.
> Sounds good. I have not yet had time to review the code; I will
> try to do it this week.
> Pasha
>> Rolf
>> On 08/03/09 11:14, Pavel Shamis (Pasha) wrote:
>>> Rolf,
>>> Did you compare latency/bw for failover-enabled code VS trunk ?
>>> Pasha.
>>> Rolf Vandevaart wrote:
>>>> Hi folks:
>>>> As some of you know, I have also been looking into implementing
>>>> failover. I took a different approach, solving the problem
>>>> within the openib BTL itself. This of course means it only works
>>>> for failing over from one openib BTL to another, but that was
>>>> our area of interest. It also means that we do not need to keep
>>>> track of fragments, since upon failure we get them back from the
>>>> completion queue; we then extract the relevant information and
>>>> repost it on the other working endpoint.
>>>> My work has been progressing at
>>>> .
>>>> This currently only works for send semantics, so you have to
>>>> run with -mca btl_openib_flags 1.
>>>> Rolf
>>>> On 07/31/09 05:49, Mouhamed Gueye wrote:
>>>>> Hi list,
>>>>> Here is an update on our work concerning device failover.
>>>>> As many of you suggested, we reoriented our work toward ob1
>>>>> rather than dr, and we now have a working prototype on top of
>>>>> ob1. The approach is to store btl descriptors sent to peers and
>>>>> delete them when we receive proof of delivery. So far we rely
>>>>> on completion callback functions, assuming that a message has
>>>>> been delivered once its completion function is called, which is
>>>>> the case for openib. When a btl module fails, it is removed
>>>>> from the endpoint's btl list and the next one is used to
>>>>> retransmit the stored descriptors. No extra messages are
>>>>> transmitted; the changes consist only of additions to the
>>>>> header. It has mainly been tested with two IB modules, in both
>>>>> multi-rail (two separate networks) and multi-path (one big
>>>>> unified network) configurations.
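The store-and-retransmit scheme described above can be sketched
roughly as follows. This is an illustrative sketch only: none of
these structure or function names come from the actual patch or the
Open MPI source tree.

```c
#include <stdlib.h>

/* Illustrative sketch: a per-endpoint list of in-flight
 * descriptors.  A descriptor is stored when sent and removed when
 * the BTL's completion callback fires (taken as proof of delivery,
 * as with openib).  On BTL failure, everything still pending is
 * reposted on the endpoint's next working BTL. */
struct pending_desc {
    struct pending_desc *next;
    void  *payload;
    size_t len;
};

struct endpoint {
    struct pending_desc *pending; /* awaiting completion        */
    int    active_btl;            /* index of current BTL module */
    int    num_btls;
};

/* Remember a descriptor until its delivery is confirmed. */
static void track_send(struct endpoint *ep, void *payload, size_t len)
{
    struct pending_desc *d = malloc(sizeof *d);
    d->payload = payload;
    d->len     = len;
    d->next    = ep->pending;
    ep->pending = d;
}

/* Completion callback: delivery confirmed, drop the stored copy. */
static void on_completion(struct endpoint *ep, void *payload)
{
    struct pending_desc **pp = &ep->pending;
    while (*pp) {
        if ((*pp)->payload == payload) {
            struct pending_desc *d = *pp;
            *pp = d->next;
            free(d);
            return;
        }
        pp = &(*pp)->next;
    }
}

/* BTL failure: switch to the next module and retransmit whatever
 * is still pending.  Returns -1 if no BTL is left to fail over to. */
static int on_btl_failure(struct endpoint *ep)
{
    if (ep->active_btl + 1 >= ep->num_btls)
        return -1;
    ep->active_btl++;
    for (struct pending_desc *d = ep->pending; d; d = d->next) {
        /* repost d->payload / d->len on BTL ep->active_btl ... */
        (void)d;
    }
    return 0;
}
```

The key property is the one described in the email: in the
failure-free path the only extra work is bookkeeping on send and
completion, which matches the small latency overhead reported below.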
>>>>> You can grab and test the patch here (it applies on top of the
>>>>> trunk):
>>>>> To compile with failover support, just pass
>>>>> --enable-device-failover at configure time. You can then run a
>>>>> benchmark, disconnect a port, and watch the failover operate.
>>>>> A small latency increase (~2%) is induced by the failover
>>>>> layer when no failover occurs. To speed up failover detection
>>>>> on openib, you can try lowering the btl_openib_ib_timeout
>>>>> parameter, for example to 15 instead of the default of 20.
>>>>> Mouhamed
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> devel_at_[hidden]