As some of you know, I have also been looking into implementing
failover. I took a different approach, solving the problem within the
openib BTL itself. This of course means that it only works for failing
over from one openib BTL to another, but that was our area of interest.
It also means that we do not need to keep track of fragments, since we
get them back from the completion queue upon failure. We then extract
the relevant information and repost it on the other working openib BTL.
My work in progress is at http://bitbucket.org/rolfv/ompi-failover.
It currently works only for send semantics, so you have to run with
-mca btl_openib_flags 1.
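The repost-on-error pattern above can be sketched in standalone C. This is a simplified model, not the actual openib BTL code: the types, names (`frag_t`, `drain_cq`, `repost_on_alternate`), and the integer BTL indices are all hypothetical stand-ins for the real work-completion and fragment structures.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model: when a work completion comes back with an error
 * status, the fragment it points to is extracted and reposted on an
 * alternate (still working) BTL.  No copy of the fragment is kept
 * beforehand, because the failed fragment itself comes back to us. */

enum wc_status { WC_SUCCESS, WC_ERROR };

typedef struct {
    int id;        /* fragment identifier */
    int posted_on; /* index of the BTL it was last posted on */
} frag_t;

typedef struct {
    enum wc_status status;
    frag_t *frag;  /* completed (or failed) fragment */
} completion_t;

/* Repost the fragment on the alternate BTL; returns the BTL index used.
 * In the real BTL this would re-enqueue the fragment on the other
 * openib module's send queue. */
int repost_on_alternate(frag_t *frag, int failed_btl, int alternate_btl)
{
    (void)failed_btl;
    frag->posted_on = alternate_btl;
    return alternate_btl;
}

/* Drain a completion queue: successful fragments are simply retired,
 * failed ones are extracted and reposted.  Returns the number of
 * reposted fragments. */
int drain_cq(completion_t *cq, int n, int failed_btl, int alternate_btl)
{
    int reposted = 0;
    for (int i = 0; i < n; i++) {
        if (cq[i].status == WC_ERROR) {
            repost_on_alternate(cq[i].frag, failed_btl, alternate_btl);
            reposted++;
        }
    }
    return reposted;
}
```

The key property the sketch illustrates is that only fragments whose completion reports an error are touched; successful traffic pays no bookkeeping cost.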
On 07/31/09 05:49, Mouhamed Gueye wrote:
> Hi list,
> Here is an update on our work concerning device failover.
> As many of you suggested, we redirected our work toward ob1 rather
> than dr, and we now have a working prototype on top of ob1. The
> approach is to store BTL descriptors sent to peers and delete them
> when we receive proof of delivery. So far we rely on completion
> callback functions, assuming that the message has been delivered when
> the completion function is called, which is the case for openib. When
> a BTL module fails, it is removed from the endpoint's BTL list and
> the next one is used to retransmit the stored descriptors. No extra
> message is transmitted; the scheme only adds fields to the header. It
> has mainly been tested with two IB modules, in both multi-rail (two
> separate networks) and multi-path (a single large network)
> configurations.
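The store/acknowledge/retransmit scheme described in the quoted message can be sketched as follows. This is a minimal standalone model, not the actual ob1 code: `endpoint_t`, `store_descriptor`, `on_completion`, `on_btl_failure`, and the fixed-size arrays are all hypothetical simplifications of ob1's real endpoint and descriptor structures.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 16
#define MAX_BTLS     4

/* A stored copy of a descriptor awaiting proof of delivery. */
typedef struct {
    int seq;   /* descriptor sequence number */
    int valid; /* still awaiting delivery? */
} descriptor_t;

typedef struct {
    int btls[MAX_BTLS]; /* 1 = usable, 0 = failed */
    int current_btl;
    descriptor_t pending[MAX_PENDING];
    int npending;
} endpoint_t;

/* Store the descriptor before sending it. */
void store_descriptor(endpoint_t *ep, int seq)
{
    ep->pending[ep->npending].seq = seq;
    ep->pending[ep->npending].valid = 1;
    ep->npending++;
}

/* Completion callback: taken as proof of delivery (true for openib),
 * so the stored copy can be dropped. */
void on_completion(endpoint_t *ep, int seq)
{
    for (int i = 0; i < ep->npending; i++)
        if (ep->pending[i].seq == seq)
            ep->pending[i].valid = 0;
}

/* BTL failure: mark the current BTL failed, switch to the next usable
 * one, and retransmit every descriptor never acknowledged.  Returns the
 * number of retransmitted descriptors, or -1 if no usable BTL remains. */
int on_btl_failure(endpoint_t *ep)
{
    ep->btls[ep->current_btl] = 0;
    int next = -1;
    for (int i = 0; i < MAX_BTLS; i++)
        if (ep->btls[i]) { next = i; break; }
    if (next < 0)
        return -1;
    ep->current_btl = next;

    int resent = 0;
    for (int i = 0; i < ep->npending; i++)
        if (ep->pending[i].valid)
            resent++; /* the real code would re-post each one here */
    return resent;
}
```

Unlike the openib-internal approach, this scheme pays a small per-message cost (storing the descriptor until its completion fires), which is consistent with the ~2% latency figure quoted below in the message.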
> You can grab and test the patch here (it applies on top of the
> trunk):
> To compile with failover support, just pass --enable-device-failover
> to configure. You can then run a benchmark, disconnect a port, and
> watch the failover operate.
> A small latency increase (~2%) is induced by the failover layer when
> no failover occurs. To speed up the failover process on openib, you
> can try lowering the btl_openib_ib_timeout parameter, for example to
> 15 instead of the default of 20.
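Putting the quoted instructions together, a run might look like the sketch below. The configure flag and the MCA parameter are taken from the message above; the process count and `./my_benchmark` are placeholders, and the timeout interpretation (InfiniBand local ACK timeout of 4.096 us * 2^value) is my reading of the parameter, not something stated in the message.

```shell
# Build the patched trunk with failover support.
./configure --enable-device-failover
make all install

# Lower the IB retry timeout (default 20, i.e. 4.096 us * 2^20) so a
# dead port is detected sooner, then run the benchmark and pull a cable
# to watch the failover operate.
mpirun -np 2 --mca btl_openib_ib_timeout 15 ./my_benchmark
```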