Here is an update on our work concerning device failover.
As many of you suggested, we refocused our work on ob1 rather than dr,
and we now have a working prototype on top of ob1. The approach is to
store btl descriptors sent to peers and delete them when we receive
proof of delivery. So far, we rely on completion callback functions,
assuming that a message has been delivered once its completion function
is called, which is the case for openib. When a btl module fails, it is
removed from the endpoint's btl list and the next one is used to
retransmit stored descriptors. No extra message is transmitted; the
scheme only adds information to the header. It has mainly been tested
with two IB modules, in both multi-rail (two separate networks) and
multi-path (a single large network) configurations.
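
To make the mechanism easier to picture, here is a minimal sketch in C of
the store / delete-on-completion / retransmit-on-failure cycle. It is not
the code from the patch and does not use the real ob1 or btl interfaces:
endpoint_t, desc_t, the ep_* functions and the "openib0"/"openib1" labels
are all hypothetical names made up for illustration, and a "send" is
reduced to a printf.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_BTLS 4

/* Hypothetical stored descriptor: a sequence number and a list link. */
typedef struct desc {
    unsigned seq;
    struct desc *next;
} desc_t;

/* Hypothetical endpoint: an ordered list of btl names and the
 * descriptors that still await proof of delivery (oldest first). */
typedef struct {
    const char *btls[MAX_BTLS];
    int nbtls;
    desc_t *pending;
    unsigned next_seq;
} endpoint_t;

/* Send on the first btl of the list and store the descriptor.
 * Returns the sequence number, or -1 if nothing could be sent. */
int ep_send(endpoint_t *ep)
{
    desc_t *d, **p;
    if (ep->nbtls == 0 || (d = malloc(sizeof *d)) == NULL)
        return -1;
    d->seq = ep->next_seq++;
    d->next = NULL;
    for (p = &ep->pending; *p != NULL; p = &(*p)->next)
        ;                           /* walk to the tail to keep send order */
    *p = d;
    printf("send seq %u on %s\n", d->seq, ep->btls[0]);
    return (int)d->seq;
}

/* Completion callback: taken as proof of delivery, so delete the copy. */
void ep_complete(endpoint_t *ep, unsigned seq)
{
    desc_t **p;
    for (p = &ep->pending; *p != NULL; p = &(*p)->next) {
        if ((*p)->seq == seq) {
            desc_t *d = *p;
            *p = d->next;
            free(d);
            return;
        }
    }
}

/* Number of descriptors still stored. */
int ep_pending(const endpoint_t *ep)
{
    int n = 0;
    const desc_t *d;
    for (d = ep->pending; d != NULL; d = d->next)
        n++;
    return n;
}

/* btl failure: remove the btl at index idx from the list, then
 * retransmit every stored descriptor on the new first btl.
 * Returns the number retransmitted, or -1 if no btl is left. */
int ep_btl_failed(endpoint_t *ep, int idx)
{
    int i, n = 0;
    const desc_t *d;
    assert(idx >= 0 && idx < ep->nbtls);
    for (i = idx; i < ep->nbtls - 1; i++)
        ep->btls[i] = ep->btls[i + 1];
    ep->nbtls--;
    if (ep->nbtls == 0)
        return -1;
    for (d = ep->pending; d != NULL; d = d->next) {
        printf("retransmit seq %u on %s\n", d->seq, ep->btls[0]);
        n++;
    }
    return n;
}

/* Small scenario: three sends, one completion, then the first btl fails.
 * Returns how many descriptors had to be retransmitted. */
int failover_demo(void)
{
    endpoint_t ep = { { "openib0", "openib1" }, 2, NULL, 0 };
    int resent;
    ep_send(&ep);
    ep_send(&ep);
    ep_send(&ep);
    ep_complete(&ep, 1);              /* seq 1 was delivered */
    resent = ep_btl_failed(&ep, 0);   /* openib0 goes down */
    while (ep.pending != NULL)        /* release what is left */
        ep_complete(&ep, ep.pending->seq);
    return resent;
}
```

In failover_demo(), descriptors 0 and 2 are still stored when the first
module fails, so those two are replayed on the second module while
descriptor 1, already completed, is not.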
You can grab and test the patch here (it applies on top of the trunk):
To compile with failover support, pass --enable-device-failover to
configure. You can then run a benchmark, disconnect a port, and watch
the failover take place.
A small latency increase (about 2%) is introduced by the failover layer
when no failover occurs. To speed up failover on openib, you can try
lowering the btl_openib_ib_timeout parameter, for example to 15 instead
of 20 (the default value).
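
To give an idea of why this parameter matters: as far as we understand
it, btl_openib_ib_timeout is the exponent of the InfiniBand local ACK
timeout, which is 4.096 microseconds * 2^value per attempt. The small C
sketch below just evaluates that formula for the two values mentioned
above; the function names are ours, chosen for illustration. The HCA
also retries a number of times before reporting an error, so the real
detection delay should be a multiple of these figures.

```c
#include <assert.h>
#include <stdio.h>

/* Local ACK timeout in seconds for a given exponent:
 * 4.096 us * 2^exponent (formula assumed from the IB verbs definition). */
double ib_ack_timeout_seconds(unsigned exponent)
{
    assert(exponent < 32);   /* the timeout field is 5 bits wide */
    return 4.096e-6 * (double)(1UL << exponent);
}

/* Print the per-attempt timeout for the default and the lowered value. */
void print_ib_timeouts(void)
{
    printf("ib_timeout=20: %.3f s per attempt\n", ib_ack_timeout_seconds(20));
    printf("ib_timeout=15: %.3f s per attempt\n", ib_ack_timeout_seconds(15));
}
```

Under that assumption, going from 20 to 15 shortens each attempt from
roughly 4.3 s to roughly 0.13 s, a factor of 32. The value can be set at
run time in the usual MCA way, e.g. mpirun --mca btl_openib_ib_timeout 15 ...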