
Open MPI User's Mailing List Archives


Subject: [OMPI users] multi-rail failover with IB
From: Robin Humble (rjh+openmpi_at_[hidden])
Date: 2008-04-02 01:13:52


Hi,

from reading the FAQ and this list it seems OpenMPI can use multiple
InfiniBand rails by round-robining across the ports out of each node, as
long as they're configured to be on separate subnets (I think).

can OpenMPI also deal with one of the subnets failing?
i.e. will OpenMPI automatically fall back to using the last remaining
working IB port out of a node, or even fall back to GigE if all the IB
fails?
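
to make the question concrete, here is a toy sketch of the behaviour I'm
hoping for. this is NOT OpenMPI code and says nothing about how OpenMPI
actually schedules traffic; the class RailSelector, its methods, and the
rail names ib0/ib1/gige are all made up for illustration.

```python
class RailSelector:
    """Toy model: round-robin over healthy rails, with an optional fallback.

    Hypothetical illustration only; not taken from Open MPI.
    """

    def __init__(self, rails, fallback=None):
        self.rails = list(rails)   # e.g. one IB port per subnet
        self.fallback = fallback   # e.g. GigE, used only when every rail is down
        self.failed = set()
        self._next = 0             # running counter for round-robin

    def mark_failed(self, rail):
        """Record that a rail (port/subnet) has stopped working."""
        self.failed.add(rail)

    def pick(self):
        """Return the rail to use for the next message."""
        healthy = [r for r in self.rails if r not in self.failed]
        if not healthy:
            if self.fallback is None:
                raise RuntimeError("no working rail")
            return self.fallback
        rail = healthy[self._next % len(healthy)]
        self._next += 1
        return rail


sel = RailSelector(["ib0", "ib1"], fallback="gige")
first_three = [sel.pick() for _ in range(3)]  # alternates across both rails
sel.mark_failed("ib0")
after_one_failure = sel.pick()                # only ib1 is left
sel.mark_failed("ib1")
after_all_failed = sel.pick()                 # drops to the fallback
```

so: traffic alternates while both rails are up, sticks to the surviving
rail when one subnet dies, and only drops to GigE when no IB rail is left.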

the reason I ask is that we're worried about switches failing in the IB
network and whether OpenMPI can solve all our problems for us if we
configure up 2 or more independent IB networks out of each node.

possibly this sort of failover in the MPI isn't needed with ConnectX as
long as its adaptive routing works as advertised? If so then I guess
it's not that important, and I wouldn't want to make you guys do a lot
of unnecessary work :-)

the FAQ entry here:
  http://www.open-mpi.org/faq/?category=ft#ft-future
says
  - Data Reliability and network fault tolerance. Similar to those
    implemented in LA-MPI
but I don't actually know what LA-MPI implemented in this area, so that
doesn't really help me.

cheers,
robin