On Mon, 8 Jun 2009, NiftyOMPI Tom Mitchell wrote:
> ??? dual rail does double the number of switch ports. If you want to
> address switch failure each rail must connect to a different switch.
> If you do not want to have isolated fabrics you must have some
> additional ports on all switches to connect the two fabrics and enough
> of them to maintain sufficient bandwidth and connectivity when a switch
> fails. Thus you are doubling the fabric unless I am missing something.
Well, it is pretty much research for now. But yes, we want each port to be
connected to a different switch so that both cable and switch failures can
be tolerated.
Open MPI currently requires the fabrics to be connected, but having two
separate rails may be something we would like to support in the future.
(Btw Pasha, will your current work enable this?)
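To make the port-placement argument concrete, here is a small sketch that
enumerates single component failures (one port, one cable or one switch) and
reports which of them leave a node with no path to the fabric. The component
names and the dictionary layout are hypothetical, and the model only checks
the node's own attachment; it does not model whether the remaining fabric is
still connected end to end, which Open MPI currently assumes.

```python
def survives(port_to_switch, failed):
    """True if at least one port still has a working port, cable and switch.

    port_to_switch maps a (hypothetical) port name to the switch it is
    cabled to; failed is the set of component names assumed dead.
    """
    for port, switch in port_to_switch.items():
        cable = f"cable-{port}"
        if port not in failed and cable not in failed and switch not in failed:
            return True
    return False


def single_failures(port_to_switch):
    """Every single component: each port, its cable and each switch."""
    ports = list(port_to_switch)
    cables = [f"cable-{p}" for p in ports]
    switches = sorted(set(port_to_switch.values()))
    return ports + cables + switches


def unsurvivable(port_to_switch):
    """Single failures that cut the node off from the fabric entirely."""
    return [c for c in single_failures(port_to_switch)
            if not survives(port_to_switch, {c})]


same_switch = {"port1": "switchA", "port2": "switchA"}
separate_switches = {"port1": "switchA", "port2": "switchB"}

print(unsurvivable(same_switch))        # ['switchA']
print(unsurvivable(separate_switches))  # []
```

With both ports on one switch, cable failures are covered but the switch
remains a single point of failure; cabling each port to a different switch
removes it, at the cost of the extra switch ports discussed above.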
> Is your second set of switches so minimally connected that the second
> tree can be installed with a small switch count?
That's the idea, yes. For example, you could have a primary QDR fat-tree
network and a failover non fat-tree DDR one (potentially recycled from a
> What are the odds that when port 1 fails, port 2 is going to
> be live? Cable/connector errors would be the most likely
> case where port 2 would be live. In general if port 1 fails
> I would expect port 2 to have issues too.
Well, depending on the errors you want to be able to survive, you may have
2 cards, in which case there is no reason why a port 1 failure would cause
port 2 to fail too. But in all cases, switch and cable errors are a
concern to us.
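The difference between one dual-port card and two cards can be sketched the
same way: treat each port's path as the set of components it depends on
(card, port, cable, switch) and look for components shared by every path. The
names below are hypothetical, and the sketch assumes a card failure takes down
all of that card's ports, which is the correlated failure mode described in
the quoted message.

```python
def node_alive(paths, failed):
    """True if some path has none of its components in the failed set."""
    return any(not failed.intersection(path) for path in paths)


def single_points_of_failure(paths):
    """Components whose failure alone disconnects the node."""
    components = sorted({c for path in paths for c in path})
    return [c for c in components if not node_alive(paths, {c})]


# Each path is (card, port, cable, switch); both rails go to different switches.
one_card = [("hca0", "hca0:p1", "cable1", "switchA"),
            ("hca0", "hca0:p2", "cable2", "switchB")]
two_cards = [("hca0", "hca0:p1", "cable1", "switchA"),
             ("hca1", "hca1:p1", "cable2", "switchB")]

print(single_points_of_failure(one_card))   # ['hca0']
print(single_points_of_failure(two_cards))  # []
```

With a single dual-port card, the card itself is shared by both rails, so a
card failure kills port 1 and port 2 together; with two cards and two
switches, no single card, cable or switch failure disconnects the node.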