Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

Subject: Re: [OMPI devel] TCP BTL routability (was: ticket #972)
From: Adrian Knoth (adi_at_[hidden])
Date: 2008-07-29 16:40:09


On Tue, Jul 29, 2008 at 03:25:00PM -0400, Jeff Squyres wrote:

> For reference, the FAQ entry is here:
>
> http://www.open-mpi.org/faq/?category=tcp#tcp-routability
>
> It looks like we now *always* assume that two TCP peers are routable.

As long as they share the same address family (IPv4 or IPv6).

> The code in question is in btl_tcp_proc.c with the loop starting at
> line 413.

Yes. The FAQ is outdated, the new code is very different.

We now use graph theory, imagine a bipartite graph where each interface
is a vertex. (one peer on the left, the other on the right, no
connections inside each peer, only from left to right, hence a bipartite
graph).

Every edge in this graph is given a weight depending on its quality. The
quality is "defined" in btl_tcp_proc.h:

enum mca_btl_tcp_connection_quality {
        CQ_NO_CONNECTION,
        CQ_PRIVATE_DIFFERENT_NETWORK,
        CQ_PRIVATE_SAME_NETWORK,
        CQ_PUBLIC_DIFFERENT_NETWORK,
        CQ_PUBLIC_SAME_NETWORK
};

CQ_NO_CONNECTION (weight 0) is for different address families, so we
don't try to connect from IPv6 to IPv4 and vice versa. The more likely a
connection is going to be established, the higher the weight. So public
addresses on the same network (read: very close, probably sharing the
same link) are the best one can get, private addresses on different
networks have the lowest probability for a succeeding connection.

We then try to find a matching in the graph, this means, no two edges
may have a common endpoint on either side, thus avoiding
oversubscription.

In order to support striping, we look for the largest matching (read:
selecting as many edges (links) as possible).

In order to ensure connectivity, we then choose from the variety of
largest matchings the one with the highest summed weights. These edges
denote the addresses with the best probability for a succeeding
connection.

In terms of graph theory, this is called a maximum cardinality maximum
weight matching.

You can find the whole background story in Chapter 3:

   http://cluster.inf-ra.uni-jena.de/~adi/peiselt-thesis.pdf

We have also a brief IEEE paper on this:

   http://www.ieeexplore.ieee.org/xpl/freeabs_all.jsp?isnumber=4476518&arnumber=4476565&count=56&index=46

In other words: #972 is somewhat obsolete, the FAQ entry should surely
be removed/updated. I don't know to which extent, but if you want me to
write some lines, I could probably invent a not so scientific
description.

HTH

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
private: http://adi.thur.de