
Open MPI Development Mailing List Archives


From: Christian Kauhaus (ckauhaus_at_[hidden])
Date: 2006-03-31 11:45:47

Bogdan Costescu <Bogdan.Costescu_at_[hidden]>:
>- are all computers that should participate in a job configured
>similarly (only IPv6 or both IPv4 and IPv6) ? If not all are, then
>should some part of the computers communicate over one protocol and
>the rest over the other ? I think that this split communication would

This should really be possible. If we write the connection handling code
correctly, the Internet Protocol version should not matter. Many other
daemons are coded exactly this way. The basic algorithm looks like this:

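/* retrieve the list of addresses bound to the given target host;
   hints.ai_family = AF_UNSPEC requests both IPv4 and IPv6 results */
getaddrinfo(host, port, &hints, &addr_list);

for (addr_res = addr_list; addr_res != NULL; addr_res = addr_res->ai_next) {
  /* initialize a socket of the correct address family */
  fd = socket(addr_res->ai_family, addr_res->ai_socktype,
              addr_res->ai_protocol);
  if (fd < 0) continue;

  if (connect(fd, addr_res->ai_addr, addr_res->ai_addrlen) == 0) break;
  close(fd);
  fd = -1;
}
freeaddrinfo(addr_list);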

So the resolver already does the complicated work for us: it returns all
addresses associated with a given target (a hostname or an address in IP
notation), ordered by decreasing preference.
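For illustration, a self-contained version of the loop above might look like the following. It is a minimal sketch assuming a POSIX system and TCP; `connect_any` is a hypothetical helper name, not an existing Open MPI function. The only protocol-specific decision is left to the resolver: `AF_UNSPEC` asks for both IPv4 and IPv6 results, and each socket is created with the family of the address being tried.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect to host:port over whichever IP version
 * works, trying the addresses in the order the resolver returns them.
 * Returns a connected socket descriptor, or -1 on failure. */
int connect_any(const char *host, const char *port)
{
    struct addrinfo hints, *addr_list, *ai;
    int fd = -1;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept both IPv4 and IPv6 results */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */

    rc = getaddrinfo(host, port, &hints, &addr_list);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo(%s): %s\n", host, gai_strerror(rc));
        return -1;
    }

    for (ai = addr_list; ai != NULL; ai = ai->ai_next) {
        /* create a socket matching the address family of this result */
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;                 /* e.g. IPv6 not supported locally */
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                    /* success: keep this descriptor */
        close(fd);                    /* failed: try the next address */
        fd = -1;
    }

    freeaddrinfo(addr_list);
    return fd;
}
```

For example, `connect_any("localhost", "22")` would try every address `localhost` resolves to (often `::1` and `127.0.0.1`, depending on the hosts file) and return the first descriptor that connects.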

>- a related point is whether the 2 protocols should really be regarded
>as 2 different communication channels. OpenMPI is able to use several
>communication channels between 2 processes/MPI ranks at the same time,
>so should the same physical interface be split between the 2 logical
>protocols for communication between the same pair of computers ?

This one is somewhat complicated. In OMPI's model, there are several
interfaces on a host, and each interface has access to some fraction of
the total bandwidth. Now we also have two different protocols on each
interface.
Possible scenarios:

- We add the IP version to the OMPI interface name. So instead of eth0
  and eth1 we would get eth0, eth0.v6, eth1 and eth1.v6. Using this
  approach one could quite easily state her preferences using the btl
  command line arguments. Of course, the latency/bandwidth code would
  need to be reworked, since all traffic on an IPv6 interface would now
  take available bandwidth away from the corresponding IPv4 interface.

- We do not add the IP version to the interface name, but perform
  protocol selection automatically based on resolver results. In this
  case the modification to the interface selection algorithm would
  probably be a minor one. Drawback: we cannot control the IP version
  beyond the resolver configuration, which is probably out of the user's
  reach. Since IPv6 imposes a slightly higher protocol overhead (a 40-byte
  IPv6 header versus 20 bytes for a minimal IPv4 header), users might
  want to use IPv4 in the local network, but cannot do anything if the
  automatic selection gets it wrong.

- We introduce another parameter which allows selecting the IP version
  both globally and on a per-interface basis. Something like:
  IPv4-only / prefer IPv4 / auto (resolver) / prefer IPv6 / IPv6-only

The third approach would possibly be the cleanest one.
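To make the third approach a little more concrete, the five-way setting could be applied as a reordering step on the resolver's result list, just before the connection loop. This is a minimal sketch under that assumption; the enum `ip_policy`, its values and `order_by_policy` are hypothetical names, not existing Open MPI parameters or functions.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical five-way IP version setting (third approach). */
enum ip_policy { IPV4_ONLY, PREFER_IPV4, IP_AUTO, PREFER_IPV6, IPV6_ONLY };

/* Write pointers to the usable entries of a getaddrinfo() result list into
 * out[] (at most max of them), in the order a connection loop should try
 * them. The list itself is not modified, so freeaddrinfo() still works.
 * Returns the number of entries written. */
size_t order_by_policy(const struct addrinfo *list, enum ip_policy policy,
                       const struct addrinfo **out, size_t max)
{
    int first = AF_UNSPEC;  /* family tried first; AF_UNSPEC = resolver order */
    int only = 0;           /* nonzero: drop the other family entirely */
    size_t n = 0;

    switch (policy) {
    case IPV4_ONLY:   first = AF_INET;  only = 1; break;
    case PREFER_IPV4: first = AF_INET;            break;
    case IP_AUTO:                                 break;
    case PREFER_IPV6: first = AF_INET6;           break;
    case IPV6_ONLY:   first = AF_INET6; only = 1; break;
    }

    /* pass 0 collects the preferred family (or everything under IP_AUTO),
     * pass 1 appends the remaining entries; resolver order is kept
     * within each group */
    for (int pass = 0; pass < 2; pass++) {
        if (pass == 1 && (only || first == AF_UNSPEC))
            break;
        for (const struct addrinfo *ai = list; ai != NULL && n < max;
             ai = ai->ai_next) {
            int preferred = (first == AF_UNSPEC) || (ai->ai_family == first);
            if ((pass == 0 && preferred) || (pass == 1 && !preferred))
                out[n++] = ai;
        }
    }
    return n;
}
```

A connection loop would then try `out[0]` through `out[n-1]` instead of following `ai_next` directly. Working on an array of pointers rather than relinking the list keeps the resolver's allocation untouched, so it can still be released with a single `freeaddrinfo` call; the same function could be called with a global or a per-interface policy value.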

>of the computers. For example, if the remote computer has IPv6
>configured but the sshd is restricted to bind to IPv4, then a ssh
>connection to this computer using the IPv6 address (which would be
>specified in the hostfile) will fail, while OpenMPI processes [...]

In my experience, this is not a problem. We currently have some IPv6
test networks running, and one of our clusters also runs IPv6 on its
internal Ethernet. Hosts which are generally not IPv6-ready get no IPv6
address in the DNS / hosts file. This prevents any contact using IPv6,
since no IPv6 address is known for them. Hosts which have some IPv6
support get a double entry (an IPv4 and an IPv6 address) in the DNS or
hosts file. Since it is standard behaviour for every IPv6 application to
try all known addresses for the target host until one succeeds, we are
also able to connect to an IPv6-enabled host where the target daemon
does not listen on an IPv6 interface. For example, we ran for several
weeks without an IPv6-enabled rsh, which is used to handle MPI job
startup on the cluster, without any problems.
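The fallback described above can be reproduced on a single machine. In the sketch below (assuming a POSIX system; `listen_ipv4_loopback` and `first_reachable` are hypothetical helpers), a listener bound only to 127.0.0.1 plays the role of a daemon without IPv6 support, and the client tries the host's two entries in order, IPv6 first. The attempt on `::1` is refused (or fails outright if IPv6 is disabled locally), so the client falls through to the IPv4 address.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Stand-in for a daemon that listens on IPv4 only: bind 127.0.0.1 on an
 * ephemeral port, store the port number as a string in port_buf and
 * return the listening socket (or -1 on error). */
int listen_ipv4_loopback(char *port_buf, size_t len)
{
    struct addrinfo hints, *res;
    struct sockaddr_storage ss;
    socklen_t sslen = sizeof ss;
    int ls;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;              /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;
    if (getaddrinfo("127.0.0.1", "0", &hints, &res) != 0)  /* port 0: kernel picks */
        return -1;

    ls = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (ls >= 0 &&
        (bind(ls, res->ai_addr, res->ai_addrlen) != 0 ||
         listen(ls, 4) != 0 ||
         getsockname(ls, (struct sockaddr *)&ss, &sslen) != 0 ||
         getnameinfo((struct sockaddr *)&ss, sslen, NULL, 0,
                     port_buf, (socklen_t)len, NI_NUMERICSERV) != 0)) {
        close(ls);
        ls = -1;
    }
    freeaddrinfo(res);
    return ls;
}

/* Try the literal addresses of one host (e.g. the IPv6 and IPv4 halves of
 * a double DNS / hosts-file entry) in order. Returns the index of the
 * first address that accepts a TCP connection and stores the connected
 * socket in *fd_out; returns -1 if none does. */
int first_reachable(const char *const addrs[], size_t n, const char *port,
                    int *fd_out)
{
    for (size_t i = 0; i < n; i++) {
        struct addrinfo hints, *res;
        int fd;

        memset(&hints, 0, sizeof hints);
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;
        hints.ai_flags = AI_NUMERICHOST;    /* candidates are literal addresses */
        if (getaddrinfo(addrs[i], port, &hints, &res) != 0)
            continue;                       /* address not usable here */

        fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd >= 0 && connect(fd, res->ai_addr, res->ai_addrlen) == 0) {
            freeaddrinfo(res);
            *fd_out = fd;
            return (int)i;
        }
        if (fd >= 0)
            close(fd);                      /* refused or unreachable: next one */
        freeaddrinfo(res);
    }
    return -1;
}
```

With the candidate list `{ "::1", "127.0.0.1" }` and the port returned by `listen_ipv4_loopback`, `first_reachable` should return 1, i.e. the connection succeeds over IPv4 even though the IPv6 entry was tried first.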

>IMHO, some discussion of them should occur before the actual coding...

I agree. So here we go :-)


Dipl.-Inf. Christian Kauhaus                               <><
Lehrstuhl fuer Rechnerarchitektur und -kommunikation 
Institut fuer Informatik * Ernst-Abbe-Platz 1-2 * D-07743 Jena
Tel: +49 3641 9 46376  *  Fax: +49 3641 9 46372   *  Raum 3217