Open MPI Development Mailing List Archives

From: Bogdan Costescu (Bogdan.Costescu_at_[hidden])
Date: 2006-04-03 11:34:56


On Mon, 3 Apr 2006, Christian Kauhaus wrote:

> This would result in an enormous amount of duplicated code, since
> the IPv4->IPv6 transition would only affect a small fraction of the
> total tcp BTL codebase. This is clearly a violation of the DRY
> principle (don't repeat yourself).

IMHO the code can simply be shared, and only the parts that really
differ should be made independent. This is more a question of whether
the build system allows such a scheme, and of the runtime behaviour:
for static linking, only one copy of the common part should be linked;
for dynamic loading, some module dependency could ensure that the
common code is loaded only once.

> If rsh/ssh cannot handle or authenticate IPv6 connections, the admin
> must keep the IPv6 addresses out of the resolver, so that
> getaddrbyhost() never returns an IPv6 address. That's it.

I beg to disagree. In a setup like the one mentioned, after orted is
started via an IPv4-only rsh/ssh, Open MPI applications could use IPv6
without problems, just as they could use e.g. GM if Myrinet cards
were present. I see this very much like your past experience with
the non-IPv6 rsh: it worked for you because the rsh client
automatically tried the IPv4 address after the IPv6 failure, but the
ssh client might not (be able to) do this. Please note that this is
also a matter of the homogeneity of the cluster, which you can't know
in advance, before starting the daemons: each host (including the one
where the rsh/ssh clients are run) can have its own level of IPv6
awareness.
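
To make the fallback behaviour concrete, here is a minimal Python sketch
of a client that tries every address the resolver returns and settles
for the first one that works. It is not taken from rsh, ssh or Open MPI;
the names resolve_candidates, tcp_connect and connect_first_reachable
are made up for illustration, and the connection attempt is passed in as
a parameter only so that the ordering logic can be exercised without a
network.

```python
import socket


def resolve_candidates(host, port):
    """Return (family, sockaddr) pairs in the order the resolver gives them."""
    infos = socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
    return [(family, sockaddr) for family, _type, _proto, _canon, sockaddr in infos]


def tcp_connect(family, sockaddr, timeout=5.0):
    """Open one TCP connection; raises OSError if the attempt fails."""
    sock = socket.socket(family, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        sock.connect(sockaddr)
    except OSError:
        sock.close()
        raise
    return sock


def connect_first_reachable(candidates, attempt=tcp_connect):
    """Try each candidate in turn and return the first successful result.

    A client without this loop gives up after the first failure, which
    is the situation described above for an IPv6 address that the
    remote daemon cannot serve.
    """
    failures = []
    for family, sockaddr in candidates:
        try:
            return attempt(family, sockaddr)
        except OSError as exc:
            failures.append((sockaddr, exc))
    raise OSError("no candidate address was reachable: %r" % (failures,))
```

A caller would write something like
connect_first_reachable(resolve_candidates("node01", 22)), where the
host name and port are placeholders. Python's own
socket.create_connection() performs a comparable loop over the
getaddrinfo() results; whether a given rsh or ssh client does so is
exactly the open question.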

On a side note, I think the discussion can also be extended to the
batch/queueing systems that might be used to start the Open MPI job
and would pass a list of machines to Open MPI. If the machines are
given as IP addresses (either v4 or v6), Open MPI should probably
assume that the address as given can be passed on to the underlying
mechanism for starting the job (for SGE, for example, this would be
its own rsh client, not the system rsh client). But what about
machines given as names?
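
As a rough illustration of the distinction drawn here, the following
Python sketch sorts the entries of a machine list into IPv4 literals,
IPv6 literals and host names. The function name classify_host_entry and
the sample entries are hypothetical, and the sketch says nothing about
what Open MPI or any batch system actually does with the list.

```python
import ipaddress


def classify_host_entry(entry):
    """Return "ipv4", "ipv6" or "name" for one machine-list entry."""
    try:
        addr = ipaddress.ip_address(entry.strip())
    except ValueError:
        # Not an address literal, so it has to go through the resolver,
        # which may return IPv4 addresses, IPv6 addresses, or both.
        return "name"
    return "ipv6" if addr.version == 6 else "ipv4"


def split_machine_list(entries):
    """Group entries by kind so each group can be handled separately."""
    groups = {"ipv4": [], "ipv6": [], "name": []}
    for entry in entries:
        groups[classify_host_entry(entry)].append(entry)
    return groups
```

Entries classified as "ipv4" or "ipv6" could be handed to the start
mechanism unchanged; the "name" group is the case that remains open,
since the address family is only known after resolution.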

-- 
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu_at_[hidden]