Open MPI Development Mailing List Archives

From: Ralph Castain (rhc_at_[hidden])
Date: 2006-09-01 09:01:25

On 9/1/06 6:17 AM, "Adrian Knoth" <adi_at_[hidden]> wrote:

> Hi,
> yesterday I felt impelled to create a new ORTE oob component: tcp6.
> I was able to either compile the library with IPv4 or IPv6 support,
> but not with both (so to say: two different ompi installations or
> at least two different DSO versions).
> As far as I can see, many functions use mca_oob_tcp_component.tcp_listen_sd.
> Unfortunately, as I am not allowed to use v4mapped addresses (not supported
> by the Windows IPv6 stack, disabled by default on *BSD), this socket
> is either AF_INET or AF_INET6, but not both (both means AF_INET6 *and*
> accepting v4mapped addresses).
> Do you agree to go on with two oob components, tcp and tcp6?
> There is a lot of duplicated code, but we can refactor that
> once everything else is done.

Yes, I think that's the right approach - see the bottom for more comments.

> On the other hand, this whole procedure might be totally useless:
> two nodes may exchange IPv4-URIs via IPv6 containing identical
> RFC1918 networks. One would prefer IPv4 due to less overhead,
> but with IPv6, these v4-addresses might be at different locations
> anywhere in the world.
> In other words: IPv6 must be tried first or mixing with IPv4
> cannot be reliable. In this case, a lot of code may be removed
> and we'll end up with either two installations/DSOs (as mentioned
> above) or with runtime detection of af_family (i.e. look for
> global IPv6 addresses and iff found, disable IPv4 completely)

I think this can be supported nicely in the framework system. All we have to
do is set the IPv6 component's priority higher than IPv4. We then can deal
with the "try IPv6 first" by traversing the component list in priority
order. As an example, see the RAS framework.

> What do you think - which way is best? Use cases?

The only use case I am really concerned about is that of a Head Node Process
(HNP) that needs to talk to both IPv6 and IPv4 systems. I admit this will be
unusual, but I would hate to pursue a design that inherently can't support
it. In this case, we need both OOB components active, and we need a routing
table that tells us which one to use to talk to various processes. I suspect
the routing table belongs in the RML framework. If you look at the PLS
framework, you'll see where we "front" the select function to give you the
ability to specify a preferred selection. We might have to do the same thing
with the OOB to allow the RML to say "send this buffer using this specific
OOB component", while still allowing it to say "send this buffer using the
*best* component".

I suspect that backend processes (i.e., non-HNP processes) really will only
use one or the other. Of course, someone might set up a bizarre cluster or
grid that has a mix of IPv6 and IPv4 systems, but I doubt it. So I'm not as
concerned there.


You know, we never did much of a communications layer design for OpenRTE.
What may really be required here is to take a step back and do just that -
define the relative roles of the RML and OOB a little more clearly, decide
what would drive us to add components to either framework, etc.

Does that sound like a good idea? Otherwise, I fear we will have another
major overhaul (like we are doing right now for the launch frameworks) in
our future.