There has been a lot of discussion about IPv6 in Open MPI and OpenRTE
recently. My comments here relate solely to OpenRTE and are intended to
help provide some clarity to the discussion.
OpenRTE communications are done via the Runtime Messaging Library (RML)
API. The RML is really a strategy layer - it determines which transport
will be used for the given message, and handles routing where required
(e.g., between cells). In our component architecture, the RML is
implemented as a framework - only one RML component can be selected and
active in a process.
Sitting under the RML are one or more transport systems - these are
known as Out-Of-Band (OOB) components and reside in the oob framework.
Because the OpenRTE messaging system must work in a heterogeneous
environment, multiple OOB components can be selected and active at one
time. The RML is responsible for picking the correct OOB to use to
communicate with a specific process in the most efficient manner possible.
Message destinations are specified in terms of OpenRTE process names -
*not* IP addresses. Thus, a message is sent to a particular OpenRTE
process name - it is the shared responsibility of the RML and its
underlying OOB components to translate that into a network address. The
exact role of the RML versus the OOB in that translation process has
not yet been determined.
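
To make the layering concrete, here is a rough sketch of a strategy layer that accepts a process name and hands the message to one of several active transports. It is only an illustration, not OpenRTE code: the class names (ProcessName, OobComponent, Rml), the name fields, and the numeric "cost" used to rank transports are all assumptions, and because the RML/OOB split for address translation is still undecided, the sketch simply lets each OOB hold its own address table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessName:
    # Hypothetical fields; the real OpenRTE name layout is not given here.
    cell: int
    job: int
    rank: int


class OobComponent:
    """Hypothetical transport that knows which process names it can reach."""

    def __init__(self, label, cost):
        self.label = label
        self.cost = cost        # assumed metric: lower means more efficient
        self.addresses = {}     # ProcessName -> transport-specific address
        self.sent = []          # stand-in for real socket traffic

    def can_reach(self, name):
        return name in self.addresses

    def send(self, name, payload):
        # A real component would write to a socket; this one records the call.
        self.sent.append((self.addresses[name], payload))


class Rml:
    """Strategy layer: callers pass a process name, never a network address."""

    def __init__(self, components):
        self.components = list(components)

    def send(self, name, payload):
        usable = [c for c in self.components if c.can_reach(name)]
        if not usable:
            raise LookupError(f"no active OOB can reach {name}")
        best = min(usable, key=lambda c: c.cost)
        best.send(name, payload)
        return best.label
```

The caller only ever sees rml.send(name, payload); which transport carried the message, and which address it used, stays below the RML.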
Communication contact information for each process is provided to a
process during startup in the form of URIs that contain the OpenRTE
process name, IP address, and socket. A process is first given the URI
for the head node process (HNP) of that cell. This is done so that the
process can obtain subsequent information from the registry such as
contact info for all other processes in the job, MPI-layer contact
information, etc. The URI for each process clearly indicates whether
IPv6 or IPv4 is to be used for contacting that process name. The system
allows for multiple URIs to be provided for the same process name -
selection of which one to use for a given message is done by the RML
based on (a) interface availability (e.g., if only IPv4 is available,
then that is the one used) and (b) network congestion. Hence, there is
no ambiguity over which transport to use.
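
The selection rule just described can be sketched as follows. The "tcp://host:port" URI layout, the function names, and the per-URI congestion score are assumptions made for illustration; the actual OpenRTE URI syntax and congestion measure are not spelled out here.

```python
import ipaddress
from urllib.parse import urlsplit


def address_family(uri):
    """Return 4 or 6 depending on the IP literal in the URI's host part."""
    host = urlsplit(uri).hostname   # brackets around IPv6 literals are stripped
    return ipaddress.ip_address(host).version


def choose_uri(uris, local_families, congestion=None):
    """Pick one contact URI for a process name.

    (a) keep only URIs whose address family is available locally;
    (b) among those, prefer the least congested (hypothetical score,
        missing entries count as 0.0).
    """
    congestion = congestion or {}
    candidates = [u for u in uris if address_family(u) in local_families]
    if not candidates:
        raise LookupError("no contact URI matches a local interface")
    return min(candidates, key=lambda u: congestion.get(u, 0.0))
```

For example, given one IPv4 and one IPv6 URI for the same process name, a host with only IPv4 available passes local_families={4} and always gets the IPv4 URI back.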
In the case of IPv6 versus IPv4, the expectation was that there would
be two OOB components, one each for these two protocols. The OOB
components are selected based on local support - i.e., if the local
system supports IPv6, then that component would be selected and
available. Likewise, if the local system can support IPv4, that
component would be selected too.
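
As a rough sketch of selection by local support, a component could probe whether the host can open a socket of its address family. The component names below are made up, and opening a socket is only an approximate test: it shows the protocol stack is present, not that an interface has a usable address.

```python
import socket


def family_supported(family):
    """True if the local system can create a stream socket of this family."""
    try:
        probe = socket.socket(family, socket.SOCK_STREAM)
    except OSError:
        return False
    probe.close()
    return True


def select_oob_components():
    """Return the (hypothetical) OOB component names usable on this host."""
    selected = []
    if family_supported(socket.AF_INET):
        selected.append("oob_tcp_ipv4")
    # socket.has_ipv6 reports whether this Python build supports IPv6 at all.
    if socket.has_ipv6 and family_supported(socket.AF_INET6):
        selected.append("oob_tcp_ipv6")
    return selected
```

On a dual-stack host both names would come back, so both components would be active at once and the RML would choose between them per message.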
I hope that helps clarify OpenRTE's operation. I truly believe that
including IPv6 and IPv4 components in the OOB will be fairly simple to
accomplish. Yes, there may be some duplicate code - if there is enough
duplication, we can move the duplicate code into the OOB's base and let
the two components share it. Otherwise, a little duplication isn't that
big a deal.
I'd be happy to answer further questions. I believe you will find that
the Open MPI transport layer operates in a very similar manner, though
I leave that to Tim and Galen to clarify.
Ralph