Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI adopt-a-group: a question and status report
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-08-04 12:01:52

On Aug 3, 2008, at 1:35 PM, Mark Borgerding wrote:

> First of all, my simple question:
> In what files can I find the source code for "mca_oob.oob_send" and
> "mca_oob.oob_recv"? I'm having a hard time following the
> initialization code that populates the struct of callbacks.

We actually have only one "oob" component, and it uses TCP
communications. We have long thought of writing others (e.g., a
native OOB for OpenFabrics kinds of networks), but never really got
around to it. So those function pointers point to the various
functions in orte/mca/oob/tcp/oob_tcp.c. On the OMPI SVN trunk, the
module struct starts at line 136; it's those functions in particular.
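
In case the pattern itself is what's hard to follow, here is a minimal
sketch of a module struct of callbacks in plain C. Every name in it
(toy_oob_module_t, toy_tcp_send, toy_tcp_recv, toy_oob) is made up for
illustration, and the one-slot mailbox merely stands in for a real
socket; the actual struct in oob_tcp.c has more fields and different
signatures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified module struct: a table of function pointers. */
typedef struct {
    int (*send)(const char *peer, const void *buf, size_t len);
    int (*recv)(const char *peer, void *buf, size_t len);
} toy_oob_module_t;

/* One-slot mailbox standing in for a TCP connection (illustration only). */
static char   toy_mailbox[64];
static size_t toy_mailbox_len = 0;

/* "TCP" send: copy the message into the mailbox. */
static int toy_tcp_send(const char *peer, const void *buf, size_t len)
{
    (void)peer;
    if (len > sizeof(toy_mailbox)) {
        return -1;
    }
    memcpy(toy_mailbox, buf, len);
    toy_mailbox_len = len;
    return (int)len;
}

/* "TCP" receive: copy the pending message out and empty the mailbox. */
static int toy_tcp_recv(const char *peer, void *buf, size_t len)
{
    size_t n = toy_mailbox_len < len ? toy_mailbox_len : len;
    (void)peer;
    memcpy(buf, toy_mailbox, n);
    toy_mailbox_len = 0;
    return (int)n;
}

/* The component fills in its module struct once, statically. */
static toy_oob_module_t toy_oob_tcp_module = {
    .send = toy_tcp_send,
    .recv = toy_tcp_recv,
};

/* Callers only ever go through this pointer, never the functions directly. */
static toy_oob_module_t *toy_oob = &toy_oob_tcp_module;
```

A call such as toy_oob->send("peer", "hi", 3) therefore lands in
toy_tcp_send. With only one component there is only one set of
functions the pointers can refer to, which is why the place to look is
the static initializer of the module struct.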

> Next, the context of the question:
> I've been trying to find a way to make a plain old process start and
> then participate in an MPI Group spread across a cluster. Let me
> try to use the local dialect and express my goal in terms I am
> likely to misuse: I want to make a singleton MPI process spawn and
> establish an intercommunicator with another MPI world.
> Here's the list of things that have not worked:
> Using MPI_Comm_spawn -- I've been told this is working in the 1.3
> cvs snapshots, but not in any stable release.
> The symptom is that the call to MPI_Comm_spawn complains about not
> having a hostfile. For the full history, see the ompi-users thread
> "How to specify hosts for MPI_Comm_spawn".

If you could verify that MPI_Comm_spawn does work for you on the OMPI
SVN trunk nightly tarballs, that would be most helpful.

> Forking the parent process *before* it enters any MPI calls ( to
> hopefully avoid environmental pitfalls Jeff Squyres warned of).
> Parent process calls MPI_Init to become the MPI singleton, then
> tries to establish an intercommunicator with the MPI group that is
> getting spawned at the same time.

Just FYI, a minor terminology correction: the MPI processes that are
spawned have a common MPI communicator. A communicator is an MPI
group + a unique communication context. For example, two different
communicators can share the same group, but will always have different
communication contexts. So what you send on communicator A will never
be received on communicator B, even if the source and destination
processes are the same. My point: although the phrase has no
definition specified by the MPI spec, we usually say "MPI job" to mean
a bunch of MPI processes that share a common MPI_COMM_WORLD. So it's
[usually] more natural to say "...the spawned MPI job..."

> The forked child process overlays itself with mpirun via execlp
> to start a "normal" MPI group. I've tried two different methods for
> establishing the intercomm. Both methods hang indefinitely and use
> lots of cpu doing nothing.
> Fork Method 1: MPI_Open_port+ MPI_Comm_accept on one side,
> MPI_Comm_connect on the other.
> The two sides hang in the MPI_Comm_accept and MPI_Comm_connect. I
> did not pursue it deeper than that.


> Fork Method 2: tcp socket establishment, followed by MPI_Comm_join
> on both sides.
> Both sides hang in the call to MPI_Comm_join. Upon further
> inspection and code-hacking, I've determined they can successfully
> trade names "0.0.0" and "0.1.0" and both sides then call
> ompi_comm_connect_accept. Inside ompi_comm_connect_accept, both
> sides call orte_rml.send_buffer; one side finishes the call, while
> the other gets blocked inside oob_send.
> The side that did not get blocked moves on to call
> orte_rml.recv_buffer. It gets blocked inside oob_recv.

I think that Ralph can shed light on this one -- we may not have good
support for COMM_JOIN in the v1.2 series without a persistent
orted...? It's a global process naming issue, IIRC.

> OOB == Out of band sockets? If so, why?

OOB is OMPI's out-of-band mechanism. We use it for bootstrapping and
other information exchange between MPI processes (e.g., the
information exchange during MPI_INIT and MPI_FINALIZE). It's not a
public API, and we change it between releases. I wouldn't recommend
using it in general MPI applications; it does not exist in other MPI
implementations.

Jeff Squyres
Cisco Systems