
Open MPI User's Mailing List Archives



This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.


Subject: Re: [OMPI users] MPI adopt-a-group: a question and status report
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-08-04 12:01:52


On Aug 3, 2008, at 1:35 PM, Mark Borgerding wrote:

> First of all, my simple question:
> In what files can I find the source code for "mca_oob.oob_send" and
> "mca_oob.oob_recv"? I'm having a hard time following the
> initialization code that populates the struct of callbacks.

We actually have only one "oob" component, and it uses TCP
communications. We have long thought of writing others (e.g., a
native OOB for OpenFabrics-style networks), but have never really
gotten around to it. So those function pointers point to the various
functions in orte/mca/oob/tcp/oob_tcp.c. On the OMPI SVN trunk, the
module struct starts at line 136; the functions it lists are the ones
you're after.
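
To make the "struct of callbacks" pattern concrete, here is a minimal, hypothetical C sketch of how a component framework can fill a global module struct with function pointers at initialization and then route every call through it. The names oob_module_t, tcp_send, tcp_recv, and oob_init are invented for illustration; they do not match the real ORTE mca_oob_t members or signatures, and the stand-in "TCP" functions only copy into a local buffer instead of doing network I/O.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified module interface: a struct of function pointers.
   The real ORTE OOB module has different members and signatures. */
typedef struct {
    const char *name;
    int (*send)(int peer, const void *buf, size_t len);
    int (*recv)(int peer, void *buf, size_t len);
} oob_module_t;

/* Stand-in "transport": a single buffer shared by send and recv. */
static char last_msg[64];

/* Hypothetical TCP send: stores the message instead of writing to a socket. */
static int tcp_send(int peer, const void *buf, size_t len) {
    (void)peer;
    if (len >= sizeof last_msg) return -1;
    memcpy(last_msg, buf, len);
    last_msg[len] = '\0';
    return (int)len;
}

/* Hypothetical TCP recv: returns the last stored message. */
static int tcp_recv(int peer, void *buf, size_t len) {
    (void)peer;
    size_t n = strlen(last_msg);
    if (n >= len) return -1;
    memcpy(buf, last_msg, n + 1); /* include the terminating NUL */
    return (int)n;
}

/* The only available component, so "selection" always picks it. */
static const oob_module_t tcp_module = { "tcp", tcp_send, tcp_recv };

/* Global table the rest of the code calls through, e.g. oob.send(...). */
static oob_module_t oob;

/* Initialization: copy the selected component's function table. */
static void oob_init(void) {
    oob = tcp_module;
}
```

In a layout like this, every call made through the global table (oob.send, oob.recv) ends up in the TCP functions, because there is no other component to select.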

> Next, the context of the question:
> I've been trying to find a way to make a plain old process start and
> then participate in an MPI Group spread across a cluster. Let me
> try to use the local dialect and express my goal in terms I am
> likely to misuse: I want to make a singleton MPI process spawn and
> establish an intercommunicator with another MPI world.
>
> Here's the list of things that have not worked:
>
> Using MPI_Comm_spawn -- I've been told this works in the 1.3
> SVN snapshots, but not in any stable release.
> The symptom is that the call to MPI_Comm_spawn complains about not
> having a hostfile. For the full history, see the ompi-users thread
> "How to specify hosts for MPI_Comm_spawn".

If you could verify that it does work for you with the OMPI SVN trunk
nightly tarballs, that would be most helpful.

> Forking the parent process *before* it enters any MPI calls (to
> hopefully avoid environmental pitfalls Jeff Squyres warned of).
> The parent process calls MPI_Init to become the MPI singleton, then
> tries to establish an intercommunicator with the MPI group that is
> getting spawned at the same time.

Just FYI, a minor terminology correction: the MPI processes that are
spawned have a common MPI communicator. A communicator is an MPI
group + a unique communication context. For example, two different
communicators can share the same group, but will always have different
communication contexts. So what you send on communicator A will never
be received on communicator B, even if the source and destination
processes are the same. My point: although the MPI spec does not
define the phrase, we usually say "MPI job" to mean a set of MPI
processes that share a common MPI_COMM_WORLD. So it's [usually] more
natural to say "...the spawned MPI job..."
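
The group-versus-communicator distinction can be modeled in a few lines of plain C. This is a simplified, hypothetical model, not MPI code: comm_t, envelope_t, comm_dup, and matches are invented names, and -1 stands in for MPI_ANY_SOURCE / MPI_ANY_TAG. It only illustrates that a receive matches a message when the context ids agree, so a duplicate communicator over the same group never sees traffic sent on the original, even with wildcards.

```c
#include <stdbool.h>

/* Hypothetical model: a communicator is a group plus a context id. */
typedef struct {
    int group_id;   /* stand-in for an MPI group (which processes) */
    int context_id; /* unique per communicator */
} comm_t;

/* Every message carries the sender's context, source rank, and tag. */
typedef struct {
    int context_id;
    int source;
    int tag;
} envelope_t;

static int next_context = 0;

/* Like MPI_Comm_dup: same group, fresh context. */
static comm_t comm_dup(comm_t c) {
    comm_t d = { c.group_id, ++next_context };
    return d;
}

/* A posted receive matches only if the contexts are equal; the source and
   tag must also agree unless they are the -1 wildcard. Wildcards never
   override the context check. */
static bool matches(envelope_t msg, comm_t recv_comm, int src, int tag) {
    return msg.context_id == recv_comm.context_id
        && (src == -1 || src == msg.source)
        && (tag == -1 || tag == msg.tag);
}
```

In real MPI, MPI_Comm_dup(MPI_COMM_WORLD, &dup) produces exactly this situation, and MPI_Comm_compare reports the two communicators as MPI_CONGRUENT: same group members in the same order, different context.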

> The forked child process replaces itself with mpirun via execlp
> to start a "normal" MPI group. I've tried two different methods for
> establishing the intercomm. Both methods hang indefinitely and use
> lots of CPU doing nothing.
> Fork Method 1: MPI_Open_port + MPI_Comm_accept on one side,
> MPI_Comm_connect on the other.
> The two sides hang in MPI_Comm_accept and MPI_Comm_connect. I
> did not pursue it deeper than that.

Weird.

> Fork Method 2: tcp socket establishment, followed by MPI_Comm_join
> on both sides.
> Both sides hang in the call to MPI_Comm_join. Upon further
> inspection and code-hacking, I've determined they can successfully
> trade names "0.0.0" and "0.1.0" and both sides then call
> ompi_comm_connect_accept. Inside ompi_comm_connect_accept, both
> sides call orte_rml.send_buffer; one side finishes the call, while
> the other gets blocked inside oob_send.
> The side that did not get blocked moves on to call
> orte_rml.recv_buffer. It gets blocked inside oob_recv.

I think that Ralph can shed light on this one -- we may not have good
support for COMM_JOIN in the v1.2 series without a persistent
orted...? It's a global process naming issue, IIRC.
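
For reference, MPI_Comm_join(int fd, MPI_Comm *intercomm) only asks each side to pass one end of an already-connected socket, and the MPI standard asks that the socket be quiescent (not otherwise in use) while the join runs. The POSIX sketch below covers just that socket setup, on the loopback interface, for illustration. The helper names listen_loopback and connect_loopback are hypothetical; in a real program the listening and connecting ends would live in two different processes, each handing its connected fd to MPI_Comm_join after MPI_Init.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP listener on 127.0.0.1 with an OS-chosen port.
   Returns the listening fd and stores the port in *port, or -1 on error. */
static int listen_loopback(unsigned short *port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */
    socklen_t len = sizeof addr;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 1) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Connect to 127.0.0.1:port. Returns a connected fd or -1.
   This fd (and the one accept() returns on the other side) is what each
   process would later pass to MPI_Comm_join. */
static int connect_loopback(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Per the trace above, this step is not where things stall: both sides get a working socket, exchange names, and then block inside the OOB layer.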

> OOB == Out of band sockets? If so, why?

OOB is OMPI's out-of-band mechanism. We use it for bootstrapping and
other information exchange between MPI processes (e.g., the
information exchange during MPI_INIT and MPI_FINALIZE). It's not a
public API, and we change it between releases. I wouldn't recommend
using it in general MPI applications; it does not exist in other MPI
implementations.

-- 
Jeff Squyres
Cisco Systems