
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Spawn and OpenFabrics
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-06-05 10:14:45

On Jun 2, 2009, at 3:26 PM, Allen Barnett wrote:

> > Does OMPI say that it has IBV fork support?
> > ompi_info --param btl openib --parsable | grep have_fork_support
> My RHEL4 system reports:
> MCA btl: parameter "btl_openib_want_fork_support" (current value: "-1")
> MCA btl: information "btl_openib_have_fork_support" (value: "1")
> as does the build installed on the Altix system.

Ok, good. Note, however, that OMPI indicating that it has support
simply means that the installed verbs library has support for it. It
does *not* mean that the underlying kernel supports it.

> > Be sure to also see
> We're using OMPI 1.2.8.


> > > Also, would MPI_COMM_SPAWN suffer from the same difficulties?
> >
> > It shouldn't; we proxy the launch of new commands off to mpirun /
> > OMPI's run-time system. Specifically: the new process(es) are not
> > POSIX children of the process(es) that called MPI_COMM_SPAWN.
> Is a program started with MPI_COMM_SPAWN required to call MPI_INIT?

Yes. OMPI v1.3 has an extension (a specific MPI_Info key) to indicate
that the spawned program is not an MPI application, but I do not
believe that that existed back in the 1.2 series.
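For reference, the extension in question is (if memory serves) an MPI_Info key named "ompi_non_mpi"; here is a hedged sketch of how it would be used -- the key name and its exact semantics are my recollection, so verify against the man pages of your installed OMPI version:

```c
#include <mpi.h>

/* Sketch: spawning a non-MPI executable from OMPI >= 1.3 via the
 * MPI_Info extension mentioned above.  The key name "ompi_non_mpi"
 * is an assumption here -- check MPI_Comm_spawn(3) for your version. */
int main(int argc, char *argv[])
{
    MPI_Comm intercomm;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    MPI_Info_create(&info);
    /* Tell OMPI that the spawned program will never call MPI_Init,
     * so the runtime should not wait for it to connect back. */
    MPI_Info_set(info, "ompi_non_mpi", "true");

    MPI_Comm_spawn("/bin/hostname", MPI_ARGV_NULL, 1, info,
                   0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
```

Without that key (i.e., on the 1.2 series), the spawned program must call MPI_Init itself.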

> I
> guess what I'm asking is if I will have to make my partitioner an
> OpenMPI program as well?

If you use MPI_COMM_SPAWN with the 1.2 series, yes.

Another less attractive but functional solution would be to do what I
did for the new command notifier due in the OMPI v1.5 series
("notifier" = subsystem to notify external agents when OMPI detects
something wrong, like write to the syslog, send an email, write to a
sysadmin mysql db, etc., "command" = plugin that simply forks and runs
whatever command you want). During MPI_INIT, the fork notifier pre-
forks a dummy process. This dummy process then waits for commands via
a pipe. When the parent (MPI process itself) wants to fork a child,
it sends the argv to exec down the pipe and has the child process
actually do the fork and exec.

Proxying all the fork requests through a secondary process like this
avoids all the problems with registered memory in the child process.
This is icky, but it is an unfortunate necessity for OS-bypass/
registration-based networks like OpenFabrics.

In your case, you'd want to pre-fork before calling MPI_INIT. But the
rest of the technique is pretty much the same.

Have a look at the code in this tree if it helps:

Jeff Squyres
Cisco Systems