Open MPI User's Mailing List Archives

From: Michael Kluskens (mkluskens_at_[hidden])
Date: 2006-04-25 14:57:35


I'm running Open MPI 1.1 (v9704), and when a spawned process exits,
the parent does not die (see the previous discussions about
1.0.1/1.0.2); however, the next time the parent tries to spawn a
process, MPI_Comm_spawn does not return.

My test output is below:

parent: 0 of 1
parent: How many processes total?
2
parent: Calling MPI_Comm_spawn to start 1 subprocesses.
child starting
parent returned from Comm_Spawn call
parent: Calling MPI_BCAST with btest = 17 . child = 3
child 0 of 1: Parent 3
parent: Calling MPI_Comm_spawn to start 1 subprocesses.
child 0 of 1: Receiving 17 from parent
child calling COMM_FREE
child calling FINALIZE
child exiting

Notice that there is no second "parent returned from Comm_Spawn call"
message: the parent just sits there, and the second set of processes
never gets launched.

A quick note on code changes: my child process now calls
MPI_COMM_FREE(parent, ierr) to free the communicator to the parent
before exiting; in earlier versions of 1.1 this crashed the code. I'm
guessing this is the right thing to do: the Complete Reference book
has an example without it, and the Using MPI-2 book has a more
detailed example that includes it. Either way, I get the same results.

Background from the previous discussion of this issue follows. It
will cost me less to test new versions of Open MPI that handle this
than to work around the issue in my project.

Michael

On Mar 2, 2006, at 1:55 PM, Ralph Castain wrote:

> We expect to have much better support for the entire comm_spawn
> process in the next incarnation of the RTE. I don't expect that to
> be included in a release, however, until 1.1 (Jeff may be able to
> give you an estimate for when that will happen).
>
> Jeff et al. may be able to give you access to an early, non-release
> version sooner if better comm_spawn support is a critical issue
> and you don't mind being patient with the inevitable bugs in such
> versions.
>
> Ralph
>
>
> Edgar Gabriel wrote:
>> Open MPI currently does not fully support a proper disconnection
>> of parent and child processes. Thus, if a child dies/aborts, the
>> parents will abort as well, despite calling
>> MPI_Comm_disconnect. (The new RTE will have better support for
>> these operations; Ralph/Jeff can probably give a better estimate
>> of when this will be available.) However, what should not happen
>> is that the parent goes down at the same time when the child calls
>> MPI_Finalize (i.e., not a violent death but a proper shutdown).
>> Let me check that as well...
>>
>> Brignone, Sergio wrote:
>>> Hi everybody, I am trying to run a master/slave setup. Because of
>>> the nature of the problem, I need to start and stop (kill) some
>>> slaves. The problem is that as soon as one of the slaves dies, the
>>> master dies as well.