Open MPI User's Mailing List Archives

From: Jean Latour (latour_at_[hidden])
Date: 2006-03-03 03:50:01


Just to add an example that may help with this "disconnect" discussion:
attached is the code of a test that does the following (it works
perfectly with Open MPI 1.0.1):

 1) master spawns slave1
 2) master spawns slave2
 3) master and slaves exchange messages over the intercommunicators
 4) slave1 disconnects from master and finalizes
 5) slave2 disconnects from master and finalizes
    (the processors used by slave1 and slave2 can now be re-used by newly
    spawned processes)
 6) master spawns slave3, and then slave4
 7) slave3 and slave4 have NO direct communicator, but they can create
    one through the open-port mechanism (MPI_Open_port) and the
    MPI_Comm_connect / MPI_Comm_accept functions.
    The port name is relayed through the master.
 8) slave3 and slave4 create this direct communicator and do some
    ping-pong over it
 9) slave3 and slave4 disconnect from each other on this direct communicator
10) slave3 and slave4 disconnect from master and finalize
11) master finalizes
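
Since the attachment does not appear on this archive page, here is a
minimal sketch of steps 6 to 11. It is not the attached test itself, only
an illustration under some assumptions: the same binary acts as master and
as slave, the master spawns it by passing argv[0] back to MPI_Comm_spawn,
and the role strings "server" and "client", the helper spawn_child, the
message tags and the token value are all made up for this example.

```c
#include <mpi.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: spawn one copy of this binary with a role argument. */
static MPI_Comm spawn_child(char *binary, char *role)
{
    char *child_argv[] = { role, NULL };
    MPI_Comm inter;

    MPI_Comm_spawn(binary, child_argv, 1, MPI_INFO_NULL, 0,
                   MPI_COMM_SELF, &inter, MPI_ERRCODES_IGNORE);
    return inter;
}

int main(int argc, char **argv)
{
    MPI_Comm parent;
    char port[MPI_MAX_PORT_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* Master: step 6, spawn slave3 ("server") and slave4 ("client"). */
        MPI_Comm c3 = spawn_child(argv[0], "server");
        MPI_Comm c4 = spawn_child(argv[0], "client");

        /* Step 7: relay the port name from slave3 to slave4. */
        MPI_Recv(port, MPI_MAX_PORT_NAME, MPI_CHAR, 0, 0, c3,
                 MPI_STATUS_IGNORE);
        MPI_Send(port, MPI_MAX_PORT_NAME, MPI_CHAR, 0, 0, c4);

        /* Step 10, master side: disconnect is collective over both groups. */
        MPI_Comm_disconnect(&c3);
        MPI_Comm_disconnect(&c4);
        printf("master: both slaves disconnected\n");
    } else {
        MPI_Comm direct;
        int token = 0;
        int is_server = (argc > 1 && strcmp(argv[1], "server") == 0);

        if (is_server) {
            /* slave3: open a port, publish its name through the master. */
            MPI_Open_port(MPI_INFO_NULL, port);
            MPI_Send(port, MPI_MAX_PORT_NAME, MPI_CHAR, 0, 0, parent);
            MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &direct);

            /* Step 8: one ping-pong round over the direct communicator. */
            token = 42;
            MPI_Send(&token, 1, MPI_INT, 0, 1, direct);
            MPI_Recv(&token, 1, MPI_INT, 0, 1, direct, MPI_STATUS_IGNORE);
            printf("slave3: token came back as %d\n", token);

            /* Step 9: drop the direct link, then release the port. */
            MPI_Comm_disconnect(&direct);
            MPI_Close_port(port);
        } else {
            /* slave4: receive the port name from the master and connect. */
            MPI_Recv(port, MPI_MAX_PORT_NAME, MPI_CHAR, 0, 0, parent,
                     MPI_STATUS_IGNORE);
            MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &direct);

            MPI_Recv(&token, 1, MPI_INT, 0, 1, direct, MPI_STATUS_IGNORE);
            token++;
            MPI_Send(&token, 1, MPI_INT, 0, 1, direct);

            MPI_Comm_disconnect(&direct);
        }

        /* Step 10, slave side: disconnect from the master. */
        MPI_Comm_disconnect(&parent);
    }

    /* Step 11 for the master, and the final step for each slave. */
    MPI_Finalize();
    return 0;
}
```

Built with mpicc and started as a single process (for example
"mpirun -np 1 ./a.out"), the master should spawn the two slaves itself.
Whether argv[0] resolves correctly on the spawn side depends on how and
where the job is launched, so a full path may be needed.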

Hope it helps
Best regards,
Jean Latour

Ralph Castain wrote:

> We expect to have much better support for the entire comm_spawn
> process in the next incarnation of the RTE. I don't expect that to be
> included in a release, however, until 1.1 (Jeff may be able to give
> you an estimate for when that will happen).
>
> Jeff et al may be able to give you access to an early non-release
> version sooner, if better comm_spawn support is a critical issue and
> you don't mind being patient with the inevitable bugs in such versions.
>
> Ralph
>
>
> Edgar Gabriel wrote:
>
>>Open MPI currently does not fully support a proper disconnection of
>>parent and child processes. Thus, if a child dies/aborts, the parents
>>will abort as well, despite calling MPI_Comm_disconnect. (The new RTE
>>will have better support for these operations; Ralph/Jeff can probably
>>give a better estimate of when this will be available.)
>>
>>However, what should not happen is that, if the child calls MPI_Finalize
>>(so not a violent death but a proper shutdown), the parent goes down at
>>the same time. Let me check that as well...
>>
>>Brignone, Sergio wrote:
>>
>>>Hi everybody,
>>>
>>>I am trying to run a master/slave set.
>>>
>>>Because of the nature of the problem I need to start and stop (kill)
>>>some slaves.
>>>
>>>The problem is that as soon as one of the slaves dies, the master dies also.
>>>
>>>This is what I am doing:
>>>
>>>MASTER:
>>>
>>>MPI_Init(...)
>>>MPI_Comm_spawn(slave1,...,nslave1,...,intercomm1);
>>>MPI_Barrier(intercomm1);
>>>MPI_Comm_disconnect(&intercomm1);
>>>MPI_Comm_spawn(slave2,...,nslave2,...,intercomm2);
>>>MPI_Barrier(intercomm2);
>>>MPI_Comm_disconnect(&intercomm2);
>>>MPI_Finalize();
>>>
>>>SLAVE:
>>>
>>>MPI_Init(...)
>>>MPI_Comm_get_parent(&intercomm);
>>>(does something)
>>>MPI_Barrier(intercomm);
>>>MPI_Comm_disconnect(&intercomm);
>>>MPI_Finalize();
>>>
>>>The issue is that as soon as the first set of slaves calls MPI_Finalize,
>>>the master dies too (right after MPI_Comm_disconnect(&intercomm1)).
>>>
>>>What am I doing wrong?
>>>
>>>Thanks
>>>
>>>Sergio
>>>
>>>------------------------------------------------------------------------
>>>
>>>_______________________________________________
>>>users mailing list
>>>users_at_[hidden]
>>>http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>