
Open MPI User's Mailing List Archives


From: Jean Latour (latour_at_[hidden])
Date: 2006-03-03 03:26:35


Thanks for your answer. Your example addresses one possible situation,
where a parallel application is spawned by a driver with MPI_Comm_spawn,
or multiple parallel applications are spawned at the same time with
MPI_Comm_spawn_multiple, over a set of processors described in the
machinefile. It is OK if the next spawn occurs after some processes at
the beginning of the machinefile have stopped.

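
The placement behaviour described in this thread can be mimicked with a
small simulation. This is only a sketch, not Open MPI's actual scheduler:
the function name place_spawn is hypothetical, and wrapping around to the
first host when a spawn asks for more children than there are hosts is an
assumption. The host names are the ones from the machinefile quoted below.

```python
from itertools import cycle, islice


def place_spawn(hosts, nprocs):
    """Assign nprocs children to hosts, always starting at hosts[0].

    Models the reported behaviour: every spawn restarts at the top of
    the machinefile. Wrap-around via cycle() is an assumption.
    """
    return list(islice(cycle(hosts), nprocs))


machinefile = ["linux12", "linux13", "linux14", "linux15", "linux16"]

# One spawn of 5 children spreads over the five hosts.
single = place_spawn(machinefile, 5)

# Five spawns of 1 child each: every spawn restarts at the first host.
one_by_one = [place_spawn(machinefile, 1)[0] for _ in range(5)]

print(single)      # ['linux12', 'linux13', 'linux14', 'linux15', 'linux16']
print(one_by_one)  # ['linux12', 'linux12', 'linux12', 'linux12', 'linux12']
```

The second result is why spawning one process at a time puts every child
on the same node.
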
However, I have another case at hand where the spawned processes are
really dynamic over time. Any child process can stop (not necessarily
the first in the machinefile), thus freeing some processors, and the
newly spawned processes must run on those processors.
With LAM/MPI this situation has a satisfactory solution through the INFO
parameter of MPI_Comm_spawn: it allows one to specify a "local"
machinefile for these spawned processes, instead of always taking the
same machinefile from the beginning as in your example.
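
The bookkeeping this dynamic case needs on the application side can be
sketched as follows. It is a minimal illustration under stated
assumptions, not part of any MPI library: the HostPool class, its method
names, and the child identifiers are all hypothetical. It only tracks
which hosts are free and builds the host list for the next spawn.

```python
class HostPool:
    """Track which machinefile hosts are free (hypothetical helper)."""

    def __init__(self, hosts):
        self.free = list(hosts)  # hosts available for the next spawn
        self.busy = {}           # child id -> host it is running on

    def acquire(self, child_ids):
        """Reserve one free host per child and return the hosts in order."""
        if len(child_ids) > len(self.free):
            raise RuntimeError("not enough free hosts")
        chosen = [self.free.pop(0) for _ in child_ids]
        self.busy.update(zip(child_ids, chosen))
        return chosen

    def release(self, child_id):
        """Put a finished child's host back on the free list."""
        self.free.append(self.busy.pop(child_id))


pool = HostPool(["linux12", "linux13", "linux14", "linux15", "linux16"])
pool.acquire(["a", "b", "c", "d", "e"])  # initial spawn fills all hosts
pool.release("c")                        # the child on linux14 stops
pool.release("a")                        # the child on linux12 stops

next_hosts = pool.acquire(["f", "g"])    # hosts for the next spawn
host_value = ",".join(next_hosts)        # comma-joined form is an assumption
print(host_value)                        # linux14,linux12
```

A list like this would then be handed to the spawn call through the info
argument. The MPI standard reserves a "host" info key for MPI_Comm_spawn,
but leaves the value format to the implementation, so whether a given
Open MPI release honors it (and accepts a comma-separated list) has to be
checked against its documentation.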

Do you know if this specific feature will be implemented in Open MPI
(I hope it will be), and if so, when?
Dynamic applications really need this.

Best Regards,
Jean Latour

Edgar Gabriel wrote:

>In my tests, Open MPI did follow the machinefile (see the output
>further below); however, for each spawn operation it starts from the
>very beginning of the machinefile...
>
>The following example spawns 5 child processes (with a single
>MPI_Comm_spawn), and each child prints its rank and the hostname.
>
>gabriel_at_linux12 ~/dyncomm $ mpirun -hostfile machinefile -np 3 ./dyncomm_spawn_father
> Checking for MPI_Comm_spawn.....................working
>Hello world from child 0 on host linux12
>Hello world from child 1 on host linux13
>Hello world from child 3 on host linux15
>Hello world from child 4 on host linux16
> Testing Send/Recv on the intercomm..........working
>Hello world from child 2 on host linux14
>
>
>with the machinefile being:
>gabriel_at_linux12 ~/dyncomm $ cat machinefile
>linux12
>linux13
>linux14
>linux15
>linux16
>
>In your code, you always spawn one process at a time, and that's why
>they are all located on the same node.
>
>Hope this helps...
>Edgar
>
>
>Edgar Gabriel wrote:
>
>
>
>>As far as I know, Open MPI should follow the machinefile for spawn
>>operations; however, every spawn starts again at the beginning of the
>>machinefile. An info object such as 'lam_sched_round_robin' is
>>currently not available/implemented. Let me look into this...
>>
>>Jean Latour wrote:
>>
>>
>>
>>
>>>Hello,
>>>
>>>Testing the MPI_Comm_spawn function of Open MPI version 1.0.1, I have an
>>>example that works OK, except that it shows that the spawned processes
>>>do not follow the "machinefile" setting of processors.
>>>In this example a master process first spawns 2 processes, then
>>>disconnects from them and spawns 2 more processes. Running on a Quad
>>>Opteron node, all processes run on the same node, although the
>>>machinefile specifies that the slaves should run on different nodes.
>>>
>>>With the current version of Open MPI, is it possible to direct the
>>>spawned processes to a specific node? (The node distribution could be
>>>given in the "machinefile" file, as with LAM/MPI.)
>>>
>>>The code (Fortran 90) of this example and its makefile are attached as
>>>a tar file.
>>>
>>>Thank you very much
>>>
>>>Jean Latour
>>>
>>>
>>>_______________________________________________
>>>users mailing list
>>>users_at_[hidden]
>>>http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>