On Tue, Mar 14, 2006 at 12:37:52PM -0600, Edgar Gabriel wrote:
I think I know what goes wrong. Since they are in different 'universes',
they will have exactly the same 'Open MPI name', and therefore the
algorithm in intercomm_merge cannot determine which process should be
first and which should be second.
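
To make the ambiguity concrete, here is a small Python model of the
ordering decision. It is only a sketch, not Open MPI's actual code: the
ProcName type, the first_group function, and the jobid/vpid values are
all hypothetical, chosen to show why two identical names leave the merge
with nothing to compare.

```python
from typing import NamedTuple, Optional


class ProcName(NamedTuple):
    # Hypothetical, simplified stand-in for a process name;
    # this is NOT Open MPI's real data structure.
    jobid: int
    vpid: int


def first_group(leader_a: ProcName, leader_b: ProcName) -> Optional[str]:
    """Return which side would be ordered first, or None if the names tie."""
    if leader_a < leader_b:
        return "a"
    if leader_b < leader_a:
        return "b"
    return None  # identical names: the two groups cannot be ordered


# Separate universes (assumed values): each job is the first job of its
# own universe, so both leaders end up with the same name.
separate = first_group(ProcName(jobid=1, vpid=0), ProcName(jobid=1, vpid=0))

# Shared universe (assumed values): the second job gets a different jobid,
# so the names differ and an order can be chosen.
shared = first_group(ProcName(jobid=1, vpid=0), ProcName(jobid=2, vpid=0))

print(separate, shared)
```

In this model the tie disappears as soon as the two jobs have different
jobids, which is what sharing one universe is meant to provide.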
In practice, all jobs that are connected at some point in their
lifetime have to be in the same MPI universe, so that all jobs will
have different jobids and therefore different names. To use the same
universe, you have to start the orted daemon in persistent mode, so
the sequence should be:
orted --seed --persistent --scope public
mpirun -np x ./app1
mpirun -np y ./app2
In this case everything should work as expected: you can do the
comm_join between app1 and app2, and the intercomm_merge should work as well.
Hope this helps
This worked fine on a single machine. What do you recommend for multiple
machines (e.g. app1 on node1 and app2 on node2)? How do I tell
multiple orted instances that they are part of the same universe?
thanks
==rob