On Tue, Mar 14, 2006 at 12:37:52PM -0600, Edgar Gabriel wrote:
> I think I know what goes wrong. Since they are in different 'universes',
> they will have exactly the same 'Open MPI name', and therefore the
> algorithm in intercomm_merge cannot determine which process should be
> first and which is second.
> Practically, all jobs which are connected at a certain point in their
> lifetime have to be in the same MPI universe, such that all jobs will
> have different jobids and therefore different names. To use the same
> universe, you have to start the orted daemon in the persistent mode, so
> the sequence should be:
>
>   orted --seed --persistent --scope public
>   mpirun -np x ./app1
>   mpirun -np y ./app2
>
> In this case everything should work as expected: you can do the
> comm_join between app1 and app2, and the intercomm_merge should work as well.
> Hope this helps.
This was fine on a single machine. What do you recommend for multiple
machines (e.g. app1 on node1 and app2 on node2)? How do I tell
multiple orted instances that they are part of the same universe?
Mathematics and Computer Science Division A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA B29D F333 664A 4280 315B