
Open MPI User's Mailing List Archives


Subject: [OMPI users] Guaranteed run rank 0 on a given machine?
From: David Mathog (mathog_at_[hidden])
Date: 2010-12-10 13:46:25


The master is commonly very different from the workers, so I expected
there would be something like

  --rank0-on <hostname>

but there doesn't seem to be a single switch on mpirun to do that.

If "mastermachine" is the first entry in the hostfile, or the first
machine in a --host list, will rank 0 always run there? If so, will it
always run in the first slot on the first machine listed? That seems to
be the case in practice, but is it guaranteed? Even if -loadbalance is
used?
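One way to see where rank 0 actually lands on a given run is to launch a trivial script instead of the real program. Below is a minimal sketch. It assumes Open MPI exports `OMPI_COMM_WORLD_RANK` to each launched process (an Open MPI-specific variable, worth confirming for the installed version). `report_placement` is a hypothetical helper name.

```shell
#!/bin/sh
# report_placement: hypothetical helper that prints "<rank> <host>" for the
# process it runs in. Assumes Open MPI has exported OMPI_COMM_WORLD_RANK;
# prints "unknown" as the rank if the variable is not set.
report_placement() {
    printf '%s %s\n' "${OMPI_COMM_WORLD_RANK:-unknown}" "$(uname -n)"
}
```

Save this as a script (say `where.sh`, also a hypothetical name) that ends with a call to `report_placement`. It could then be run as `mpirun -hostfile myhosts -np 4 ./where.sh | sort -n`, and the line starting with `0` names rank 0's host. This shows what one run did, not what is guaranteed.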

Otherwise, there is the rankfile method. In the situation where the
master must run on a specific node, but there is no preference for the
workers, would a rank file like this be sufficient?

rank 0=mastermachine slot=0

The mpirun man page gives an example where all nodes/slots are
specified, but it doesn't say explicitly what happens if the
configuration is only partially specified, or how it interacts with the
-np parameter. Modifying the man page example:

cat myrankfile
rank 0=aa slot=1:0-2
rank 1=bb slot=0:0,1
rank 2=cc slot=1-2
mpirun -H aa,bb,cc,dd -np 4 -rf myrankfile ./a.out

Rank 0 runs on node aa, bound to socket 1, cores 0-2.
Rank 1 runs on node bb, bound to socket 0, cores 0 and 1.
Rank 2 runs on node cc, bound to cores 1 and 2.
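The three readings above follow a simple pattern. `slot=S:C` means socket S with core list C, and a spec without a colon is just a core list. Below is a minimal sketch that applies that pattern to rankfile lines, for illustration only. `describe_slot` and `describe_rankfile` are hypothetical helpers; they do no validation and never consult the actual hardware.

```shell
# describe_slot SPEC: hypothetical helper. "S:C" -> socket S, cores C;
# a spec without ":" is read as a plain core list.
describe_slot() {
    spec=$1
    case $spec in
        *:*) printf 'socket %s, cores %s\n' "${spec%%:*}" "${spec#*:}" ;;
        *)   printf 'cores %s\n' "$spec" ;;
    esac
}

# describe_rankfile: reads lines of the form "rank N=HOST slot=SPEC" on
# stdin and prints one description per rank; other lines are skipped.
describe_rankfile() {
    while read -r word rest; do
        [ "$word" = rank ] || continue
        rank=${rest%%=*}          # "0" from "0=aa slot=1:0-2"
        rest=${rest#*=}           # "aa slot=1:0-2"
        host=${rest%% *}          # "aa"
        spec=${rest#*slot=}       # "1:0-2"
        printf 'Rank %s runs on node %s, %s\n' "$rank" "$host" "$(describe_slot "$spec")"
    done
}
```

`describe_rankfile < myrankfile` would print one line per listed rank. Rank 3 gets no line, because nothing in the file mentions it.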

Rank 3 runs where? Not at all, or on dd, aa:slot=0, or ...?
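Until the behaviour of a partial rankfile is clear, one workaround is to sidestep the question by listing every rank: rank 0 on the master, and the rest spread round-robin over the workers. Below is a minimal sketch. `make_rankfile` is a hypothetical helper, and it assumes each worker has enough cores for the logical `slot=N` values it hands out. Here N is simply how many ranks that worker already has.

```shell
# make_rankfile MASTER NP WORKER...
# Hypothetical helper: prints a complete rankfile with rank 0 on MASTER
# slot 0 and ranks 1..NP-1 assigned round-robin over the WORKERs.
make_rankfile() {
    master=$1 np=$2
    shift 2
    [ "$#" -gt 0 ] || { echo "make_rankfile: need at least one worker" >&2; return 1; }
    echo "rank 0=$master slot=0"
    r=1 slot=0
    while [ "$r" -lt "$np" ]; do
        for worker in "$@"; do          # one pass = one slot per worker
            [ "$r" -lt "$np" ] || break
            echo "rank $r=$worker slot=$slot"
            r=$((r + 1))
        done
        slot=$((slot + 1))
    done
}
```

For the example above, `make_rankfile aa 4 bb cc dd > myrankfile` would put ranks 1 to 3 on bb, cc and dd at slot 0. That file would then be used with `mpirun -H aa,bb,cc,dd -np 4 -rf myrankfile ./a.out`, keeping `-np` equal to the count passed to the helper.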

Also, in my limited testing --host and -hostfile seem to be mutually
exclusive. That is reasonable, but it isn't clear whether it is intended.
For example, with a hostfile containing the single entry "monkey02.cluster
slots=1":

mpirun --host monkey01 --mca plm_rsh_agent rsh hostname
monkey01.cluster
mpirun --host monkey02 --mca plm_rsh_agent rsh hostname
monkey02.cluster
mpirun -hostfile /usr/common/etc/openmpi.machines.test1 \
   --mca plm_rsh_agent rsh hostname
monkey02.cluster
mpirun --host monkey01 \
  -hostfile /usr/common/etc/openmpi.machines.test1 \
  --mca plm_rsh_agent rsh hostname
--------------------------------------------------------------------------
There are no allocated resources for the application
  hostname
that match the requested mapping:
  

Verify that you have mapped the allocated resources properly using the
--host or --hostfile specification.
--------------------------------------------------------------------------
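The error is consistent with `--host` being applied as a filter on the hostfile's hosts rather than being added to them, since monkey01 is not in that hostfile. The message itself does not say so, however. Below is a minimal pre-flight check under that assumption. `check_hosts_in_hostfile` is a hypothetical helper. It compares short names (the part before the first dot), so `monkey02` matches `monkey02.cluster`. For the same reason it is not suitable for IP addresses. It also ignores comment lines and `slots=` fields.

```shell
# check_hosts_in_hostfile HOSTFILE HOST...
# Hypothetical check: reports each HOST whose short name does not appear as
# the first field of a non-blank, non-comment line in HOSTFILE.
# Returns 0 if all are present, 1 if any is missing, 2 if unreadable.
check_hosts_in_hostfile() {
    hostfile=$1
    shift
    [ -r "$hostfile" ] || { echo "cannot read $hostfile" >&2; return 2; }
    status=0
    for host in "$@"; do
        short=${host%%.*}
        # awk prints the short name of each host line; grep looks for ours
        if ! awk 'NF && $1 !~ /^#/ { sub(/\..*/, "", $1); print $1 }' "$hostfile" |
             grep -qxF -- "$short"; then
            echo "$host is not listed in $hostfile" >&2
            status=1
        fi
    done
    return "$status"
}
```

For the failing case above, `check_hosts_in_hostfile /usr/common/etc/openmpi.machines.test1 monkey01` would report monkey01 as missing before mpirun is ever started.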

Thanks,

David Mathog
mathog_at_[hidden]
Manager, Sequence Analysis Facility, Biology Division, Caltech