On 12/10/2010 01:46 PM, David Mathog wrote:
> The master is commonly very different from the workers, so I expected
> there would be something like
>
>   --rank0-on <hostname>
>
> but there doesn't seem to be a single switch on mpirun to do that.
>
> If "mastermachine" is the first entry in the hostfile, or the first
> machine in a -hosts list, will rank 0 always run there?  If so, will it
> always run in the first slot on the first machine listed?  That seems to
> be the case in practice, but is it guaranteed?  Even if -loadbalance is
> used?

For Open MPI the above is correct, though I am hesitant to call it guaranteed.
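
To make that concrete, here is a toy Python model of placement under the default by-slot mapping. It is an assumption about the default and is not Open MPI's mapper code. Slots are filled host by host in hostfile order, so rank 0 lands in the first slot of the first host. The host names and slot counts are made up, and neither -loadbalance nor oversubscription is modeled.

```python
from typing import List, Tuple


def map_by_slot(hosts: List[Tuple[str, int]], np: int) -> List[Tuple[str, int]]:
    """Toy model of by-slot mapping: return (host, slot) for ranks 0..np-1,
    filling every slot of a host before moving on to the next host."""
    # Flatten the hostfile into an ordered list of (host, slot index) pairs.
    slots = [(name, s) for name, count in hosts for s in range(count)]
    if np > len(slots):
        # Real mpirun may oversubscribe or refuse; this sketch just refuses.
        raise ValueError("toy model does not handle oversubscription")
    # Rank r simply takes the r-th slot in that order.
    return slots[:np]


if __name__ == "__main__":
    # Hypothetical hostfile: mastermachine slots=2, worker01 slots=4
    hosts = [("mastermachine", 2), ("worker01", 4)]
    for rank, (host, slot) in enumerate(map_by_slot(hosts, 4)):
        print(f"rank {rank} -> {host} slot {slot}")
```

Under this model, giving mastermachine slots=2 also places rank 1 there. Listing it with slots=1 would leave rank 0 alone on the master.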
> Otherwise, there is the rankfile method.  In the situation where the
> master must run on a specific node, but there is no preference for the
> workers, would a rank file like this be sufficient?
>
> rank 0=mastermachine slot=0

I thought you might have to list all ranks, but empirically it looks like listing only rank 0 is enough.

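A small Python sketch of that setup follows. It builds the one-line rankfile and an mpirun command line in the same shape as the man page example (-H, -np, -rf). The worker names are hypothetical. The sketch only prints the command and does not run it. Whether the unlisted ranks get placed sensibly rests on the empirical observation above, not on documented behavior.

```python
import shlex
from typing import List


def master_rankfile_text(master: str) -> str:
    """Rankfile contents that pin only rank 0 to slot 0 of `master`;
    all other ranks are left to mpirun's own mapping."""
    return f"rank 0={master} slot=0\n"


def mpirun_argv(hosts: List[str], np: int, rankfile: str, program: str) -> List[str]:
    """Build an argv mirroring: mpirun -H h1,h2,... -np N -rf RANKFILE PROGRAM"""
    return ["mpirun", "-H", ",".join(hosts), "-np", str(np),
            "-rf", rankfile, program]


if __name__ == "__main__":
    print(master_rankfile_text("mastermachine"), end="")
    # Hypothetical worker names; pass argv to subprocess.run to actually launch.
    argv = mpirun_argv(["mastermachine", "worker01", "worker02"], 3,
                       "master.rankfile", "./a.out")
    print(shlex.join(argv))
```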
> The mpirun man page gives an example where all nodes/slots are
> specified, but it doesn't say explicitly what happens if the
> configuration is only partially specified, or how it interacts with the
> -np parameter.  Modifying the man page example:
>
> cat myrankfile
> rank 0=aa slot=1:0-2
> rank 1=bb slot=0:0,1
> rank 2=cc slot=1-2
> mpirun -H aa,bb,cc,dd -np 4 -rf myrankfile ./a.out
>
> Rank 0 runs on node aa, bound to socket 1, cores 0-2.
> Rank 1 runs on node bb, bound to socket 0, cores 0 and 1.
> Rank 2 runs on node cc, bound to cores 1 and 2.
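
In the entries above, the slot= value reads as an optional socket number and a colon, followed by a comma-separated list of cores or core ranges. The toy Python parser below handles only the simple forms shown here. It is not Open MPI's parser, and the real rankfile grammar has more cases. It returns (rank, host, (socket or None, cores)).

```python
import re
from typing import List, Optional, Tuple

# Matches simple lines such as "rank 0=aa slot=1:0-2".
RANKFILE_LINE = re.compile(r"^rank\s+(\d+)=(\S+)\s+slot=(\S+)$")


def expand(spec: str) -> List[int]:
    """Expand a core list such as '0-2', '0,1' or '0,2-3' into integers."""
    cores: List[int] = []
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            cores.extend(range(int(first), int(last) + 1))
        else:
            cores.append(int(part))
    return cores


def parse_slot(spec: str) -> Tuple[Optional[int], List[int]]:
    """'1:0-2' -> (1, [0, 1, 2]); '1-2' (no socket given) -> (None, [1, 2])."""
    if ":" in spec:
        socket, cores = spec.split(":", 1)
        return int(socket), expand(cores)
    return None, expand(spec)


def parse_rankfile_line(line: str) -> Tuple[int, str, Tuple[Optional[int], List[int]]]:
    """Split one rankfile line into rank, host and parsed slot spec."""
    m = RANKFILE_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unsupported rankfile line: {line!r}")
    rank, host, slot = m.groups()
    return int(rank), host, parse_slot(slot)


if __name__ == "__main__":
    for line in ["rank 0=aa slot=1:0-2", "rank 1=bb slot=0:0,1", "rank 2=cc slot=1-2"]:
        print(parse_rankfile_line(line))
```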

> Rank 3 runs where?  not at all, or on dd, aa:slot=0, or ...?

From my runs, it looks like rank 3 would end up on aa, possibly in slot 0.
In other words, once the rankfile entries run out, the remaining processes appear to be mapped starting again from the beginning of the host list.
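
Here is a toy Python model of that observed behavior. It is an inference from a few runs and is not Open MPI's documented mapping algorithm. Ranks listed in the rankfile go where it says, and any remaining ranks are handed out round-robin, starting again from the first host in the -H list. Slots are ignored to keep it short.

```python
from itertools import cycle
from typing import Dict, List


def place_ranks(hosts: List[str], np: int, rankfile: Dict[int, str]) -> Dict[int, str]:
    """Toy model: rankfile entries win; unlisted ranks cycle through `hosts`
    from the beginning, as observed empirically."""
    placement: Dict[int, str] = {}
    fallback = cycle(hosts)  # restarts at hosts[0] for the first unlisted rank
    for rank in range(np):
        if rank in rankfile:
            placement[rank] = rankfile[rank]
        else:
            placement[rank] = next(fallback)
    return placement


if __name__ == "__main__":
    # The modified man page example: -H aa,bb,cc,dd -np 4, ranks 0-2 in the rankfile.
    print(place_ranks(["aa", "bb", "cc", "dd"], 4, {0: "aa", 1: "bb", 2: "cc"}))
```

For the modified man page example, this puts rank 3 on aa, which matches the observation above.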

--td

> Also, in my limited testing --host and -hostfile seem to be mutually
> exclusive.  That is reasonable, but it isn't clear that it is intended.
> Example, with a hostfile containing one entry for "monkey02.cluster
> slots=1":
>
> mpirun  --host monkey01   --mca plm_rsh_agent rsh  hostname
> monkey01.cluster
> mpirun  --host monkey02   --mca plm_rsh_agent rsh  hostname
> monkey02.cluster
> mpirun  -hostfile /usr/common/etc/openmpi.machines.test1 \
>    --mca plm_rsh_agent rsh  hostname
> monkey02.cluster
> mpirun  --host monkey01  \
>   -hostfile /usr/common/etc/openmpi.machines.test1 \
>   --mca plm_rsh_agent rsh  hostname
> --------------------------------------------------------------------------
> There are no allocated resources for the application
>   hostname
> that match the requested mapping:
>
>
> Verify that you have mapped the allocated resources properly using the
> --host or --hostfile specification.
> --------------------------------------------------------------------------
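
One reading consistent with the error above is that the two options are not exclusive. Instead, when both are given, --host may act as a filter that selects hosts from the hostfile rather than adding to it. Because monkey01 is not in that hostfile, nothing is left to map onto. Below is a toy Python model of that assumption. It is not Open MPI's code, and unlike mpirun it compares host names as literal strings.

```python
from typing import List


def effective_hosts(hostfile_hosts: List[str], host_opt: List[str]) -> List[str]:
    """Toy model, assuming that when both -hostfile and --host are given,
    --host selects a subset of the hostfile rather than adding hosts."""
    if not host_opt:
        return list(hostfile_hosts)  # only -hostfile given
    if not hostfile_hosts:
        return list(host_opt)        # only --host given
    # Both given: keep only --host entries that also appear in the hostfile.
    # (Real mpirun may resolve short vs. fully qualified names; this does not.)
    return [h for h in host_opt if h in hostfile_hosts]


if __name__ == "__main__":
    hostfile = ["monkey02.cluster"]  # the one-entry hostfile from the test above
    print(effective_hosts(hostfile, ["monkey01"]))          # empty: nothing to map onto
    print(effective_hosts(hostfile, ["monkey02.cluster"]))  # filter keeps the listed host
```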

> Thanks,
>
> David Mathog
> mathog@caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
> _______________________________________________
> users mailing list
> users@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje@oracle.com