Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Guaranteed run rank 0 on a given machine?
From: Terry Dontje (terry.dontje_at_[hidden])
Date: 2010-12-10 14:08:04

On 12/10/2010 01:46 PM, David Mathog wrote:
> The master is commonly very different from the workers, so I expected
> there would be something like
> --rank0-on<hostname>
> but there doesn't seem to be a single switch on mpirun to do that.
> If "mastermachine" is the first entry in the hostfile, or the first
> machine in a -hosts list, will rank 0 always run there? If so, will it
> always run in the first slot on the first machine listed? That seems to
> be the case in practice, but is it guaranteed? Even if -loadbalance is
> used?
For Open MPI the above is correct, though I am hesitant to call it guaranteed.
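To make the "first slot on the first host" behaviour concrete, here is a toy Python model. The function name `map_by_slot` is hypothetical, and this is not Open MPI's real mapper. It assumes the default by-slot policy, where each host's slots are filled in hostfile order before moving to the next host. It ignores `-loadbalance`, binding and oversubscription. So it shows why rank 0 lands in slot 0 of the first listed host in the simple case, not that Open MPI guarantees this.

```python
# Toy model only: NOT Open MPI's actual mapper. It assumes the default
# by-slot policy (fill every slot on one host before moving on to the next
# host in hostfile order) and ignores -loadbalance, binding and
# oversubscription.

def map_by_slot(hosts, np):
    """hosts: list of (hostname, slots) in hostfile order.
    Returns a list whose index is the rank and whose value is (hostname, slot)."""
    # Expand the hostfile into an ordered list of individual slots.
    slots = [(name, slot) for name, count in hosts for slot in range(count)]
    if np > len(slots):
        raise ValueError(f"{np} processes requested but only {len(slots)} slots")
    # Rank r simply takes the r-th slot in that order.
    return slots[:np]


hosts = [("mastermachine", 1), ("worker1", 4), ("worker2", 4)]
print(map_by_slot(hosts, 6)[0])  # ('mastermachine', 0)
```

Under this model, putting the master first in the hostfile is what places rank 0 there. Anything that changes the mapping policy would break that assumption.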
> Otherwise, there is the rankfile method. In the situation where the
> master must run on a specific node, but there is no preference for the
> workers, would a rank file like this be sufficient?
> rank 0=mastermachine slot=0
I thought you might have had to list every rank, but empirically it looks
like a rankfile containing only rank 0 works.
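The rankfile lines in this thread all follow one simple shape, `rank <N>=<host> slot=<spec>`. A partial file is therefore just a mapping with fewer keys. The sketch below is a minimal parser for that shape only (the name `parse_rankfile` is hypothetical). Open MPI's own rankfile grammar accepts more than this regex does.

```python
import re

# Matches only the simple form used in this thread:
#   rank <N>=<host> slot=<spec>
# Open MPI's real rankfile syntax is richer; this is a sketch.
RANK_LINE = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot=(\S+)$")

def parse_rankfile(text):
    """Return {rank: (host, slot_spec)} for each non-blank line."""
    entries = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        m = RANK_LINE.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: unrecognised entry {line!r}")
        entries[int(m.group(1))] = (m.group(2), m.group(3))
    return entries


# A partial rankfile that pins only rank 0:
print(parse_rankfile("rank 0=mastermachine slot=0\n"))
```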
> The mpirun man page gives an example where all nodes/slots are
> specified, but it doesn't say explicitly what happens if the
> configuration is only partially specified, or how it interacts with the
> -np parameter. Modifying the man page example:
> cat myrankfile
> rank 0=aa slot=1:0-2
> rank 1=bb slot=0:0,1
> rank 2=cc slot=1-2
> mpirun -H aa,bb,cc,dd -np 4 -rf myrankfile ./a.out
> Rank 0 runs on node aa, bound to socket 1, cores 0-2.
> Rank 1 runs on node bb, bound to socket 0, cores 0 and 1.
> Rank 2 runs on node cc, bound to cores 1 and 2.
> Rank 3 runs where? not at all, or on dd, aa:slot=0, or ...?
From my empirical runs, it looks like rank 3 would end up on aa, possibly
in slot 0.
In other words, once the rankfile runs out of entries, the remaining
processes appear to be placed starting from the beginning of the host list
again.
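The empirical rule described here can be written down as a toy Python model. The function name `place_ranks` is hypothetical, and the rule is an observation from this thread, not documented Open MPI behaviour. Ranks named in the rankfile go where it says. Every other rank takes the next host round-robin from the start of the `-H` list. The model does not try to predict the slot, so unpinned ranks get `None`.

```python
# Toy model of the behaviour observed in this thread (not a documented
# guarantee): rankfile entries win; every other rank is placed on hosts
# taken round-robin from the beginning of the -H host list.

def place_ranks(hostlist, rankfile, np):
    """hostlist: host names in -H order.
    rankfile: {rank: (host, slot_spec)}.
    Returns {rank: (host, slot_spec)}, with slot_spec None when unpinned."""
    if not hostlist:
        raise ValueError("hostlist must not be empty")
    placement = {}
    next_host = 0  # index into hostlist for ranks the rankfile does not name
    for rank in range(np):
        if rank in rankfile:
            placement[rank] = rankfile[rank]
        else:
            placement[rank] = (hostlist[next_host % len(hostlist)], None)
            next_host += 1
    return placement


rf = {0: ("aa", "1:0-2"), 1: ("bb", "0:0,1"), 2: ("cc", "1-2")}
print(place_ranks(["aa", "bb", "cc", "dd"], rf, 4)[3])  # ('aa', None)
```

One consequence worth noting: with a rankfile that pins only rank 0 to the master, this model also puts rank 1 on the master. The remaining ranks are not steered away from it.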

> Also, in my limited testing --host and -hostfile seem to be mutually
> exclusive. That is reasonable, but it isn't clear that it is intended.
> Example, with a hostfile containing one entry for "monkey02.cluster
> slots=1":
> mpirun --host monkey01 --mca plm_rsh_agent rsh hostname
> monkey01.cluster
> mpirun --host monkey02 --mca plm_rsh_agent rsh hostname
> monkey02.cluster
> mpirun -hostfile /usr/common/etc/openmpi.machines.test1 \
> --mca plm_rsh_agent rsh hostname
> monkey02.cluster
> mpirun --host monkey01 \
> -hostfile /usr/commom/etc/openmpi.machines.test1 \
> --mca plm_rsh_agent rsh hostname
> --------------------------------------------------------------------------
> There are no allocated resources for the application
> hostname
> that match the requested mapping:
> Verify that you have mapped the allocated resources properly using the
> --host or --hostfile specification.
> --------------------------------------------------------------------------
> Thanks,
> David Mathog
> mathog_at_[hidden]
> Manager, Sequence Analysis Facility, Biology Division, Caltech
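One reading consistent with all four runs quoted above is that `--host` acts as a filter over the hostfile rather than an addition to it. Under that reading, monkey01 is not in the hostfile, so nothing is left to launch on. The reply does not confirm this reading. The toy Python model below (hypothetical function `usable_hosts`) encodes that assumption and reproduces the outcomes. It compares host names as plain strings and says nothing about how Open MPI actually resolves them.

```python
# Toy model only. ASSUMPTION (not stated in the thread): when both options
# are given, --host selects a subset of the hostfile entries instead of
# adding to them. Host names are compared as plain strings.

def usable_hosts(host_opt=None, hostfile_hosts=None):
    if hostfile_hosts is None:
        return list(host_opt or [])
    if host_opt is None:
        return list(hostfile_hosts)
    # Both given: keep only --host entries that also appear in the hostfile.
    return [h for h in host_opt if h in hostfile_hosts]


hostfile = ["monkey02.cluster"]
print(usable_hosts(["monkey01"]))            # ['monkey01']
print(usable_hosts(None, hostfile))          # ['monkey02.cluster']
print(usable_hosts(["monkey01"], hostfile))  # [] -> "no allocated resources"
```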

Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje_at_[hidden]