
Open MPI User's Mailing List Archives



This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.


Subject: Re: [OMPI users] Guaranteed run rank 0 on a given machine?
From: Terry Dontje (terry.dontje_at_[hidden])
Date: 2010-12-10 14:08:04

On 12/10/2010 01:46 PM, David Mathog wrote:
> The master is commonly very different from the workers, so I expected
> there would be something like
> --rank0-on<hostname>
> but there doesn't seem to be a single switch on mpirun to do that.
> If "mastermachine" is the first entry in the hostfile, or the first
> machine in a -hosts list, will rank 0 always run there? If so, will it
> always run in the first slot on the first machine listed? That seems to
> be the case in practice, but is it guaranteed? Even if -loadbalance is
> used?
For Open MPI the above is correct, though I am hesitant to call it guaranteed.
> Otherwise, there is the rankfile method. In the situation where the
> master must run on a specific node, but there is no preference for the
> workers, would a rank file like this be sufficient?
> rank 0=mastermachine slot=0
I thought you might have to list all ranks, but empirically it looks
like you can get away with listing only some of them.
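
As a rough illustration of the partial-rankfile approach, here is a minimal
Python sketch that produces the one-line rankfile quoted above and assembles a
matching mpirun argument list. The helper names rankfile_text and
build_mpirun_command are hypothetical; only the rankfile line and the -H, -np
and -rf options come from this thread. It only builds strings and does not
launch anything.

```python
def rankfile_text(master_host, slot=0):
    """Return a rankfile body that pins only rank 0 (other ranks are unlisted)."""
    return f"rank 0={master_host} slot={slot}\n"


def build_mpirun_command(hosts, nprocs, rankfile, program):
    """Assemble an mpirun argument list from the options used in this thread."""
    return ["mpirun", "-H", ",".join(hosts), "-np", str(nprocs),
            "-rf", rankfile, program]


if __name__ == "__main__":
    # Hypothetical host names; substitute the real master and worker nodes.
    print(rankfile_text("mastermachine"), end="")
    print(" ".join(build_mpirun_command(
        ["mastermachine", "worker1", "worker2"], 3, "myrankfile", "./a.out")))
```

Where the unlisted ranks end up is still decided by mpirun, so this pins the
master only.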
> The mpirun man page gives an example where all nodes/slots are
> specified, but it doesn't say explicitly what happens if the
> configuration is only partially specified, or how it interacts with the
> -np parameter. Modifying the man page example:
> cat myrankfile
> rank 0=aa slot=1:0-2
> rank 1=bb slot=0:0,1
> rank 2=cc slot=1-2
> mpirun -H aa,bb,cc,dd -np 4 -rf myrankfile ./a.out
> Rank 0 runs on node aa, bound to socket 1, cores 0-2.
> Rank 1 runs on node bb, bound to socket 0, cores 0 and 1.
> Rank 2 runs on node cc, bound to cores 1 and 2.
> Rank 3 runs where? not at all, or on dd, aa:slot=0, or ...?
From my empirical runs, it looks like rank 3 would end up on aa,
possibly in slot=0.
In other words, once the rankfile runs out of entries, the remaining
processes appear to be placed starting from the beginning of the host
list again.
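
To make that observation concrete, here is a small Python model of it. This is
only a sketch of the behaviour seen in my runs, not Open MPI's actual mapping
code: it assumes ranks listed in the rankfile go where the rankfile says, and
unlisted ranks are handed out in order from the start of the host list. The
function name model_placement is hypothetical, and slots and socket/core
binding are not modelled at all.

```python
def model_placement(hosts, rankfile_hosts, nprocs):
    """Return {rank: host} under the wrap-around behaviour described above.

    hosts          -- host names in the order given to -H
    rankfile_hosts -- {rank: host} for the ranks listed in the rankfile
    nprocs         -- the value passed to -np
    """
    placement = {}
    next_host = 0  # assumption: unlisted ranks restart at the first host
    for rank in range(nprocs):
        if rank in rankfile_hosts:
            placement[rank] = rankfile_hosts[rank]
        else:
            placement[rank] = hosts[next_host % len(hosts)]
            next_host += 1
    return placement


if __name__ == "__main__":
    # The example from the quoted man page modification: 3 listed ranks, -np 4.
    print(model_placement(["aa", "bb", "cc", "dd"],
                          {0: "aa", 1: "bb", 2: "cc"}, 4))
```

For the example above this puts rank 3 on aa, matching what I saw. Anything
beyond one extra rank is extrapolation and should be checked on the real
cluster.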

> Also, in my limited testing --host and -hostfile seem to be mutually
> exclusive. That is reasonable, but it isn't clear that it is intended.
> Example, with a hostfile containing one entry for "monkey02.cluster
> slots=1":
> mpirun --host monkey01 --mca plm_rsh_agent rsh hostname
> monkey01.cluster
> mpirun --host monkey02 --mca plm_rsh_agent rsh hostname
> monkey02.cluster
> mpirun -hostfile /usr/common/etc/openmpi.machines.test1 \
> --mca plm_rsh_agent rsh hostname
> monkey02.cluster
> mpirun --host monkey01 \
> -hostfile /usr/common/etc/openmpi.machines.test1 \
> --mca plm_rsh_agent rsh hostname
> --------------------------------------------------------------------------
> There are no allocated resources for the application
> hostname
> that match the requested mapping:
> Verify that you have mapped the allocated resources properly using the
> --host or --hostfile specification.
> --------------------------------------------------------------------------
> Thanks,
> David Mathog
> mathog_at_[hidden]
> Manager, Sequence Analysis Facility, Biology Division, Caltech
> _______________________________________________
> users mailing list
> users_at_[hidden]
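
One possible reading of the --host plus -hostfile output quoted above, not
confirmed anywhere in this thread, is that --host acts as a filter on the
hosts listed in the hostfile, so a host that is not in the hostfile leaves
nothing to run on. The Python sketch below models only that assumption. The
names short_name and filter_allocation are hypothetical, and matching on the
short host name (monkey02 against monkey02.cluster) is a further assumption
made so the example lines up with the quoted commands.

```python
def short_name(host):
    """Strip any domain suffix, e.g. 'monkey02.cluster' -> 'monkey02'."""
    return host.split(".", 1)[0]


def filter_allocation(hostfile_hosts, requested_hosts):
    """Keep the hostfile entries that also appear in the --host list.

    Raises LookupError when nothing is left, standing in for the
    'no allocated resources' message in the quoted output.
    """
    wanted = {short_name(h) for h in requested_hosts}
    matched = [h for h in hostfile_hosts if short_name(h) in wanted]
    if not matched:
        raise LookupError("no allocated resources match the requested mapping")
    return matched


if __name__ == "__main__":
    hostfile = ["monkey02.cluster"]  # the single hostfile entry from the example
    print(filter_allocation(hostfile, ["monkey02"]))
    try:
        filter_allocation(hostfile, ["monkey01"])
    except LookupError as err:
        print("error:", err)
```

Under this reading the two options are not mutually exclusive; the quoted
failure would simply be monkey01 missing from the hostfile.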

Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje_at_[hidden]