
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Guaranteed run rank 0 on a given machine?
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-12-10 17:47:58


Terry is correct - not guaranteed, but that is the typical behavior.

However, you -can- guarantee that rank=0 will be on a particular host. Just run your job:

mpirun -n 1 -host <target> my_app : -n (N-1) my_app

This guarantees that rank=0 is on host <target>. All other ranks will be distributed according to the selected mapping algorithm, including -loadbalance.
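
For example (a sketch using David's "mastermachine" as the target host and a placeholder binary name), launching 8 ranks total with rank 0 pinned to that host:

mpirun -n 1 -host mastermachine ./my_app : -n 7 ./my_app

The two app contexts form a single MPI job, so all 8 processes share MPI_COMM_WORLD; rank 0 comes from the first context and therefore runs on mastermachine, while ranks 1-7 are mapped by whatever placement policy is in effect.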

Ralph

On Dec 10, 2010, at 12:08 PM, Terry Dontje wrote:

> On 12/10/2010 01:46 PM, David Mathog wrote:
>>
>> The master is commonly very different from the workers, so I expected
>> there would be something like
>>
>> --rank0-on <hostname>
>>
>> but there doesn't seem to be a single switch on mpirun to do that.
>>
>> If "mastermachine" is the first entry in the hostfile, or the first
>> machine in a -hosts list, will rank 0 always run there? If so, will it
>> always run in the first slot on the first machine listed? That seems to
>> be the case in practice, but is it guaranteed? Even if -loadbalance is
>> used?
>>
> For Open MPI the above is correct, though I am hesitant to call it guaranteed.
>> Otherwise, there is the rankfile method. In the situation where the
>> master must run on a specific node, but there is no preference for the
>> workers, would a rank file like this be sufficient?
>>
>> rank 0=mastermachine slot=0
> I thought you might have had to give all ranks, but empirically it looks like you can specify just a subset.
>> The mpirun man page gives an example where all nodes/slots are
>> specified, but it doesn't say explicitly what happens if the
>> configuration is only partially specified, or how it interacts with the
>> -np parameter. Modifying the man page example:
>>
>> cat myrankfile
>> rank 0=aa slot=1:0-2
>> rank 1=bb slot=0:0,1
>> rank 2=cc slot=1-2
>> mpirun -H aa,bb,cc,dd -np 4 -rf myrankfile ./a.out
>>
>> Rank 0 runs on node aa, bound to socket 1, cores 0-2.
>> Rank 1 runs on node bb, bound to socket 0, cores 0 and 1.
>> Rank 2 runs on node cc, bound to cores 1 and 2.
>>
>> Rank 3 runs where? not at all, or on dd, aa:slot=0, or ...?
> From my empirical runs it looks to me like rank 3 would end up on aa, possibly slot=0.
> In other words, once you run out of entries in the rankfile, it looks like the remaining processes are placed starting from the beginning of the host list again.
>
> --td
>> Also, in my limited testing --host and -hostfile seem to be mutually
>> exclusive. That is reasonable, but it isn't clear that it is intended.
>> Example, with a hostfile containing one entry for "monkey02.cluster
>> slots=1":
>>
>> mpirun --host monkey01 --mca plm_rsh_agent rsh hostname
>> monkey01.cluster
>> mpirun --host monkey02 --mca plm_rsh_agent rsh hostname
>> monkey02.cluster
>> mpirun -hostfile /usr/common/etc/openmpi.machines.test1 \
>> --mca plm_rsh_agent rsh hostname
>> monkey02.cluster
>> mpirun --host monkey01 \
>> -hostfile /usr/commom/etc/openmpi.machines.test1 \
>> --mca plm_rsh_agent rsh hostname
>> --------------------------------------------------------------------------
>> There are no allocated resources for the application
>> hostname
>> that match the requested mapping:
>>
>>
>> Verify that you have mapped the allocated resources properly using the
>> --host or --hostfile specification.
>> --------------------------------------------------------------------------
>>
>>
>>
>>
>> Thanks,
>>
>> David Mathog
>> mathog_at_[hidden]
>> Manager, Sequence Analysis Facility, Biology Division, Caltech
>
>
> --
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.dontje_at_[hidden]
>
>
>