Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] trunk's mapping to nodes... local host
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-09-07 12:53:07


Looks to be working fine with current trunk r27267:

Ralphs-iMac:ompi-mrplus rhc$ mpirun -n 2 --nooversubscribe -host localhost,localhost,localhost --display-allocation hostname

====================== ALLOCATED NODES ======================

 Data for node: Ralphs-iMac.local Num slots: 3 Max slots: 0

=================================================================
Ralphs-iMac.local
Ralphs-iMac.local

On Sep 7, 2012, at 9:35 AM, Ralph Castain <rhc_at_[hidden]> wrote:

> I can take a look at it, but it might be worth checking the trunk now, as several related changes were committed over the last two days.
>
> On Sep 7, 2012, at 9:20 AM, Eugene Loh <eugene.loh_at_[hidden]> wrote:
>
>> Maybe this is related to Reuti's "-hostfile ignored in 1.6.1" on the users mailing list, but I'm not quite sure.
>>
>> Let's pretend my nodes are called local, r1, and r2. That is, I launch mpirun from "local" and there are two other (remote) nodes available to me. With the trunk (e.g., v1.9 r27247), I get
>>
>> % mpirun --bynode --nooversubscribe --host r1,r1,r1,r2,r2,r2 -n 6 --tag-output hostname
>> [1,0]<stdout>:r1
>> [1,1]<stdout>:r2
>> [1,2]<stdout>:r1
>> [1,3]<stdout>:r2
>> [1,4]<stdout>:r1
>> [1,5]<stdout>:r2
>>
>> which seems right to me. But when the local node is involved:
>>
>> % mpirun --bynode --nooversubscribe --host local,local,local,r1,r1,r1 -np 4 --tag-output hostname
>> [1,0]<stdout>:local
>> [1,1]<stdout>:r1
>> [1,2]<stdout>:r1
>> [1,3]<stdout>:r1
>> % mpirun --bynode --nooversubscribe --host local,local,local,r1,r1,r1 -np 5 --tag-output hostname
>> --------------------------------------------------------------------------
>> There are not enough slots available in the system to satisfy the 5 slots
>> that were requested by the application:
>> hostname
>>
>> Either request fewer slots for your application, or make more slots available
>> for use.
>> --------------------------------------------------------------------------
>>
>> I'm not seeing all the local slots I should be seeing. We're seeing wide-scale MTT trunk failures due to this problem.
>>
>> There is a similar loss of local slots with hostfile syntax. E.g.,
>>
>> % hostname
>> local
>> % cat hostfile
>> local
>> r1
>> % mpirun --hostfile hostfile -n 2 hostname
>> --------------------------------------------------------------------------
>> A hostfile was provided that contains at least one node not
>> present in the allocation:
>>
>> hostfile: hostfile
>> node: local
>>
>> If you are operating in a resource-managed environment, then only
>> nodes that are in the allocation can be used in the hostfile. You
>> may find relative node syntax to be a useful alternative to
>> specifying absolute node names see the orte_hosts man page for
>> further information.
>>
>> --------------------------------------------------------------------------
>>
>> The problem is solved with "--mca orte_default_hostfile hostfile".
>
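
For reference, the slot accounting that the thread expects can be modeled in a few lines. This is only a toy sketch, not ORTE's mapper: the function names count_slots and map_bynode are hypothetical, and it assumes that each appearance of a host in the --host list contributes one slot, that --bynode places ranks round-robin across nodes, and that --nooversubscribe rejects any request exceeding the total slot count. Under those assumptions, the local,local,local,r1,r1,r1 case should offer six slots, so -np 5 would be expected to succeed.

```python
def count_slots(hosts):
    """Count one slot per appearance of a host name, keeping first-seen order."""
    slots = {}
    for host in hosts:
        slots[host] = slots.get(host, 0) + 1
    return slots


def map_bynode(hosts, nprocs):
    """Place nprocs ranks round-robin across nodes without oversubscribing.

    Toy model only: it mirrors the behaviour expected in the thread,
    not the actual Open MPI implementation.
    """
    slots = count_slots(hosts)
    total = sum(slots.values())
    if nprocs > total:
        # Assumed analogue of the "not enough slots available" error.
        raise ValueError(f"requested {nprocs} slots, only {total} available")

    used = {host: 0 for host in slots}
    placement = []
    while len(placement) < nprocs:
        # One pass over the nodes assigns at most one rank per node.
        for host in slots:
            if len(placement) == nprocs:
                break
            if used[host] < slots[host]:
                placement.append(host)
                used[host] += 1
    return placement


if __name__ == "__main__":
    # Remote-only case from the thread: alternates r1, r2.
    print(map_bynode(["r1"] * 3 + ["r2"] * 3, 6))
    # Local node involved: six slots in total, so five ranks should fit.
    print(map_bynode(["local"] * 3 + ["r1"] * 3, 5))
```

With the remote-only host list this reproduces the r1, r2, r1, r2, r1, r2 ordering reported in the thread. For the list that includes the local node, the model alternates between local and r1, which is the behaviour the original report expected rather than what r27247 produced.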