Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] trunk's mapping to nodes... local host
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-09-07 12:35:24


I can take a look at it, but it might be worth checking the trunk now, as several related changes were committed over the last two days.

On Sep 7, 2012, at 9:20 AM, Eugene Loh <eugene.loh_at_[hidden]> wrote:

> Maybe this is related to Reuti's "-hostfile ignored in 1.6.1" thread on the users mailing list, but I'm not sure.
>
> Let's pretend my nodes are called local, r1, and r2. That is, I launch mpirun from "local" and there are two other (remote) nodes available to me. With the trunk (e.g., v1.9 r27247), I get
>
> % mpirun --bynode --nooversubscribe --host r1,r1,r1,r2,r2,r2 -n 6 --tag-output hostname
> [1,0]<stdout>:r1
> [1,1]<stdout>:r2
> [1,2]<stdout>:r1
> [1,3]<stdout>:r2
> [1,4]<stdout>:r1
> [1,5]<stdout>:r2
>
> which seems right to me. But when the local node is involved:
>
> % mpirun --bynode --nooversubscribe --host local,local,local,r1,r1,r1 -np 4 --tag-output hostname
> [1,0]<stdout>:local
> [1,1]<stdout>:r1
> [1,2]<stdout>:r1
> [1,3]<stdout>:r1
> % mpirun --bynode --nooversubscribe --host local,local,local,r1,r1,r1 -np 5 --tag-output hostname
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 5 slots
> that were requested by the application:
> hostname
>
> Either request fewer slots for your application, or make more slots available
> for use.
> --------------------------------------------------------------------------
>
> I'm not seeing all the local slots I should be seeing. We're seeing wide-scale MTT trunk failures due to this problem.
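
For reference, the placement the report appears to expect can be modeled in a few lines. This is only a sketch of the expected behavior, not Open MPI's actual mapper: it assumes each repetition of a name in --host counts as one slot, that --bynode means round-robin across nodes in first-listed order, and that --nooversubscribe means a node never receives more ranks than it has slots. The function name map_bynode is made up for illustration.

```python
from collections import Counter


def map_bynode(hosts, nprocs):
    """Model of by-node, no-oversubscribe placement (illustrative only).

    hosts  -- the --host list, e.g. ["local", "local", "r1"]; each
              repetition of a name is assumed to count as one slot.
    nprocs -- number of processes requested with -n / -np.
    Returns a list whose i-th entry is the node assumed for rank i.
    """
    slots = Counter(hosts)                 # slots available per node
    nodes = list(dict.fromkeys(hosts))     # unique nodes, first-seen order
    if nprocs > sum(slots.values()):
        raise ValueError("not enough slots for %d processes" % nprocs)

    placement = []
    used = Counter()
    while len(placement) < nprocs:
        # One pass over the nodes places at most one rank on each.
        for node in nodes:
            if len(placement) == nprocs:
                break
            if used[node] < slots[node]:
                placement.append(node)
                used[node] += 1
    return placement


if __name__ == "__main__":
    for rank, node in enumerate(map_bynode(["local"] * 3 + ["r1"] * 3, 5)):
        print("[1,%d]<stdout>:%s" % (rank, node))
```

Under these assumptions, the r1/r2 run above matches the model, while the local/r1 run with -np 5 would be expected to succeed (six slots are listed) and to alternate between local and r1 rather than place three ranks on r1.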
>
> There is a similar loss of local slots with hostfile syntax. E.g.,
>
> % hostname
> local
> % cat hostfile
> local
> r1
> % mpirun --hostfile hostfile -n 2 hostname
> --------------------------------------------------------------------------
> A hostfile was provided that contains at least one node not
> present in the allocation:
>
> hostfile: hostfile
> node: local
>
> If you are operating in a resource-managed environment, then only
> nodes that are in the allocation can be used in the hostfile. You
> may find relative node syntax to be a useful alternative to
> specifying absolute node names see the orte_hosts man page for
> further information.
>
> --------------------------------------------------------------------------
>
> The problem goes away when I add "--mca orte_default_hostfile hostfile" to the mpirun command line.