
Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] trunk's mapping to nodes... local host
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-09-07 12:35:24

I can take a look at it, but it might be worth checking the trunk now, as several related changes were committed over the last two days.

On Sep 7, 2012, at 9:20 AM, Eugene Loh <eugene.loh_at_[hidden]> wrote:

> Maybe this is related to Reuti's "-hostfile ignored in 1.6.1" thread on the users mailing list, but I'm not quite sure.
> Let's pretend my nodes are called local, r1, and r2. That is, I launch mpirun from "local" and there are two other (remote) nodes available to me. With the trunk (e.g., v1.9 r27247), I get:
> % mpirun --bynode --nooversubscribe --host r1,r1,r1,r2,r2,r2 -n 6 --tag-output hostname
> [1,0]<stdout>:r1
> [1,1]<stdout>:r2
> [1,2]<stdout>:r1
> [1,3]<stdout>:r2
> [1,4]<stdout>:r1
> [1,5]<stdout>:r2
> which seems right to me. But when the local node is involved:
> % mpirun --bynode --nooversubscribe --host local,local,local,r1,r1,r1 -np 4 --tag-output hostname
> [1,0]<stdout>:local
> [1,1]<stdout>:r1
> [1,2]<stdout>:r1
> [1,3]<stdout>:r1
> % mpirun --bynode --nooversubscribe --host local,local,local,r1,r1,r1 -np 5 --tag-output hostname
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 5 slots
> that were requested by the application:
> hostname
> Either request fewer slots for your application, or make more slots available
> for use.
> --------------------------------------------------------------------------
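> I'm not seeing all the local slots I should be seeing: with --bynode, -np 4 puts only one process on local, and -np 5 fails even though three local and three r1 slots were listed. We're seeing widespread MTT trunk failures due to this problem.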
> There is a similar loss of local slots with the hostfile syntax. For example:
> % hostname
> local
> % cat hostfile
> local
> r1
> % mpirun --hostfile hostfile -n 2 hostname
> --------------------------------------------------------------------------
> A hostfile was provided that contains at least one node not
> present in the allocation:
> hostfile: hostfile
> node: local
> If you are operating in a resource-managed environment, then only
> nodes that are in the allocation can be used in the hostfile. You
> may find relative node syntax to be a useful alternative to
> specifying absolute node names see the orte_hosts man page for
> further information.
> --------------------------------------------------------------------------
> The problem goes away with "--mca orte_default_hostfile hostfile".
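As a rough check on the reasoning in the report, here is a minimal Python sketch of by-node round-robin placement under `--nooversubscribe`. It is a simplified model, not ORTE's actual mapper: it assumes each repetition of a name in `--host` adds one slot and that ranks are dealt out one per node per pass, skipping full nodes. The helper names `parse_host_list` and `map_bynode` are hypothetical. Under this model, three local slots would give local, r1, local, r1 for `-np 4`, whereas treating `local` as having a single slot reproduces both the observed `-np 4` placement and the `-np 5` failure.

```python
def parse_host_list(hosts):
    """Turn a --host style string such as 'r1,r1,r2' into ordered (node, slots) pairs.

    Assumption (simplified model): each repetition of a name adds one slot,
    and nodes keep their order of first appearance.
    """
    counts = {}
    for name in hosts.split(","):
        counts[name] = counts.get(name, 0) + 1
    return list(counts.items())


def map_bynode(nodes, nprocs):
    """Place ranks round-robin across nodes, one rank per node per pass.

    nodes: list of (name, slots). Never exceeds a node's slot count
    (the --nooversubscribe case). Returns node names indexed by rank.
    """
    total = sum(slots for _, slots in nodes)
    if nprocs > total:
        raise RuntimeError(f"not enough slots: {nprocs} requested, {total} available")
    used = {name: 0 for name, _ in nodes}
    placement = []
    while len(placement) < nprocs:
        for name, slots in nodes:
            if len(placement) == nprocs:
                break
            if used[name] < slots:  # skip nodes whose slots are all taken
                used[name] += 1
                placement.append(name)
    return placement


if __name__ == "__main__":
    # Remote-only case: matches the output reported as correct.
    print(map_bynode(parse_host_list("r1,r1,r1,r2,r2,r2"), 6))
    # ['r1', 'r2', 'r1', 'r2', 'r1', 'r2']

    # What three local slots would give for -np 4.
    print(map_bynode(parse_host_list("local,local,local,r1,r1,r1"), 4))
    # ['local', 'r1', 'local', 'r1']

    # If only one local slot is counted, the reported -np 4 output appears...
    print(map_bynode([("local", 1), ("r1", 3)], 4))
    # ['local', 'r1', 'r1', 'r1']

    # ...and -np 5 runs out of slots.
    try:
        map_bynode([("local", 1), ("r1", 3)], 5)
    except RuntimeError as err:
        print(err)  # not enough slots: 5 requested, 4 available
```

The point of the model is only that a single usable local slot is enough to explain both symptoms; it says nothing about where in the trunk the slots are being lost.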