Open MPI Development Mailing List Archives

Subject: [OMPI devel] trunk's mapping to nodes... local host
From: Eugene Loh (eugene.loh_at_[hidden])
Date: 2012-09-07 12:20:07


Maybe this is related to Reuti's "-hostfile ignored in 1.6.1" thread on
the users mailing list, but I'm not sure.

Let's pretend my nodes are called local, r1, and r2. That is, I launch
mpirun from "local" and there are two other (remote) nodes available to
me. With the trunk (e.g., v1.9 r27247), I get

     % mpirun --bynode --nooversubscribe --host r1,r1,r1,r2,r2,r2 -n 6 --tag-output hostname
     [1,0]<stdout>:r1
     [1,1]<stdout>:r2
     [1,2]<stdout>:r1
     [1,3]<stdout>:r2
     [1,4]<stdout>:r1
     [1,5]<stdout>:r2

which seems right to me. But when the local node is involved:

     % mpirun --bynode --nooversubscribe --host local,local,local,r1,r1,r1 -np 4 --tag-output hostname
     [1,0]<stdout>:local
     [1,1]<stdout>:r1
     [1,2]<stdout>:r1
     [1,3]<stdout>:r1
     % mpirun --bynode --nooversubscribe --host local,local,local,r1,r1,r1 -np 5 --tag-output hostname
     --------------------------------------------------------------------------
     There are not enough slots available in the system to satisfy the 5 slots
     that were requested by the application:
       hostname

     Either request fewer slots for your application, or make more slots available
     for use.
     --------------------------------------------------------------------------

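I'm not seeing all the local slots I should be seeing: only one of the
three "local" slots appears to be counted, so -np 4 places a single
process on local and -np 5 fails. We're seeing wide-scale MTT trunk
failures due to this problem.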
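
To make the expected placement concrete, here is a minimal Python sketch of by-node mapping without oversubscription. It is not Open MPI's mapper code: the function name map_bynode is hypothetical, and the rule that each mention of a host in --host contributes one slot is an assumption taken from the r1/r2 example above.

```python
from collections import Counter


def map_bynode(hosts, nprocs):
    """Place nprocs ranks round-robin across nodes without oversubscribing.

    Assumption: every mention of a host in the list contributes one slot.
    """
    slots = Counter(hosts)              # slots remaining on each node
    nodes = list(dict.fromkeys(hosts))  # unique nodes, first-seen order
    if nprocs > sum(slots.values()):
        raise ValueError("not enough slots for %d processes" % nprocs)

    placement = []
    while len(placement) < nprocs:
        # One pass over the nodes hands out at most one rank per node.
        for node in nodes:
            if len(placement) == nprocs:
                break
            if slots[node] > 0:
                slots[node] -= 1
                placement.append(node)
    return placement


if __name__ == "__main__":
    print(map_bynode(["local"] * 3 + ["r1"] * 3, 5))
```

Under these assumptions, six slots split across local and r1 should accommodate five processes, alternating between the two nodes, rather than producing a "not enough slots" error.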

There is a similar loss of local slots with hostfile syntax. E.g.,

     % hostname
     local
     % cat hostfile
     local
     r1
     % mpirun --hostfile hostfile -n 2 hostname
     --------------------------------------------------------------------------
     A hostfile was provided that contains at least one node not
     present in the allocation:

       hostfile: hostfile
       node: local

     If you are operating in a resource-managed environment, then only
     nodes that are in the allocation can be used in the hostfile. You
     may find relative node syntax to be a useful alternative to
     specifying absolute node names see the orte_hosts man page for
     further information.

     --------------------------------------------------------------------------

The problem is solved with "--mca orte_default_hostfile hostfile".