Open MPI User's Mailing List Archives

Subject: [OMPI users] more information for my problem with rankfiles
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2013-02-09 06:09:23


The following rankfile works on Linux with openmpi-1.6.4rc3r27923,
but it fails with a patched openmpi-1.6.4rc4r28022 (patch.diff from
Eugene). Perhaps this information will help to track down the error.

tyr rankfiles 114 cat rf_ex_linpc
# mpiexec -report-bindings -rf rf_ex_linpc hostname
rank 0=linpc0 slot=0:0-1,1:0-1
rank 1=linpc1 slot=0:0-1
rank 2=linpc1 slot=1:0
rank 3=linpc1 slot=1:1
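
For readers less familiar with the format: each line above maps one rank to a host and a slot list, written here as socket:core-range pairs separated by commas. The Python sketch below is only an illustration of how I read these four lines; it is not Open MPI's parser. The names parse_rankfile and RANK_LINE are hypothetical, and the code assumes every slot item has the socket:core or socket:core-range form used in this file.

```python
import re

# Hypothetical helper for illustration only; this is NOT Open MPI's parser.
# Assumption: every slot item looks like "socket:core" or "socket:lo-hi".
RANK_LINE = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot\s*=\s*(\S+)$")


def parse_rankfile(text):
    """Return {rank: (host, [(socket, [cores, ...]), ...])}."""
    ranks = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        m = RANK_LINE.match(line)
        if m is None:
            raise ValueError(f"unrecognized rankfile line: {raw!r}")
        rank, host, slots = int(m.group(1)), m.group(2), m.group(3)
        placement = []
        for item in slots.split(","):
            socket, cores = item.split(":")
            lo, _, hi = cores.partition("-")  # "0-1" -> ("0", "-", "1")
            placement.append(
                (int(socket), list(range(int(lo), int(hi or lo) + 1)))
            )
        ranks[rank] = (host, placement)
    return ranks


RANKFILE = """\
# mpiexec -report-bindings -rf rf_ex_linpc hostname
rank 0=linpc0 slot=0:0-1,1:0-1
rank 1=linpc1 slot=0:0-1
rank 2=linpc1 slot=1:0
rank 3=linpc1 slot=1:1
"""

ranks = parse_rankfile(RANKFILE)
hosts = sorted({host for host, _ in ranks.values()})
print(hosts)  # ['linpc0', 'linpc1']
```

Read this way, the file names two hosts, linpc0 and linpc1, and rank 0 is the only rank placed on linpc0.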

linpc1 rankfiles 99 mpiexec -report-bindings -rf rf_ex_linpc hostname
The rankfile that was used claimed that a host was either not
allocated or oversubscribed its slots. Please review your rank-slot
assignments and your host allocation to ensure a proper match. Also,
some systems may require using full hostnames, such as
"" (instead of just plain "host1").

  Host: linpc0

linpc1 rankfiles 100 ompi_info | grep "MPI:"
                Open MPI: 1.6.4rc4r28022
linpc1 rankfiles 101 exit

tyr rankfiles 110 ssh linpc1
linpc1 fd1026 96 cd .../prog/mpi/rankfiles/
linpc1 rankfiles 97 mpiexec -report-bindings -rf rf_ex_linpc hostname
[linpc1:21351] MCW rank 1 bound to socket 0[core 0-1]:
  [B B][. .] (slot list 0:0-1)
[linpc1:21351] MCW rank 2 bound to socket 1[core 0]:
  [. .][B .] (slot list 1:0)
[linpc1:21351] MCW rank 3 bound to socket 1[core 1]:
  [. .][. B] (slot list 1:1)
[linpc0:08012] MCW rank 0 bound to socket 0[core 0-1] socket 1[core 0-1]:
  [B B][B B] (slot list 0:0-1,1:0-1)

linpc1 rankfiles 98 ompi_info | grep "MPI:"
                Open MPI: 1.6.4rc3r27923
linpc1 rankfiles 99

I will build an unpatched openmpi-1.6.4rc4 and check whether the
above rankfile works with it. Unfortunately I can only check tomorrow,
because new packages are mirrored to all machines overnight, so the
build will not be available on both machines today. I will let you
know the result.

Kind regards