Hi,
I could successfully use the following rankfile on Linux with
openmpi-1.6.4rc3r27923, but it fails with a patched
openmpi-1.6.4rc4r28022 (patch.diff from Eugene). Perhaps this
information helps to track down the error.
tyr rankfiles 114 cat rf_ex_linpc
# mpiexec -report-bindings -rf rf_ex_linpc hostname
rank 0=linpc0 slot=0:0-1,1:0-1
rank 1=linpc1 slot=0:0-1
rank 2=linpc1 slot=1:0
rank 3=linpc1 slot=1:1
linpc1 rankfiles 99 mpiexec -report-bindings -rf rf_ex_linpc hostname
------------------------------------------------------------------------
The rankfile that was used claimed that a host was either not
allocated or oversubscribed its slots. Please review your rank-slot
assignments and your host allocation to ensure a proper match. Also,
some systems may require using full hostnames, such as
"host1.example.com" (instead of just plain "host1").
Host: linpc0
------------------------------------------------------------------------
linpc1 rankfiles 100 ompi_info | grep "MPI:"
Open MPI: 1.6.4rc4r28022
linpc1 rankfiles 101 exit
tyr rankfiles 110 ssh linpc1
linpc1 fd1026 96 cd .../prog/mpi/rankfiles/
linpc1 rankfiles 97 mpiexec -report-bindings -rf rf_ex_linpc hostname
[linpc1:21351] MCW rank 1 bound to socket 0[core 0-1]:
[B B][. .] (slot list 0:0-1)
[linpc1:21351] MCW rank 2 bound to socket 1[core 0]:
[. .][B .] (slot list 1:0)
[linpc1:21351] MCW rank 3 bound to socket 1[core 1]:
[. .][. B] (slot list 1:1)
[linpc0:08012] MCW rank 0 bound to socket 0[core 0-1] socket 1[core 0-1]:
[B B][B B] (slot list 0:0-1,1:0-1)
linpc1 rankfiles 98 ompi_info | grep "MPI:"
Open MPI: 1.6.4rc3r27923
linpc1 rankfiles 99
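In case it helps to rule out the rankfile itself, here is a minimal
Python sketch that parses the entries above and lists the socket/core
pairs requested per host. It is not part of Open MPI; the helper names
(parse_rankfile, expand_slot_list, cores_per_host) are made up for
illustration, and it only handles the "socket:core-range" slot form
used in this rankfile.

```python
import re
from collections import defaultdict

# Rankfile from this post, copied verbatim.
RANKFILE = """\
# mpiexec -report-bindings -rf rf_ex_linpc hostname
rank 0=linpc0 slot=0:0-1,1:0-1
rank 1=linpc1 slot=0:0-1
rank 2=linpc1 slot=1:0
rank 3=linpc1 slot=1:1
"""

# Assumption: every entry has the form "rank N=host slot=LIST".
LINE_RE = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot\s*=\s*(\S+)$")


def expand_slot_list(slot_list):
    """Expand e.g. '0:0-1,1:0-1' into a list of (socket, core) pairs."""
    pairs = []
    for item in slot_list.split(","):
        socket, cores = item.split(":")
        first, _, last = cores.partition("-")
        # A single core such as '1:0' has no '-', so last is empty.
        for core in range(int(first), int(last or first) + 1):
            pairs.append((int(socket), core))
    return pairs


def parse_rankfile(text):
    """Return {rank: (host, [(socket, core), ...])}, skipping comments."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError("unsupported rankfile line: " + line)
        rank, host, slots = match.groups()
        entries[int(rank)] = (host, expand_slot_list(slots))
    return entries


def cores_per_host(entries):
    """Collect all requested (socket, core) pairs for each host."""
    used = defaultdict(list)
    for _rank, (host, pairs) in sorted(entries.items()):
        used[host].extend(pairs)
    return dict(used)


if __name__ == "__main__":
    for host, pairs in cores_per_host(parse_rankfile(RANKFILE)).items():
        duplicates = len(pairs) - len(set(pairs))
        print(host, pairs, "duplicate requests:", duplicates)
```

For these four entries, each host is asked for four distinct
socket/core pairs and no pair is requested twice, which matches the
bindings reported by 1.6.4rc3r27923 above.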
I will build an unpatched openmpi-1.6.4rc4 and check whether the
above rankfile works with it. Unfortunately I can only check tomorrow,
because new packages are mirrored to all machines during the night,
so the build will not be available on both machines today. I will let
you know the result.
Kind regards
Siegmar