Hi
Thank you very much for your patch. I have applied it to
openmpi-1.6.4rc4. Below are the results for the seven rankfiles
(rf_0 to rf_6, in the order you listed them).
> > thank you very much for your answer. I have compiled your program
> > and get different behaviours for openmpi-1.6.4rc3 and openmpi-1.9.
>
> Yes, something else seems to be going on for 1.9.
>
> For 1.6, try the attached patch. It works for me, but my machines
> have flatter (less interesting) topology. It'd be great if you
> could try
>
> % mpirun --report-bindings --rankfile myrankfile ./a.out
>
> with that check program I sent and with the following rankfiles:
>
> rank 0=sunpc1 slot=0:0
> rank 0=sunpc1 slot=0:1
> rank 0=sunpc1 slot=0:0-1
> rank 0=sunpc1 slot=1:0
> rank 0=sunpc1 slot=1:1
> rank 0=sunpc1 slot=1:0-1
> rank 0=sunpc1 slot=0:0-1,1:0-1
>
> where each line represents a different rankfile.
sunpc1 rankfiles 109 ompi_info | grep "MPI:"
Open MPI: 1.6.4rc4r28022
sunpc1 rankfiles 110 cc check.c
sunpc1 rankfiles 111 mpirun --report-bindings --rankfile rf_0 ./a.out
[sunpc1:18167] MCW rank 0 bound to socket 0[core 0]: [B .][. .] (slot list 0:0)
bind to 0
sunpc1 rankfiles 112 mpirun --report-bindings --rankfile rf_1 ./a.out
[sunpc1:18170] MCW rank 0 bound to socket 0[core 1]: [. B][. .] (slot list 0:1)
bind to 1
sunpc1 rankfiles 113 mpirun --report-bindings --rankfile rf_2 ./a.out
[sunpc1:18173] MCW rank 0 bound to socket 0[core 0-1]: [B B][. .] (slot list 0:0-1)
unbound
sunpc1 rankfiles 114 mpirun --report-bindings --rankfile rf_3 ./a.out
[sunpc1:18176] MCW rank 0 bound to socket 1[core 0]: [. .][B .] (slot list 1:0)
bind to 2
sunpc1 rankfiles 115 mpirun --report-bindings --rankfile rf_4 ./a.out
[sunpc1:18179] MCW rank 0 bound to socket 1[core 1]: [. .][. B] (slot list 1:1)
bind to 3
sunpc1 rankfiles 116 mpirun --report-bindings --rankfile rf_5 ./a.out
[sunpc1:18182] MCW rank 0 bound to socket 1[core 0-1]: [. .][B B] (slot list 1:0-1)
unbound
sunpc1 rankfiles 117 mpirun --report-bindings --rankfile rf_6 ./a.out
[sunpc1:18185] MCW rank 0 bound to socket 0[core 0-1] socket 1[core 0-1]: [B B][B B] (slot list 0:0-1,1:0-1)
unbound
sunpc1 rankfiles 118
I get the following output for an unpatched openmpi-1.9.
sunpc1 rankfiles 106 ompi_info | grep "MPI:"
Open MPI: 1.9a1r28035
sunpc1 rankfiles 107 cc check.c
sunpc1 rankfiles 108 mpirun --report-bindings --rankfile rf_0 ./a.out
[sunpc1:18260] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/.][./.]
bind to 0
sunpc1 rankfiles 109 mpirun --report-bindings --rankfile rf_1 ./a.out
[sunpc1:18263] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
unbound
sunpc1 rankfiles 110 mpirun --report-bindings --rankfile rf_2 ./a.out
[sunpc1:18266] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
unbound
sunpc1 rankfiles 111 mpirun --report-bindings --rankfile rf_3 ./a.out
[sunpc1:18269] MCW rank 0 bound to socket 1[core 2[hwt 0]], socket 1[core 3[hwt 0]]: [./.][B/B]
unbound
sunpc1 rankfiles 112 mpirun --report-bindings --rankfile rf_4 ./a.out
[sunpc1:18272] MCW rank 0 bound to socket 1[core 3[hwt 0]]: [./.][./B]
bind to 3
sunpc1 rankfiles 113 mpirun --report-bindings --rankfile rf_5 ./a.out
[sunpc1:18275] MCW rank 0 bound to socket 1[core 2[hwt 0]], socket 1[core 3[hwt 0]]: [./.][B/B]
unbound
sunpc1 rankfiles 114 mpirun --report-bindings --rankfile rf_6 ./a.out
[sunpc1:18278] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
unbound
sunpc1 rankfiles 115
Thank you very much for any further help.
Kind regards
Siegmar