
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] I have still a problem with rankfiles in openmpi-1.6.4rc3
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2013-02-07 04:05:47


Hi,

Thank you very much for your patch. I have applied it to
openmpi-1.6.4rc4.

> > thank you very much for your answer. I have compiled your program
> > and get different behaviours for openmpi-1.6.4rc3 and openmpi-1.9.
>
> Yes, something else seems to be going on for 1.9.
>
> For 1.6, try the attached patch. It works for me, but my machines
> have flatter (less interesting) topology. It'd be great if you
> could try
>
> % mpirun --report-bindings --rankfile myrankfile ./a.out
>
> with that check program I sent and with the following rankfiles:
>
> rank 0=sunpc1 slot=0:0
> rank 0=sunpc1 slot=0:1
> rank 0=sunpc1 slot=0:0-1
> rank 0=sunpc1 slot=1:0
> rank 0=sunpc1 slot=1:1
> rank 0=sunpc1 slot=1:0-1
> rank 0=sunpc1 slot=0:0-1,1:0-1
>
> where each line represents a different rankfile.
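
The rankfiles rf_0 to rf_6 used in the runs below hold the seven lines above, one per file and in the same order. A minimal sketch of how they could be created, assuming a POSIX shell and the current directory as the target. The guarded mpirun loop is only a convenience and was not part of the original session:

```shell
#!/bin/sh
# Write the seven single-line rankfiles rf_0 .. rf_6.
# The slot lists are the ones from the quoted message, in the same order.
i=0
for slots in 0:0 0:1 0:0-1 1:0 1:1 1:0-1 0:0-1,1:0-1; do
    printf 'rank 0=sunpc1 slot=%s\n' "$slots" > "rf_$i"
    i=$((i + 1))
done

# Run the check program once per rankfile, but only if both mpirun
# and a compiled ./a.out are actually available.
if command -v mpirun >/dev/null 2>&1 && [ -x ./a.out ]; then
    for rf in rf_0 rf_1 rf_2 rf_3 rf_4 rf_5 rf_6; do
        echo "== $rf: $(cat "$rf")"
        mpirun --report-bindings --rankfile "$rf" ./a.out
    done
fi
```

The host name sunpc1 is hard-coded because every rankfile in this test targets that machine.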

I get the following output for the patched openmpi-1.6.4rc4.

sunpc1 rankfiles 109 ompi_info | grep "MPI:"
                Open MPI: 1.6.4rc4r28022
sunpc1 rankfiles 110 cc check.c
sunpc1 rankfiles 111 mpirun --report-bindings --rankfile rf_0 ./a.out
[sunpc1:18167] MCW rank 0 bound to socket 0[core 0]:
   [B .][. .] (slot list 0:0)
bind to 0
sunpc1 rankfiles 112 mpirun --report-bindings --rankfile rf_1 ./a.out
[sunpc1:18170] MCW rank 0 bound to socket 0[core 1]:
   [. B][. .] (slot list 0:1)
bind to 1
sunpc1 rankfiles 113 mpirun --report-bindings --rankfile rf_2 ./a.out
[sunpc1:18173] MCW rank 0 bound to socket 0[core 0-1]:
   [B B][. .] (slot list 0:0-1)
unbound
sunpc1 rankfiles 114 mpirun --report-bindings --rankfile rf_3 ./a.out
[sunpc1:18176] MCW rank 0 bound to socket 1[core 0]:
   [. .][B .] (slot list 1:0)
bind to 2
sunpc1 rankfiles 115 mpirun --report-bindings --rankfile rf_4 ./a.out
[sunpc1:18179] MCW rank 0 bound to socket 1[core 1]:
   [. .][. B] (slot list 1:1)
bind to 3
sunpc1 rankfiles 116 mpirun --report-bindings --rankfile rf_5 ./a.out
[sunpc1:18182] MCW rank 0 bound to socket 1[core 0-1]:
   [. .][B B] (slot list 1:0-1)
unbound
sunpc1 rankfiles 117 mpirun --report-bindings --rankfile rf_6 ./a.out
[sunpc1:18185] MCW rank 0 bound to socket 0[core 0-1]
   socket 1[core 0-1]: [B B][B B] (slot list 0:0-1,1:0-1)
unbound
sunpc1 rankfiles 118
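
To compare the reported maps against the slot lists, the following sketch derives the map one would expect from a socket:core slot list. It assumes the two-socket, two-cores-per-socket layout of sunpc1 seen in the output above and the map notation printed by 1.6. The function name expected_map is hypothetical, and it only handles the slot-list forms used in these rankfiles:

```shell
#!/bin/sh
# expected_map SLOTLIST
# Print the binding map expected for a slot list such as "0:1" or
# "0:0-1,1:0-1" on a machine with 2 sockets and 2 cores per socket.
# "B" marks a core the process should be bound to, "." any other core.
expected_map() {
    map=""
    for s in 0 1; do
        map="$map["
        for c in 0 1; do
            mark="."
            # Split the slot list at commas into socket:cores entries.
            old_ifs=$IFS
            IFS=,
            for entry in $1; do
                sock=${entry%%:*}      # part before the colon
                cores=${entry#*:}      # part after the colon: "N" or "N-M"
                lo=${cores%%-*}
                hi=${cores##*-}
                if [ "$sock" -eq "$s" ] && [ "$c" -ge "$lo" ] && [ "$c" -le "$hi" ]; then
                    mark="B"
                fi
            done
            IFS=$old_ifs
            if [ "$c" -eq 0 ]; then
                map="$map$mark "
            else
                map="$map$mark"
            fi
        done
        map="$map]"
    done
    printf '%s\n' "$map"
}
```

For example, `expected_map 1:0` prints `[. .][B .]`, which matches the rf_3 run above.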

I get the following output for an unpatched openmpi-1.9.

sunpc1 rankfiles 106 ompi_info | grep "MPI:"
                Open MPI: 1.9a1r28035
sunpc1 rankfiles 107 cc check.c
sunpc1 rankfiles 108 mpirun --report-bindings --rankfile rf_0 ./a.out
[sunpc1:18260] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
   [B/.][./.]
bind to 0
sunpc1 rankfiles 109 mpirun --report-bindings --rankfile rf_1 ./a.out
[sunpc1:18263] MCW rank 0 bound to socket 0[core 0[hwt 0]],
   socket 0[core 1[hwt 0]]: [B/B][./.]
unbound
sunpc1 rankfiles 110 mpirun --report-bindings --rankfile rf_2 ./a.out
[sunpc1:18266] MCW rank 0 bound to socket 0[core 0[hwt 0]],
   socket 0[core 1[hwt 0]]: [B/B][./.]
unbound
sunpc1 rankfiles 111 mpirun --report-bindings --rankfile rf_3 ./a.out
[sunpc1:18269] MCW rank 0 bound to socket 1[core 2[hwt 0]],
   socket 1[core 3[hwt 0]]: [./.][B/B]
unbound
sunpc1 rankfiles 112 mpirun --report-bindings --rankfile rf_4 ./a.out
[sunpc1:18272] MCW rank 0 bound to socket 1[core 3[hwt 0]]:
   [./.][./B]
bind to 3
sunpc1 rankfiles 113 mpirun --report-bindings --rankfile rf_5 ./a.out
[sunpc1:18275] MCW rank 0 bound to socket 1[core 2[hwt 0]],
   socket 1[core 3[hwt 0]]: [./.][B/B]
unbound
sunpc1 rankfiles 114 mpirun --report-bindings --rankfile rf_6 ./a.out
[sunpc1:18278] MCW rank 0 bound to socket 0[core 0[hwt 0]],
   socket 0[core 1[hwt 0]]: [B/B][./.]
unbound
sunpc1 rankfiles 115

Thank you very much for any further help.

Kind regards

Siegmar