Open MPI User's Mailing List Archives

Subject: [OMPI users] CPU binding
From: Panos Labropoulos (panos.labropoulos_at_[hidden])
Date: 2013-10-02 19:32:26


Hello,

We seem to be unable to set the CPU binding on a cluster consisting of
Dell M420/M610 systems:

[jallan_at_hpc21 ~]$ cat report-bindings.sh
#!/bin/sh

bitmap=`hwloc-bind --get -p`
friendly=`hwloc-calc -p -H socket.core.pu $bitmap`

echo "MCW rank $OMPI_COMM_WORLD_RANK (`hostname`): $friendly"
exit 0

[jallan_at_hpc27 ~]$ hwloc-bind -v socket:0.core:0 -l ./report-bindings.sh
using object #0 depth 2 below cpuset 0x000000ff
using object #0 depth 6 below cpuset 0x00000080
adding 0x00000080 to 0x0
adding 0x00000080 to 0x0
assuming the command starts at ./report-bindings.sh
binding on cpu set 0x00000080
MCW rank (hpc27): Socket:0.Core:10.PU:7
[jallan_at_hpc27 ~]$ hwloc-bind -v socket:1.core:0 -l ./report-bindings.sh
object #1 depth 2 (type socket) below cpuset 0x000000ff does not exist
adding 0x0 to 0x0
assuming the command starts at ./report-bindings.sh
MCW rank (hpc27): Socket:0.Core:10.PU:7
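The hex values in the hwloc-bind -v output are cpuset bitmasks in which bit i stands for the PU with OS (physical) index i. A minimal POSIX sh sketch for decoding them; mask_to_pus is a hypothetical helper name, and only single-word masks like the ones in this post are handled:

```sh
#!/bin/sh
# Decode a single-word hex cpuset, as printed by "hwloc-bind -v", into
# a comma-separated list of PU (OS/physical) indices.
# Assumption: the mask fits in one shell integer; multi-word hwloc
# cpusets written with commas are not handled.
mask_to_pus() {
    mask=$(( $1 ))
    bit=0
    out=""
    while [ "$mask" -ne 0 ]; do
        # Record the index of every set bit, lowest first.
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="${out:+$out,}$bit"
        fi
        mask=$(( mask >> 1 ))
        bit=$(( bit + 1 ))
    done
    echo "$out"
}

mask_to_pus 0x000000ff   # cpuset hwloc-bind searched below on hpc27
mask_to_pus 0x00000080   # cpuset it bound for socket:0.core:0
```

For the hpc27 session this prints 0,1,2,3,4,5,6,7 and then 7, which agrees with the PU:7 reported by the script.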

The topology of this system looks a bit strange:

[jallan_at_hpc21 ~]$ lstopo --no-io
Machine (24GB)
 NUMANode L#0 (P#0 24GB)
 NUMANode L#1 (P#1) + Socket L#0 + L3 L#0 (15MB) + L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0 + PU L#0 (P#11)
[jallan_at_hpc21 ~]$
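lstopo labels each PU with a logical index (L#) and an OS/physical index (P#). hwloc-bind -l reads locations such as socket:1.core:0 as logical indices, while the report script calls hwloc-calc -p, which prints physical ones, so the two numberings can differ: on hpc21 the only PU shown is logical PU 0 but physical PU 11. A minimal awk-in-sh sketch that lists the pairs from lstopo text; pu_map is a hypothetical helper name:

```sh
#!/bin/sh
# Print "logical physical" pairs for every "PU L#x (P#y)" entry found
# in lstopo text output read from stdin.
# Assumption: each PU entry sits on one line (not wrapped).
pu_map() {
    awk '{
        line = $0
        while (match(line, /PU L#[0-9]+ \(P#[0-9]+\)/)) {
            entry = substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
            gsub(/[^0-9 ]/, "", entry)   # "PU L#0 (P#11)" becomes " 0 11"
            split(entry, idx, " ")
            print idx[1], idx[2]
        }
    }'
}

echo 'NUMANode L#1 (P#1) + Socket L#0 + L3 L#0 (15MB) + L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0 + PU L#0 (P#11)' | pu_map
```

Applied to the hpc21 line above, this prints "0 11".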

Using Open MPI 1.4.4:

http://pastebin.com/VsZS2q3R

For some reason the binding cannot be set. We also tried Open MPI 1.6.5 and
1.7.3 with similar results.
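One possibility that may be worth ruling out (an assumption; nothing in the outputs here confirms it) is that the shell or job is already confined to a subset of CPUs, for example by a batch-system cpuset, which would limit where hwloc can bind. On Linux the kernel lists the CPUs a process may use in the Cpus_allowed_list field of /proc/<pid>/status. A minimal sketch that reads it; allowed_cpus is a hypothetical helper name:

```sh
#!/bin/sh
# Print the Cpus_allowed_list field of a Linux /proc/<pid>/status file.
# Defaults to the status file of the current shell ($$).
# Assumption: Linux with /proc mounted.
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ { print $2 }' "${1:-/proc/$$/status}"
}

allowed_cpus
```

Comparing its output on an hpc node with the PU P# indices that lstopo reports there may help show whether such a restriction is in effect.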

This is the output from a local SMP system:

[panos_at_demo ~]$ hwloc-bind -v socket:1.core:0 -l ./report-bindings.sh
using object #1 depth 2 below cpuset 0x00000003
using object #0 depth 6 below cpuset 0x00000002
adding 0x00000002 to 0x0
adding 0x00000002 to 0x0
assuming the command starts at ./report-bindings.sh
binding on cpu set 0x00000002
MCW rank (demo): Socket:1.Core:0.PU:1
[panos_at_demo ~]$ hwloc-bind -v socket:0.core:0 -l ./report-bindings.sh
using object #0 depth 2 below cpuset 0x00000003
using object #0 depth 6 below cpuset 0x00000001
adding 0x00000001 to 0x0
adding 0x00000001 to 0x0
assuming the command starts at ./report-bindings.sh
binding on cpu set 0x00000001
MCW rank (demo): Socket:0.Core:0.PU:0

The MPI binding output is formatted a bit differently because this node runs
Open MPI 1.6.5:

[panos_at_demo ~]$ `which mpiexec` --report-bindings --bind-to-core --bycore -mca btl ^openib -np 4 -hostfile ./hplnodes2 -x LD_LIBRARY_PATH -x PATH /cm/shared/apps/hpl/2.1/xhpl
[demo:25615] MCW rank 0 bound to socket 0[core 0]: [B][.]
[demo:25615] MCW rank 2 bound to socket 1[core 0]: [.][B]
[node003:08454] MCW rank 1 bound to socket 0[core 0]: [B .]
[node003:08454] MCW rank 3 bound to socket 0[core 1]: [. B]
[panos_at_demo ~]$ module load hwloc

[panos_at_demo ~]$ lstopo -l
Machine (4095MB)
 NUMANode L#0 (P#0 2048MB) + Socket L#0 + L2 L#0 (1024KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
 NUMANode L#1 (P#1 2048MB) + Socket L#1 + L2 L#1 (1024KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
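To compare placements across nodes, each Open MPI 1.6.5 --report-bindings line in the mpiexec output can be reduced to a host, rank, socket, core record. A minimal awk sketch, assuming exactly that 1.6.x line format (other releases print bindings differently) and using the hypothetical helper name summarize_bindings:

```sh
#!/bin/sh
# Reduce Open MPI 1.6.x "--report-bindings" lines read on stdin to
# "host rank socket core" records, one per line.
# Assumption: lines look like
#   [host:pid] MCW rank R bound to socket S[core C]: ...
# Lines in any other format are skipped.
summarize_bindings() {
    awk '/MCW rank [0-9]+ bound to socket [0-9]+\[core / {
        host = substr($1, 2, index($1, ":") - 2)   # "[demo:25615]" gives "demo"
        sock = substr($8, 1, index($8, "[") - 1)   # "1[core" gives "1"
        core = substr($9, 1, index($9, "]") - 1)   # "0]:" gives "0"
        print host, $4, sock, core
    }'
}

printf '%s\n' \
    '[demo:25615] MCW rank 0 bound to socket 0[core 0]: [B][.]' \
    '[demo:25615] MCW rank 2 bound to socket 1[core 0]: [.][B]' \
    '[node003:08454] MCW rank 1 bound to socket 0[core 0]: [B .]' \
    '[node003:08454] MCW rank 3 bound to socket 0[core 1]: [. B]' |
    summarize_bindings
```

For the four lines from the demo run this prints "demo 0 0 0", "demo 2 1 0", "node003 1 0 0" and "node003 3 0 1", one per line.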

Any help will be appreciated.

Kind Regards,
  Panos Labropoulos