Good day,

I'm studying the impact of MPI process binding on communication costs in my project, and would like to use hwloc-bind for fine-grained control over the mapping. I installed hwloc 1.1.1 on a 2-socket machine with 4 cores per socket (2 dual-core dies in each socket), launched the job below, and then ran hwloc-ps to verify the binding:

$ mpirun -V
mpirun (Open MPI) 1.5.1
$ mpirun -np 4 hwloc-bind socket:0.core:0-3 ./test

hwloc-ps shows the following output:

$ hwloc-ps -p
1497    Socket:0                ./test
1498    Socket:0                ./test
1499    Socket:0                ./test
1500    Socket:0                ./test
$ hwloc-ps -l
1497    Socket:0                ./test
1498    Socket:0                ./test
1499    Socket:0                ./test
1500    Socket:0                ./test
$ hwloc-ps -c
1497    0x00000055              ./test
1498    0x00000055              ./test
1499    0x00000055              ./test
1500    0x00000055              ./test
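
For reference, the -c mask can be decoded by hand: 0x00000055 has bits 0, 2, 4 and 6 set. Below is a minimal shell sketch of that decoding. Note that mask_to_pus is a helper name I made up for illustration, not an hwloc command, and that I am assuming the bit positions correspond to physical PU indexes (P#).

```shell
# Hypothetical helper (not part of hwloc): list the bit positions set in a
# cpuset mask such as the one printed by "hwloc-ps -c".
# Assumption: bit N of the mask stands for physical PU index P#N.
mask_to_pus() {
    mask=$(( $1 ))   # shell arithmetic accepts the 0x... hex form
    bit=0
    out=""
    while [ "$mask" -ne 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            # Append the bit index, space-separated
            out="$out${out:+ }$bit"
        fi
        mask=$(( mask >> 1 ))
        bit=$(( bit + 1 ))
    done
    echo "$out"
}

mask_to_pus 0x00000055   # prints: 0 2 4 6
```

All four processes report the same mask, so this only tells me the set of PUs each process is allowed to run on, not which single core each rank ends up on.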


Questions: 
1. Does hwloc-bind map the processes *sequentially* on *successive* cores of the socket?
2. How could hwloc-ps help verify this binding, i.e.,

process 1497 (rank 0) on socket.0:core.0
process 1498 (rank 1) on socket.0:core.1
process 1499 (rank 2) on socket.0:core.2
process 1500 (rank 3) on socket.0:core.3


Similarly, does binding with `socket:0.core:0-1 socket:1.core:0-1', for which hwloc-ps shows

$ hwloc-ps -l
1315    L2Cache:0 L2Cache:2             ./test
1316    L2Cache:0 L2Cache:2             ./test
1317    L2Cache:0 L2Cache:2             ./test
1318    L2Cache:0 L2Cache:2             ./test

indicate the following mapping?

process 1315 (rank 0) on socket.0:core.0
process 1316 (rank 1) on socket.0:core.1
process 1317 (rank 2) on socket.1:core.0
process 1318 (rank 3) on socket.1:core.1


The topology of the machine is as follows:

$ hwloc-info -l
depth 0:        1 Machine (type #1)
 depth 1:       2 Sockets (type #3)
  depth 2:      4 Caches (type #4)
   depth 3:     8 Caches (type #4)
    depth 4:    8 Cores (type #5)
     depth 5:   8 PUs (type #6)

$ lstopo
Machine (16GB)
  Socket L#0
    L2 L#0 (4096KB)
      L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
      L1 L#1 (32KB) + Core L#1 + PU L#1 (P#2)
    L2 L#1 (4096KB)
      L1 L#2 (32KB) + Core L#2 + PU L#2 (P#4)
      L1 L#3 (32KB) + Core L#3 + PU L#3 (P#6)
  Socket L#1
    L2 L#2 (4096KB)
      L1 L#4 (32KB) + Core L#4 + PU L#4 (P#1)
      L1 L#5 (32KB) + Core L#5 + PU L#5 (P#3)
    L2 L#3 (4096KB)
      L1 L#6 (32KB) + Core L#6 + PU L#6 (P#5)
      L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7)
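
Note that the physical PU numbers (P#) interleave across the sockets: even P# are on Socket L#0 and odd P# are on Socket L#1. The sketch below captures that rule as read off the lstopo output above. phys_to_logical is a hypothetical helper of mine, and the arithmetic is specific to this machine, not a general hwloc rule.

```shell
# Hypothetical helper: map a physical PU index (P#) to its socket and
# logical core index (L#) on this particular 2-socket, 8-core machine.
# The rule is taken from the lstopo output in this message only.
phys_to_logical() {
    p=$1
    socket=$(( p % 2 ))              # even P# -> socket 0, odd P# -> socket 1
    core=$(( socket * 4 + p / 2 ))   # 4 cores per socket, in P# order
    echo "socket:$socket core:$core"
}

# The PUs in mask 0x00000055 are P#0, P#2, P#4 and P#6
for p in 0 2 4 6; do
    phys_to_logical "$p"
done
```

For P#0, P#2, P#4 and P#6 this gives logical cores 0 to 3 on socket 0, which is consistent with hwloc-ps reporting Socket:0 for every process.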


Thanks.
Chan

