Good day,

I'm studying the impact of MPI process binding on communication costs in my project, and would like to use hwloc-bind to achieve fine-grained mapping control. I installed hwloc 1.1.1 on a 2-socket machine with 4 cores per socket (2 dual-core dies in each socket), and ran hwloc-ps to verify the binding:

    $ mpirun -V
    mpirun (Open MPI) 1.5.1

    $ mpirun -np 4 hwloc-bind socket:0.core:0-3 ./test

hwloc-ps shows the following output:

    $ hwloc-ps -p
    1497    Socket:0    ./test
    1498    Socket:0    ./test
    1499    Socket:0    ./test
    1500    Socket:0    ./test

    $ hwloc-ps -l
    1497    Socket:0    ./test
    1498    Socket:0    ./test
    1499    Socket:0    ./test
    1500    Socket:0    ./test

    $ hwloc-ps -c
    1497    0x00000055    ./test
    1498    0x00000055    ./test
    1499    0x00000055    ./test
    1500    0x00000055    ./test

Questions:

1. Does hwloc-bind map the processes *sequentially* on *successive* cores of the socket?

2. How could hwloc-ps help verify this binding, i.e.,

    process 1497 (rank 0) on socket.0:core.0
    process 1498 (rank 1) on socket.0:core.1
    process 1499 (rank 2) on socket.0:core.2
    process 1500 (rank 3) on socket.0:core.3

Equivalently, does the binding of `socket:0.core:0-1 socket:1.core:0-1' with hwloc-ps showing

    $ hwloc-ps -l
    1315    L2Cache:0 L2Cache:2    ./test
    1316    L2Cache:0 L2Cache:2    ./test
    1317    L2Cache:0 L2Cache:2    ./test
    1318    L2Cache:0 L2Cache:2    ./test

indicate the following? I.e.,

    process 1315 (rank 0) on socket.0:core.0
    process 1316 (rank 1) on socket.0:core.1
    process 1317 (rank 2) on socket.1:core.0
    process 1318 (rank 3) on socket.1:core.1

The topology of the machine is as follows:

    $ hwloc-info -l
    depth 0:    1 Machine (type #1)
    depth 1:    2 Sockets (type #3)
    depth 2:    4 Caches (type #4)
    depth 3:    8 Caches (type #4)
    depth 4:    8 Cores (type #5)
    depth 5:    8 PUs (type #6)

    $ lstopo
    Machine (16GB)
      Socket L#0
        L2 L#0 (4096KB)
          L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
          L1 L#1 (32KB) + Core L#1 + PU L#1 (P#2)
        L2 L#1 (4096KB)
          L1 L#2 (32KB) + Core L#2 + PU L#2 (P#4)
          L1 L#3 (32KB) + Core L#3 + PU L#3 (P#6)
      Socket L#1
        L2 L#2 (4096KB)
          L1 L#4 (32KB) + Core L#4 + PU L#4 (P#1)
          L1 L#5 (32KB) + Core L#5 + PU L#5 (P#3)
        L2 L#3 (4096KB)
          L1 L#6 (32KB) + Core L#6 + PU L#6 (P#5)
          L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7)

Thanks.

Chan
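For reference when reading the `hwloc-ps -c` output, here is a minimal Python sketch (not part of hwloc) that decodes the reported mask into PUs. It assumes the mask bits are OS/physical PU indices (the P# values), which is consistent with hwloc-ps reporting Socket:0 for the same processes. The lookup table is transcribed by hand from the lstopo output in this post, and the function names are hypothetical.

```python
# Physical PU index (P#) -> (Socket L#, Core L#), copied from the lstopo
# output in this post. This table is hand-written, not queried from hwloc.
PU_TO_SOCKET_CORE = {
    0: (0, 0), 2: (0, 1), 4: (0, 2), 6: (0, 3),
    1: (1, 4), 3: (1, 5), 5: (1, 6), 7: (1, 7),
}


def pus_in_mask(mask):
    """Return the indices of the bits set in an integer cpuset mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def describe(mask_text):
    """Decode a hex mask string into (P#, Socket L#, Core L#) tuples."""
    mask = int(mask_text, 16)  # base 16 accepts the leading "0x"
    return [(pu,) + PU_TO_SOCKET_CORE[pu] for pu in pus_in_mask(mask)]


if __name__ == "__main__":
    # Mask reported for every one of the four PIDs (1497 to 1500).
    for pu, socket, core in describe("0x00000055"):
        print("PU P#%d: Socket L#%d, Core L#%d" % (pu, socket, core))
```

Under that assumption, 0x00000055 has bits 0, 2, 4 and 6 set, i.e. all four cores of Socket L#0. Since all four PIDs report this same mask, the `-c` output by itself does not distinguish a one-core-per-rank placement.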