Open MPI User's Mailing List Archives

Subject: [OMPI users] process binding to NUMA node on Opteron 6xxx series CPUs?
From: Oliver Weihe (weihe_at_[hidden])
Date: 2013-02-14 11:54:06


Hi,

Is it possible to bind MPI processes to a NUMA node on Opteron
6xxx series CPUs (e.g. with an option like --bind-to-NUMAnode) *without*
using a rankfile?
Opteron 6xxx CPUs have two NUMA nodes per socket, so --bind-to-socket
doesn't do what I want.

This is a 4-socket Opteron 6344 system (12 cores per socket):

root@node01:~> numactl --hardware | grep cpus
node 0 cpus: 0 1 2 3 4 5
node 1 cpus: 6 7 8 9 10 11
node 2 cpus: 12 13 14 15 16 17
node 3 cpus: 18 19 20 21 22 23
node 4 cpus: 24 25 26 27 28 29
node 5 cpus: 30 31 32 33 34 35
node 6 cpus: 36 37 38 39 40 41
node 7 cpus: 42 43 44 45 46 47

root@node01:~> /opt/openmpi/1.6.3/gcc/bin/mpirun --report-bindings -np 8 --bind-to-socket --bysocket sleep 1s
[node01.cluster:21446] MCW rank 1 bound to socket 1[core 0-11]: [. . . . . . . . . . . .][B B B B B B B B B B B B][. . . . . . . . . . . .][. . . . . . . . . . . .]
[node01.cluster:21446] MCW rank 2 bound to socket 2[core 0-11]: [. . . . . . . . . . . .][. . . . . . . . . . . .][B B B B B B B B B B B B][. . . . . . . . . . . .]
[node01.cluster:21446] MCW rank 3 bound to socket 3[core 0-11]: [. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .][B B B B B B B B B B B B]
[node01.cluster:21446] MCW rank 4 bound to socket 0[core 0-11]: [B B B B B B B B B B B B][. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .]
[node01.cluster:21446] MCW rank 5 bound to socket 1[core 0-11]: [. . . . . . . . . . . .][B B B B B B B B B B B B][. . . . . . . . . . . .][. . . . . . . . . . . .]
[node01.cluster:21446] MCW rank 6 bound to socket 2[core 0-11]: [. . . . . . . . . . . .][. . . . . . . . . . . .][B B B B B B B B B B B B][. . . . . . . . . . . .]
[node01.cluster:21446] MCW rank 7 bound to socket 3[core 0-11]: [. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .][B B B B B B B B B B B B]
[node01.cluster:21446] MCW rank 0 bound to socket 0[core 0-11]: [B B B B B B B B B B B B][. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .]

So each process is bound to *two* NUMA nodes, but I want to bind each
process to *one* NUMA node.

What I want is more like this:
root@node01:~> cat rankfile
rank 0=localhost slot=0-5
rank 1=localhost slot=6-11
rank 2=localhost slot=12-17
rank 3=localhost slot=18-23
rank 4=localhost slot=24-29
rank 5=localhost slot=30-35
rank 6=localhost slot=36-41
rank 7=localhost slot=42-47
root@node01:~> /opt/openmpi/1.6.3/gcc/bin/mpirun --report-bindings -np 8 --rankfile rankfile sleep 1s
[node01.cluster:21505] MCW rank 1 bound to socket 0[core 6-11]: [. . . . . . B B B B B B][. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .] (slot list 6-11)
[node01.cluster:21505] MCW rank 2 bound to socket 1[core 0-5]: [. . . . . . . . . . . .][B B B B B B . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .] (slot list 12-17)
[node01.cluster:21505] MCW rank 3 bound to socket 1[core 6-11]: [. . . . . . . . . . . .][. . . . . . B B B B B B][. . . . . . . . . . . .][. . . . . . . . . . . .] (slot list 18-23)
[node01.cluster:21505] MCW rank 4 bound to socket 2[core 0-5]: [. . . . . . . . . . . .][. . . . . . . . . . . .][B B B B B B . . . . . .][. . . . . . . . . . . .] (slot list 24-29)
[node01.cluster:21505] MCW rank 5 bound to socket 2[core 6-11]: [. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . B B B B B B][. . . . . . . . . . . .] (slot list 30-35)
[node01.cluster:21505] MCW rank 6 bound to socket 3[core 0-5]: [. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .][B B B B B B . . . . . .] (slot list 36-41)
[node01.cluster:21505] MCW rank 7 bound to socket 3[core 6-11]: [. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . B B B B B B] (slot list 42-47)
[node01.cluster:21505] MCW rank 0 bound to socket 0[core 0-5]: [B B B B B B . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .] (slot list 0-5)
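
For illustration only: a rankfile like the one above does not have to be
written by hand. The following is a minimal sketch that derives it from
the "node N cpus:" lines of numactl --hardware. The function name
make_rankfile is hypothetical (it is not part of Open MPI or numactl),
and the sketch assumes that each NUMA node's CPU IDs are listed in
ascending, contiguous order and that rankfile slot numbers match those
CPU IDs, as they do on this machine.

```shell
#!/bin/sh
# make_rankfile: hypothetical helper, not part of Open MPI or numactl.
# Reads `numactl --hardware` output on stdin and prints one rankfile
# line per NUMA node, in the order the nodes are listed.
# Assumption: the CPU IDs of each node are ascending and contiguous,
# so the first and last ID on a "cpus:" line describe the whole range.
make_rankfile() {
    host="${1:-localhost}"   # host name to put in every rankfile line
    awk -v host="$host" '
        # Match only lines of the form: node <N> cpus: <id> <id> ...
        $1 == "node" && $3 == "cpus:" && NF >= 4 {
            # rank counts up from 0; $4 is the first CPU ID, $NF the last
            printf "rank %d=%s slot=%s-%s\n", rank++, host, $4, $NF
        }'
}
```

It would be used as "numactl --hardware | make_rankfile localhost > rankfile".
This still relies on a rankfile, so it only automates the workaround and
does not replace it.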

Actually I'm dreaming of
mpirun --bind-to-NUMAnode --bycore ...
or
mpirun --bind-to-NUMAnode --byNUMAnode ...

Is there any workaround for this other than rankfiles?

Regards,
  Oliver Weihe