
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] process binding to NUMA node on Opteron 6xxx series CPUs?
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-02-14 12:46:32


Sure - use the 1.7 branch or the developer's trunk. We have the --bind-to numa option there.
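
For example, with a 1.7 build, something along these lines should do it (just a sketch; the option spellings were still settling on the trunk, so check "mpirun --help" on whatever you build):

  mpirun --report-bindings --bind-to numa --map-by numa -np 8 ./a.out

--bind-to numa constrains each rank to a single NUMA node, and --map-by numa lays the ranks out round-robin across NUMA nodes, which matches the --byNUMAnode behavior you describe below.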

On Feb 14, 2013, at 8:54 AM, Oliver Weihe <weihe_at_[hidden]> wrote:

> Hi,
>
> is it possible to bind MPI processes to a NUMA node on Opteron 6xxx series CPUs (e.g. --bind-to-NUMAnode) *without* using a rankfile?
> Opteron 6xxx CPUs have two NUMA nodes per socket, so --bind-to-socket doesn't do what I want.
>
> This is a 4-socket Opteron 6344 system (12 cores per socket, two NUMA nodes of 6 cores each):
>
> root_at_node01:~> numactl --hardware | grep cpus
> node 0 cpus: 0 1 2 3 4 5
> node 1 cpus: 6 7 8 9 10 11
> node 2 cpus: 12 13 14 15 16 17
> node 3 cpus: 18 19 20 21 22 23
> node 4 cpus: 24 25 26 27 28 29
> node 5 cpus: 30 31 32 33 34 35
> node 6 cpus: 36 37 38 39 40 41
> node 7 cpus: 42 43 44 45 46 47
>
> root_at_node01:~> /opt/openmpi/1.6.3/gcc/bin/mpirun --report-bindings -np 8 --bind-to-socket --bysocket sleep 1s
> [node01.cluster:21446] MCW rank 1 bound to socket 1[core 0-11]: [. . . . . . . . . . . .][B B B B B B B B B B B B][. . . . . . . . . . . .][. . . . . . . . . . . .]
> [node01.cluster:21446] MCW rank 2 bound to socket 2[core 0-11]: [. . . . . . . . . . . .][. . . . . . . . . . . .][B B B B B B B B B B B B][. . . . . . . . . . . .]
> [node01.cluster:21446] MCW rank 3 bound to socket 3[core 0-11]: [. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .][B B B B B B B B B B B B]
> [node01.cluster:21446] MCW rank 4 bound to socket 0[core 0-11]: [B B B B B B B B B B B B][. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .]
> [node01.cluster:21446] MCW rank 5 bound to socket 1[core 0-11]: [. . . . . . . . . . . .][B B B B B B B B B B B B][. . . . . . . . . . . .][. . . . . . . . . . . .]
> [node01.cluster:21446] MCW rank 6 bound to socket 2[core 0-11]: [. . . . . . . . . . . .][. . . . . . . . . . . .][B B B B B B B B B B B B][. . . . . . . . . . . .]
> [node01.cluster:21446] MCW rank 7 bound to socket 3[core 0-11]: [. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .][B B B B B B B B B B B B]
> [node01.cluster:21446] MCW rank 0 bound to socket 0[core 0-11]: [B B B B B B B B B B B B][. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .]
>
> So each process is bound to *two* NUMA nodes, but I want to bind each to *one* NUMA node.
>
> What I want is more like this:
> root_at_node01:~> cat rankfile
> rank 0=localhost slot=0-5
> rank 1=localhost slot=6-11
> rank 2=localhost slot=12-17
> rank 3=localhost slot=18-23
> rank 4=localhost slot=24-29
> rank 5=localhost slot=30-35
> rank 6=localhost slot=36-41
> rank 7=localhost slot=42-47
> root_at_node01:~> /opt/openmpi/1.6.3/gcc/bin/mpirun --report-bindings -np 8 --rankfile rankfile sleep 1s
> [node01.cluster:21505] MCW rank 1 bound to socket 0[core 6-11]: [. . . . . . B B B B B B][. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .] (slot list 6-11)
> [node01.cluster:21505] MCW rank 2 bound to socket 1[core 0-5]: [. . . . . . . . . . . .][B B B B B B . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .] (slot list 12-17)
> [node01.cluster:21505] MCW rank 3 bound to socket 1[core 6-11]: [. . . . . . . . . . . .][. . . . . . B B B B B B][. . . . . . . . . . . .][. . . . . . . . . . . .] (slot list 18-23)
> [node01.cluster:21505] MCW rank 4 bound to socket 2[core 0-5]: [. . . . . . . . . . . .][. . . . . . . . . . . .][B B B B B B . . . . . .][. . . . . . . . . . . .] (slot list 24-29)
> [node01.cluster:21505] MCW rank 5 bound to socket 2[core 6-11]: [. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . B B B B B B][. . . . . . . . . . . .] (slot list 30-35)
> [node01.cluster:21505] MCW rank 6 bound to socket 3[core 0-5]: [. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .][B B B B B B . . . . . .] (slot list 36-41)
> [node01.cluster:21505] MCW rank 7 bound to socket 3[core 6-11]: [. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . B B B B B B] (slot list 42-47)
> [node01.cluster:21505] MCW rank 0 bound to socket 0[core 0-5]: [B B B B B B . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .][. . . . . . . . . . . .] (slot list 0-5)
>
>
> Actually I'm dreaming of
> mpirun --bind-to-NUMAnode --bycore ...
> or
> mpirun --bind-to-NUMAnode --byNUMAnode ...
>
> Is there any workaround except rankfiles for this?
>
> Regards,
> Oliver Weihe
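
Until you can move to 1.7, one rankfile-free workaround on 1.6.x is to launch through a small numactl wrapper script. This is only a sketch, and it assumes exactly one rank per NUMA node so that local ranks 0-7 map to nodes 0-7, just as in your rankfile ("numabind.sh" is a made-up name):

  #!/bin/sh
  # numabind.sh: Open MPI exports OMPI_COMM_WORLD_LOCAL_RANK to each
  # launched process; use it to pick a NUMA node, then pin both the
  # CPUs and the memory of the real program to that node.
  node=$OMPI_COMM_WORLD_LOCAL_RANK
  exec numactl --cpunodebind=$node --membind=$node "$@"

Then launch with:

  mpirun -np 8 ./numabind.sh ./a.out

Note that mpirun isn't doing the binding itself in this case, so --report-bindings won't show anything useful; to verify the pinning, run something like "mpirun -np 8 ./numabind.sh numactl --show" and check the cpubind/membind lines.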