
Subject: Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 doesn't work for our Magny-Cours based 32-core node
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-12-19 01:54:21


Yeah, it will impact everything that uses hwloc topology maps, I fear.

One side note: you'll need to add --hetero-nodes to your cmd line. If we don't see that, we assume that all the node topologies are identical - which clearly isn't true here.
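
For example, something along these lines (the same myprog run you show below, just with the extra flag) should do it:

   mpirun -np 8 --hetero-nodes -bind-to numa -report-bindings myprog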

I'll try to resolve the hierarchy inversion over the holiday - it won't make 1.7.4, but hopefully 1.7.5.
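
For reference, here is a minimal sketch against the hwloc 1.x C API (just an illustration of the inversion we're talking about, not Open MPI code; the file name is a placeholder) that reports which way round the Socket and NUMANode levels sit on a given machine:

   #include <stdio.h>
   #include <hwloc.h>

   /* Build with something like: gcc hier_check.c -o hier_check -lhwloc */
   int main(void)
   {
       hwloc_topology_t topo;
       hwloc_topology_init(&topo);
       hwloc_topology_load(topo);

       /* hwloc depths grow downward from the Machine object at depth 0 */
       int numa_depth = hwloc_get_type_depth(topo, HWLOC_OBJ_NODE);
       int sock_depth = hwloc_get_type_depth(topo, HWLOC_OBJ_SOCKET);

       if (numa_depth < 0 || sock_depth < 0)
           printf("Socket or NUMANode level not reported here\n");
       else if (numa_depth > sock_depth)
           printf("Socket contains NUMANode (node03-style topology)\n");
       else
           printf("NUMANode contains Socket (node05-style topology)\n");

       hwloc_topology_destroy(topo);
       return 0;
   }

Run on the two nodes, this should mirror the topology dumps quoted below: node03 would print the first message, node05 the second.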

Thanks
Ralph

On Dec 18, 2013, at 9:44 PM, tmishima_at_[hidden] wrote:

>
>
> I think it's normal for AMD Opterons with 8/16 cores, such as
> Magny-Cours or Interlagos. Because they usually have 2 NUMA nodes
> per CPU (socket), a NUMA node cannot include a socket. This type
> of hierarchy would be natural.
>
> (node03 is a Dell PowerEdge R815, so I guess it's quite a common machine)
>
> By the way, I think this inversion should affect rmaps_lama mapping.
>
> Tetsuya Mishima
>
>> Ick - yeah, that would be a problem. I haven't seen that type of
>> hierarchical inversion before - is node03 a different type of chip?
>>
>> Might take a while for me to adjust the code to handle the hierarchy
>> inversion... :-(
>>
>> On Dec 18, 2013, at 9:05 PM, tmishima_at_[hidden] wrote:
>>
>>>
>>>
>>> Hi Ralph,
>>>
>>> I found the reason. I attached the main part of the output for the
>>> 32-core node (node03) and the 8-core node (node05) at the bottom.
>>>
>>> From this information, on node03 the socket contains the NUMA node,
>>> while on node05 the NUMA node contains the socket. The direction of
>>> the object tree is opposite.
>>>
>>> Since "-map-by socket" is presumably assumed as the default, for
>>> node05 "-bind-to numa and -map-by socket" means an upward search.
>>> For node03, this should be a downward search.
>>>
>>> I guess that openmpi-1.7.4rc1 always assumes the NUMA node contains
>>> the socket. Is that right? Then an upward search is assumed in
>>> orte_rmaps_base_compute_bindings even for node03 when I give the
>>> "-bind-to numa and -map-by socket" options.
>>>
>>> [node03.cluster:15508] [[38286,0],0] rmaps:base:compute_usage
>>> [node03.cluster:15508] mca:rmaps: compute bindings for job [38286,1] with policy NUMA
>>> [node03.cluster:15508] mca:rmaps: bind upwards for job [38286,1] with bindings NUMA
>>> [node03.cluster:15508] [[38286,0],0] bind:upward target NUMANode type Machine
>>>
>>> That's the reason for this trouble. Therefore, adding "-map-by core"
>>> works (although the resulting mapping pattern seems strange...).
>>>
>>> [mishima_at_node03 demos]$ mpirun -np 8 -bind-to numa -map-by core -report-bindings myprog
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
>>> [node03.cluster:15885] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]
>>> [node03.cluster:15885] MCW rank 3 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]
>>> [node03.cluster:15885] MCW rank 4 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././B/B/B/B][./././././././.][./././././././.][./././././././.]
>>> [node03.cluster:15885] MCW rank 5 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././B/B/B/B][./././././././.][./././././././.][./././././././.]
>>> [node03.cluster:15885] MCW rank 6 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././B/B/B/B][./././././././.][./././././././.][./././././././.]
>>> [node03.cluster:15885] MCW rank 7 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././B/B/B/B][./././././././.][./././././././.][./././././././.]
>>> [node03.cluster:15885] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]
>>> [node03.cluster:15885] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]
>>> Hello world from process 6 of 8
>>> Hello world from process 5 of 8
>>> Hello world from process 0 of 8
>>> Hello world from process 7 of 8
>>> Hello world from process 3 of 8
>>> Hello world from process 4 of 8
>>> Hello world from process 2 of 8
>>> Hello world from process 1 of 8
>>>
>>> Regards,
>>> Tetsuya Mishima
>>>
>>> [node03.cluster:15508] Type: Machine Number of child objects: 4
>>> Name=NULL
>>> total=132358820KB
>>> Backend=Linux
>>> OSName=Linux
>>> OSRelease=2.6.18-308.16.1.el5
>>> OSVersion="#1 SMP Tue Oct 2 22:01:43 EDT 2012"
>>> Architecture=x86_64
>>> Cpuset: 0xffffffff
>>> Online: 0xffffffff
>>> Allowed: 0xffffffff
>>> Bind CPU proc: TRUE
>>> Bind CPU thread: TRUE
>>> Bind MEM proc: FALSE
>>> Bind MEM thread: TRUE
>>> Type: Socket Number of child objects: 2
>>> Name=NULL
>>> total=33071780KB
>>> CPUModel="AMD Opteron(tm) Processor 6136"
>>> Cpuset: 0x000000ff
>>> Online: 0x000000ff
>>> Allowed: 0x000000ff
>>> Type: NUMANode Number of child objects: 1
>>>
>>>
>>> [node05.cluster:21750] Type: Machine Number of child objects: 2
>>> Name=NULL
>>> total=33080072KB
>>> Backend=Linux
>>> OSName=Linux
>>> OSRelease=2.6.18-308.16.1.el5
>>> OSVersion="#1 SMP Tue Oct 2 22:01:43 EDT 2012"
>>> Architecture=x86_64
>>> Cpuset: 0x000000ff
>>> Online: 0x000000ff
>>> Allowed: 0x000000ff
>>> Bind CPU proc: TRUE
>>> Bind CPU thread: TRUE
>>> Bind MEM proc: FALSE
>>> Bind MEM thread: TRUE
>>> Type: NUMANode Number of child objects: 1
>>> Name=NULL
>>> local=16532232KB
>>> total=16532232KB
>>> Cpuset: 0x0000000f
>>> Online: 0x0000000f
>>> Allowed: 0x0000000f
>>> Type: Socket Number of child objects: 1
>>>
>>>
>>>> Hmm...try adding "-mca rmaps_base_verbose 10 -mca ess_base_verbose 5"
>>>> to your cmd line and let's see what it thinks it found.
>>>>
>>>>
>>>> On Dec 18, 2013, at 6:55 PM, tmishima_at_[hidden] wrote:
>>>>
>>>>>
>>>>>
>>>>> Hi, I'd like to report one more problem with openmpi-1.7.4rc1,
>>>>> which is more serious.
>>>>>
>>>>> For our 32-core nodes (AMD Magny-Cours based), which have
>>>>> 8 NUMA nodes, "-bind-to numa" does not work. Without
>>>>> this option, it works. For your information, I have added
>>>>> the lstopo output of the node at the bottom of this mail.
>>>>>
>>>>> Regards,
>>>>> Tetsuya Mishima
>>>>>
>>>>> [mishima_at_manage ~]$ qsub -I -l nodes=1:ppn=32
>>>>> qsub: waiting for job 8352.manage.cluster to start
>>>>> qsub: job 8352.manage.cluster ready
>>>>>
>>>>> [mishima_at_node03 demos]$ mpirun -np 8 -report-bindings -bind-to numa myprog
>>>>> [node03.cluster:15316] [[37582,0],0] bind:upward target NUMANode type Machine
>>>>> --------------------------------------------------------------------------
>>>>> A request was made to bind to NUMA, but an appropriate target could not
>>>>> be found on node node03.
>>>>> --------------------------------------------------------------------------
>>>>> [mishima_at_node03 ~]$ cd ~/Desktop/openmpi-1.7/demos/
>>>>> [mishima_at_node03 demos]$ mpirun -np 8 -report-bindings myprog
>>>>> [node03.cluster:15282] MCW rank 2 bound to socket 1[core 8[hwt 0]]: [./././././././.][B/././././././.][./././././././.][./././././././.]
>>>>> [node03.cluster:15282] MCW rank 3 bound to socket 1[core 9[hwt 0]]: [./././././././.][./B/./././././.][./././././././.][./././././././.]
>>>>> [node03.cluster:15282] MCW rank 4 bound to socket 2[core 16[hwt 0]]: [./././././././.][./././././././.][B/././././././.][./././././././.]
>>>>> [node03.cluster:15282] MCW rank 5 bound to socket 2[core 17[hwt 0]]: [./././././././.][./././././././.][./B/./././././.][./././././././.]
>>>>> [node03.cluster:15282] MCW rank 6 bound to socket 3[core 24[hwt 0]]: [./././././././.][./././././././.][./././././././.][B/././././././.]
>>>>> [node03.cluster:15282] MCW rank 7 bound to socket 3[core 25[hwt 0]]: [./././././././.][./././././././.][./././././././.][./B/./././././.]
>>>>> [node03.cluster:15282] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././.][./././././././.][./././././././.][./././././././.]
>>>>> [node03.cluster:15282] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././.][./././././././.][./././././././.][./././././././.]
>>>>> Hello world from process 2 of 8
>>>>> Hello world from process 5 of 8
>>>>> Hello world from process 4 of 8
>>>>> Hello world from process 3 of 8
>>>>> Hello world from process 1 of 8
>>>>> Hello world from process 7 of 8
>>>>> Hello world from process 6 of 8
>>>>> Hello world from process 0 of 8
>>>>> [mishima_at_node03 demos]$ ~/opt/hwloc/bin/lstopo-no-graphics
>>>>> Machine (126GB)
>>>>>   Socket L#0 (32GB)
>>>>>     NUMANode L#0 (P#0 16GB) + L3 L#0 (5118KB)
>>>>>       L2 L#0 (512KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
>>>>>       L2 L#1 (512KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
>>>>>       L2 L#2 (512KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
>>>>>       L2 L#3 (512KB) + L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
>>>>>     NUMANode L#1 (P#1 16GB) + L3 L#1 (5118KB)
>>>>>       L2 L#4 (512KB) + L1d L#4 (64KB) + L1i L#4 (64KB) + Core L#4 + PU L#4 (P#4)
>>>>>       L2 L#5 (512KB) + L1d L#5 (64KB) + L1i L#5 (64KB) + Core L#5 + PU L#5 (P#5)
>>>>>       L2 L#6 (512KB) + L1d L#6 (64KB) + L1i L#6 (64KB) + Core L#6 + PU L#6 (P#6)
>>>>>       L2 L#7 (512KB) + L1d L#7 (64KB) + L1i L#7 (64KB) + Core L#7 + PU L#7 (P#7)
>>>>>   Socket L#1 (32GB)
>>>>>     NUMANode L#2 (P#6 16GB) + L3 L#2 (5118KB)
>>>>>       L2 L#8 (512KB) + L1d L#8 (64KB) + L1i L#8 (64KB) + Core L#8 + PU L#8 (P#8)
>>>>>       L2 L#9 (512KB) + L1d L#9 (64KB) + L1i L#9 (64KB) + Core L#9 + PU L#9 (P#9)
>>>>>       L2 L#10 (512KB) + L1d L#10 (64KB) + L1i L#10 (64KB) + Core L#10 + PU L#10 (P#10)
>>>>>       L2 L#11 (512KB) + L1d L#11 (64KB) + L1i L#11 (64KB) + Core L#11 + PU L#11 (P#11)
>>>>>     NUMANode L#3 (P#7 16GB) + L3 L#3 (5118KB)
>>>>>       L2 L#12 (512KB) + L1d L#12 (64KB) + L1i L#12 (64KB) + Core L#12 + PU L#12 (P#12)
>>>>>       L2 L#13 (512KB) + L1d L#13 (64KB) + L1i L#13 (64KB) + Core L#13 + PU L#13 (P#13)
>>>>>       L2 L#14 (512KB) + L1d L#14 (64KB) + L1i L#14 (64KB) + Core L#14 + PU L#14 (P#14)
>>>>>       L2 L#15 (512KB) + L1d L#15 (64KB) + L1i L#15 (64KB) + Core L#15 + PU L#15 (P#15)
>>>>>   Socket L#2 (32GB)
>>>>>     NUMANode L#4 (P#4 16GB) + L3 L#4 (5118KB)
>>>>>       L2 L#16 (512KB) + L1d L#16 (64KB) + L1i L#16 (64KB) + Core L#16 + PU L#16 (P#16)
>>>>>       L2 L#17 (512KB) + L1d L#17 (64KB) + L1i L#17 (64KB) + Core L#17 + PU L#17 (P#17)
>>>>>       L2 L#18 (512KB) + L1d L#18 (64KB) + L1i L#18 (64KB) + Core L#18 + PU L#18 (P#18)
>>>>>       L2 L#19 (512KB) + L1d L#19 (64KB) + L1i L#19 (64KB) + Core L#19 + PU L#19 (P#19)
>>>>>     NUMANode L#5 (P#5 16GB) + L3 L#5 (5118KB)
>>>>>       L2 L#20 (512KB) + L1d L#20 (64KB) + L1i L#20 (64KB) + Core L#20 + PU L#20 (P#20)
>>>>>       L2 L#21 (512KB) + L1d L#21 (64KB) + L1i L#21 (64KB) + Core L#21 + PU L#21 (P#21)
>>>>>       L2 L#22 (512KB) + L1d L#22 (64KB) + L1i L#22 (64KB) + Core L#22 + PU L#22 (P#22)
>>>>>       L2 L#23 (512KB) + L1d L#23 (64KB) + L1i L#23 (64KB) + Core L#23 + PU L#23 (P#23)
>>>>>   Socket L#3 (32GB)
>>>>>     NUMANode L#6 (P#2 16GB) + L3 L#6 (5118KB)
>>>>>       L2 L#24 (512KB) + L1d L#24 (64KB) + L1i L#24 (64KB) + Core L#24 + PU L#24 (P#24)
>>>>>       L2 L#25 (512KB) + L1d L#25 (64KB) + L1i L#25 (64KB) + Core L#25 + PU L#25 (P#25)
>>>>>       L2 L#26 (512KB) + L1d L#26 (64KB) + L1i L#26 (64KB) + Core L#26 + PU L#26 (P#26)
>>>>>       L2 L#27 (512KB) + L1d L#27 (64KB) + L1i L#27 (64KB) + Core L#27 + PU L#27 (P#27)
>>>>>     NUMANode L#7 (P#3 16GB) + L3 L#7 (5118KB)
>>>>>       L2 L#28 (512KB) + L1d L#28 (64KB) + L1i L#28 (64KB) + Core L#28 + PU L#28 (P#28)
>>>>>       L2 L#29 (512KB) + L1d L#29 (64KB) + L1i L#29 (64KB) + Core L#29 + PU L#29 (P#29)
>>>>>       L2 L#30 (512KB) + L1d L#30 (64KB) + L1i L#30 (64KB) + Core L#30 + PU L#30 (P#30)
>>>>>       L2 L#31 (512KB) + L1d L#31 (64KB) + L1i L#31 (64KB) + Core L#31 + PU L#31 (P#31)
>>>>> HostBridge L#0
>>>>> PCIBridge
>>>>> PCI 14e4:1639
>>>>> Net L#0 "eth0"
>>>>> PCI 14e4:1639
>>>>> Net L#1 "eth1"
>>>>> PCIBridge
>>>>> PCI 14e4:1639
>>>>> Net L#2 "eth2"
>>>>> PCI 14e4:1639
>>>>> Net L#3 "eth3"
>>>>> PCIBridge
>>>>> PCIBridge
>>>>> PCIBridge
>>>>> PCI 1000:0072
>>>>> Block L#4 "sdb"
>>>>> Block L#5 "sda"
>>>>> PCI 1002:4390
>>>>> Block L#6 "sr0"
>>>>> PCIBridge
>>>>> PCI 102b:0532
>>>>> HostBridge L#7
>>>>> PCIBridge
>>>>> PCI 15b3:6274
>>>>> Net L#7 "ib0"
>>>>> OpenFabrics L#8 "mthca0"
>>>>>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users