Hardware Locality Users' Mailing List Archives

Subject: Re: [hwloc-users] misleading cache size on AMD Opteron 6348?
From: Yury Vorobyov (teupollam_at_[hidden])
Date: 2014-06-11 15:16:25


I do not see a big difference. This time I used the upstream release of hwloc
(not the live git version).

$ lstopo
****************************************************************************
* hwloc has encountered what looks like an error from the operating system.
*
* L3 (P#6 cpuset 0x000003f0) intersects with NUMANode (P#0 cpuset 0x0000003f) without inclusion!
* Error occurred in topology.c line 940
*
* Please report this error message to the hwloc user's mailing list,
* along with the output from the hwloc-gather-topology script.
****************************************************************************
Machine
  Socket L#0
    NUMANode L#0 (P#0)
      L3 L#0 (6144KB)
        L2 L#0 (2048KB) + L1i L#0 (64KB)
          L1d L#0 (16KB) + Core L#0 + PU L#0 (P#0)
          L1d L#1 (16KB) + Core L#1 + PU L#1 (P#1)
        L2 L#1 (2048KB) + L1i L#1 (64KB)
          L1d L#2 (16KB) + Core L#2 + PU L#2 (P#2)
          L1d L#3 (16KB) + Core L#3 + PU L#3 (P#3)
      L2 L#2 (2048KB) + L1i L#2 (64KB)
        L1d L#4 (16KB) + Core L#4 + PU L#4 (P#4)
        L1d L#5 (16KB) + Core L#5 + PU L#5 (P#5)
    NUMANode L#1 (P#1)
      L2 L#3 (2048KB) + L1i L#3 (64KB)
        L1d L#6 (16KB) + Core L#6 + PU L#6 (P#6)
        L1d L#7 (16KB) + Core L#7 + PU L#7 (P#7)
      L2 L#4 (2048KB) + L1i L#4 (64KB)
        L1d L#8 (16KB) + Core L#8 + PU L#8 (P#8)
        L1d L#9 (16KB) + Core L#9 + PU L#9 (P#9)
      L3 L#1 (6144KB) + L2 L#5 (2048KB) + L1i L#5 (64KB)
        L1d L#10 (16KB) + Core L#10 + PU L#10 (P#10)
        L1d L#11 (16KB) + Core L#11 + PU L#11 (P#11)
  Socket L#1
    NUMANode L#2 (P#2)
      L3 L#2 (6144KB) + L2 L#6 (2048KB) + L1i L#6 (64KB)
        L1d L#12 (16KB) + Core L#12 + PU L#12 (P#12)
        L1d L#13 (16KB) + Core L#13 + PU L#13 (P#13)
      L2 L#7 (2048KB) + L1i L#7 (64KB)
        L1d L#14 (16KB) + Core L#14 + PU L#14 (P#14)
        L1d L#15 (16KB) + Core L#15 + PU L#15 (P#15)
      L2 L#8 (2048KB) + L1i L#8 (64KB)
        L1d L#16 (16KB) + Core L#16 + PU L#16 (P#16)
        L1d L#17 (16KB) + Core L#17 + PU L#17 (P#17)
    NUMANode L#3 (P#3)
      L2 L#9 (2048KB) + L1i L#9 (64KB)
        L1d L#18 (16KB) + Core L#18 + PU L#18 (P#18)
        L1d L#19 (16KB) + Core L#19 + PU L#19 (P#19)
      L3 L#3 (6144KB)
        L2 L#10 (2048KB) + L1i L#10 (64KB)
          L1d L#20 (16KB) + Core L#20 + PU L#20 (P#20)
          L1d L#21 (16KB) + Core L#21 + PU L#21 (P#21)
        L2 L#11 (2048KB) + L1i L#11 (64KB)
          L1d L#22 (16KB) + Core L#22 + PU L#22 (P#22)
          L1d L#23 (16KB) + Core L#23 + PU L#23 (P#23)
  HostBridge L#0
    PCIBridge
      PCI 10de:0f00
    PCIBridge
      PCI 8086:10d3
    PCIBridge
      PCI 8086:10d3
    PCIBridge
      PCI 1002:6889
    PCI 1002:4390
    PCI 1002:439c
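
The error banner above compares two cpusets. As a minimal sketch (not hwloc's own code; the helper names `pus` and `relation` are made up for illustration), the two masks from the message can be decoded and compared like this:

```python
def pus(mask: int) -> list[int]:
    """Return the PU indexes whose bits are set in a cpuset mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def relation(a: int, b: int) -> str:
    """Classify how two cpusets relate to each other."""
    common = a & b
    if common == 0:
        return "disjoint"
    if common == a or common == b:
        return "included"
    return "intersects without inclusion"


l3 = 0x000003F0     # L3 P#6 cpuset, copied from the error message
node0 = 0x0000003F  # NUMANode P#0 cpuset, copied from the error message

print(pus(l3))              # [4, 5, 6, 7, 8, 9]
print(pus(node0))           # [0, 1, 2, 3, 4, 5]
print(relation(l3, node0))  # intersects without inclusion
```

The reported L3 covers PUs 4-9 while NUMA node 0 covers PUs 0-5, so they share PUs 4 and 5 but neither contains the other. That would explain why L3 L#0 and L3 L#3 in the tree show only four cores and why some L2s sit outside any L3.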

On Tue, Apr 1, 2014 at 1:47 PM, Yury Vorobyov <teupollam_at_[hidden]> wrote:

> The current BIOS version could be detecting the CPUs improperly, since they
> are engineering samples of the 6348 (all characteristics are the same).
>
>
> On Tue, Apr 1, 2014 at 6:59 PM, Yury Vorobyov <teupollam_at_[hidden]> wrote:
>
>> The BIOS is at the latest version. If I should check some BIOS information, I
>> have access to the hardware. Which SMBIOS variables do you want to see?
>>
>>
>> On Fri, Jan 31, 2014 at 1:07 PM, Brice Goglin <Brice.Goglin_at_[hidden]>
>> wrote:
>>
>>> Hello,
>>>
>>> Your BIOS reports invalid L3 cache information. On these processors, each
>>> L3 is shared by 6 cores: it covers the 6 cores of an entire half-socket NUMA
>>> node. But the BIOS says that some L3s are shared by 4 cores and others by
>>> 6 cores. Worse, it says that some L3s are shared by cores from one NUMA
>>> node and cores from another NUMA node, which causes the error message
>>> (and these L3s cannot be inserted in the topology).
>>>
>>> I see "AMD Eng Sample, ZS268145TCG54_32/26/20_2/16" in the processor
>>> type, which might explain why your BIOS is somewhat experimental. See if you
>>> can upgrade it.
>>>
>>> Also make sure your kernel isn't too old in case it misses L3 info for
>>> these processors. At least 3.3 should be OK iirc.
>>>
>>> NUMA node sharing info:
>>> $ cat sys/devices/system/node/node*/cpumap
>>> 00000000,0000003f
>>> 00000000,00000fc0
>>> 00000000,0003f000
>>> 00000000,00fc0000
>>> $ cat sys/devices/system/cpu/cpu{?,??}/cache/index3/shared_cpu_map
>>> 00000000,0000000f << wrong, should be 003f
>>> 00000000,0000000f << wrong, should be 003f
>>> 00000000,0000000f << wrong, should be 003f
>>> 00000000,0000000f << wrong, should be 003f
>>> 00000000,000003f0 <<impossible, should be 003f
>>> 00000000,000003f0 <<impossible, should be 003f
>>> 00000000,000003f0 <<impossible, should be 0fc0
>>> 00000000,000003f0 <<impossible, should be 0fc0
>>> 00000000,000003f0 <<impossible, should be 0fc0
>>> 00000000,000003f0 <<impossible, should be 0fc0
>>> 00000000,00000c00 <<wrong, should be 0fc0
>>> 00000000,00000c00 <<wrong, should be 0fc0
>>> 00000000,00003000 <<wrong, should be 003f000
>>> 00000000,00003000 <<wrong, should be 003f000
>>> 00000000,000fc000 <<impossible, should be 003f000
>>> 00000000,000fc000 <<impossible, should be 003f000
>>> 00000000,000fc000 <<impossible, should be 003f000
>>> 00000000,000fc000 <<impossible, should be 003f000
>>> 00000000,000fc000 <<impossible, should be 0fc0000
>>> 00000000,000fc000 <<impossible, should be 0fc0000
>>> 00000000,00f00000 <<wrong, should be 0fc0000
>>> 00000000,00f00000 <<wrong, should be 0fc0000
>>> 00000000,00f00000 <<wrong, should be 0fc0000
>>> 00000000,00f00000 <<wrong, should be 0fc0000
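
The annotations in the listing above can be reproduced mechanically. The following is a rough sketch, not part of hwloc: `parse_map` and `classify` are hypothetical helpers, and the reading of "wrong" (the map is an incomplete subset of the CPU's NUMA node) versus "impossible" (the map includes CPUs from another NUMA node) is an assumption drawn from the explanation in this message.

```python
def parse_map(text: str) -> int:
    """Parse a sysfs cpumap such as '00000000,0000003f' into an integer.

    The comma-separated groups are 32-bit hex words, most significant first.
    """
    return int(text.strip().replace(",", ""), 16)


# NUMA node cpumaps, copied from the listing above.
NODES = [parse_map(m) for m in (
    "00000000,0000003f",
    "00000000,00000fc0",
    "00000000,0003f000",
    "00000000,00fc0000",
)]


def classify(cpu: int, l3_text: str) -> str:
    """Compare one CPU's L3 shared_cpu_map with the NUMA node holding it."""
    l3 = parse_map(l3_text)
    node = next(n for n in NODES if (n >> cpu) & 1)
    if l3 == node:
        return "ok"
    if l3 & ~node:
        return "impossible"  # L3 reaches outside the CPU's NUMA node
    return "wrong"           # L3 stays inside the node but misses cores


print(classify(0, "00000000,0000000f"))   # wrong
print(classify(4, "00000000,000003f0"))   # impossible
print(classify(20, "00000000,00f00000"))  # wrong
```

On a live system the same strings could be read from the files under /sys/devices/system/ shown in the commands above instead of being pasted in.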
>>>
>>> Brice
>>>
>>>
>>>
>>> On 31/01/2014 03:46, Yury Vorobyov wrote:
>>>
>>> I got an error about "intersecting caches".
>>>
>>> The info from hwloc is in the attachments.
>>>
>>> I have never seen this before. I use "live" builds of OpenMPI directly from
>>> the repo.
>>>
>>>
>>> _______________________________________________
>>> hwloc-users mailing list
>>> hwloc-users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>>>
>>>
>>>
>>
>>
>