Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] hwloc error in topology.c in OMPI 1.6.5
From: Gus Correa (gus_at_[hidden])
Date: 2014-02-28 15:30:10


On 02/28/2014 03:32 AM, Brice Goglin wrote:
> On 28/02/2014 02:48, Ralph Castain wrote:
>> Remember, hwloc doesn't actually "sense" hardware - it just parses files in the /proc area. So if something is garbled in those files, hwloc will report errors. Doesn't mean anything is wrong with the hardware at all.
>
> For the record, that's not really true:
>
> hwloc looks at /sys (and a few /proc files), but it also uses cpuid
> instructions. 90% of the time, the former is better because the kernel
> already took care of cleaning up the hardware mess and reporting
> useful/correct info in /proc and /sys. Sometimes the kernel is too old
> and it misses some hardware quirks (like L1i sharing on Gus' machine),
> causing /sys files to be incompatible.
>

Hi Brice

The (pdf) output of lstopo shows one L1d (16k) for each core,
and one L1i (64k) for each *pair* of cores.
Is this wrong?
Is anything else wrong in what is reported by ...

Sorry for my ignorance of the specifics of the AMD cache structure.
BTW, if there are any helpful web links, or references, or graphs
about the AMD cache structure, I would love to know.
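
To cross-check the lstopo picture against what the kernel itself
exports, here is a minimal Python sketch. It assumes a Linux kernel that
provides the cache/index*/ files (level, type, size, shared_cpu_list)
under /sys/devices/system/cpu; older kernels may lack some of them.
The function name read_cache_sharing is made up for illustration and is
not part of hwloc.

```python
from collections import defaultdict
from pathlib import Path


def read_cache_sharing(sysfs_root="/sys/devices/system/cpu"):
    """Return {(level, type, size): set of shared_cpu_list strings}.

    Each distinct shared_cpu_list string is one group of CPUs that the
    kernel says share a single cache of that level, type and size.
    """
    groups = defaultdict(set)
    for index_dir in Path(sysfs_root).glob("cpu[0-9]*/cache/index[0-9]*"):
        try:
            level = (index_dir / "level").read_text().strip()
            ctype = (index_dir / "type").read_text().strip()
            size = (index_dir / "size").read_text().strip()
            shared = (index_dir / "shared_cpu_list").read_text().strip()
        except OSError:
            # File missing or unreadable (e.g. older kernel): skip this entry.
            continue
        groups[(level, ctype, size)].add(shared)
    return dict(groups)


if __name__ == "__main__":
    for (level, ctype, size), cpu_lists in sorted(read_cache_sharing().items()):
        print(f"L{level} {ctype} ({size}): {sorted(cpu_lists)}")
```

If the kernel agrees with the lstopo drawing, the Instruction entry
would be expected to list groups of two CPUs each, while the Data entry
lists single CPUs.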

> In the end, the vast majority of problems come from buggy BIOS, and
> these cause both cpuid and the kernel to report invalid info. Aside from
> upgrading the BIOS, the only solution there is to replace the topology
> with a correct XML one.
>
> Brice
>
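
One way to try the XML route, sketched under assumptions: hwloc
documents the HWLOC_XMLFILE and HWLOC_THISSYSTEM environment variables,
and lstopo writes XML when given an output file name ending in .xml.
Whether the hwloc embedded in Open MPI 1.6.5 honors these variables is
an assumption that would need to be verified. The helper name
env_with_xml_topology and the file path in the comment are hypothetical.

```python
import os


def env_with_xml_topology(xml_path, base_env=None):
    """Return a copy of the environment that points hwloc at an XML topology.

    HWLOC_XMLFILE tells hwloc to load the topology from the given file
    instead of discovering it; HWLOC_THISSYSTEM=1 asserts that the file
    describes the machine the program is actually running on.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HWLOC_XMLFILE"] = xml_path
    env["HWLOC_THISSYSTEM"] = "1"
    return env


# Possible usage (hypothetical path; the XML file would first be exported
# on a good node with "lstopo good-node.xml" and corrected if needed):
#
#   import subprocess
#   subprocess.run(["lstopo"], env=env_with_xml_topology("/tmp/good-node.xml"))
```

The helper copies the environment rather than modifying os.environ, so
only the launched command sees the replacement topology.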

I am a bit skeptical that the BIOS is the culprit because I replaced
two motherboards (node14 and node16), and only node14 doesn't pass
the hwloc-gather-topology test.
Just in case, I attach the diagnostic for node16 also,
if you want to take a look. :)

FYI, the two new motherboards (nodes 14 and 16)
have a *newer* BIOS version (AMI, version 3.5, 11/25/2013)
than the one in the original nodes (node15 below)
(AMI, version 3.0, 08/31/2012).
I even thought of upgrading the old nodes' BIOSes ...
... but now I am not so sure about this ... :(

New motherboards:

[root@node14 ~]# dmidecode -s bios-vendor
American Megatrends Inc.
[root@node14 ~]# dmidecode -s bios-version
3.5
[root@node14 ~]# dmidecode -s bios-release-date
11/25/2013

**

[root@node16 ~]# dmidecode -s bios-vendor
American Megatrends Inc.
[root@node16 ~]# dmidecode -s bios-version
3.5
[root@node16 ~]# dmidecode -s bios-release-date
11/25/2013

**

Original motherboard:

[root@node15 ~]# dmidecode -s bios-vendor
American Megatrends Inc.
[root@node15 ~]# dmidecode -s bios-version
3.0
[root@node15 ~]# dmidecode -s bios-release-date
08/31/2012

**
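
The same three dmidecode queries can be collected and grouped so that
nodes with identical BIOS strings end up together. This is only a
sketch: the function names read_bios_info and group_by_bios are
invented here, dmidecode normally needs root, and gathering the values
from each node (ssh, a parallel shell, ...) is left out.

```python
import subprocess

# The three keywords queried in the transcripts above.
DMI_KEYS = ("bios-vendor", "bios-version", "bios-release-date")


def read_bios_info(run=subprocess.check_output):
    """Return (vendor, version, release date) for the local machine.

    `run` is injectable so the function can be exercised without root
    or without dmidecode installed.
    """
    return tuple(run(["dmidecode", "-s", key], text=True).strip()
                 for key in DMI_KEYS)


def group_by_bios(info_by_node):
    """Group node names by their (vendor, version, release date) tuple."""
    groups = {}
    for node, info in sorted(info_by_node.items()):
        groups.setdefault(info, []).append(node)
    return groups


# Values copied from the transcripts above.
reported = {
    "node14": ("American Megatrends Inc.", "3.5", "11/25/2013"),
    "node16": ("American Megatrends Inc.", "3.5", "11/25/2013"),
    "node15": ("American Megatrends Inc.", "3.0", "08/31/2012"),
}

for bios, nodes in group_by_bios(reported).items():
    print(bios, nodes)
```

With the values above, node14 and node16 fall into the same group,
which is the point: the BIOS strings alone do not distinguish the node
that fails from the one that passes.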

Thanks again for your help and advice.

Gus Correa
