Hardware Locality Users' Mailing List Archives

Subject: Re: [hwloc-users] anyone seen problems with PCI on RHEL 6?
From: Guy Streeter (streeter_at_[hidden])
Date: 2012-07-03 10:26:35


On 07/02/2012 06:48 PM, Carl Smith wrote:
> I happened to run "lstopo --of xml" as root on a RHEL6.1 system
> and was surprised with a core dump:

This is almost certainly Red Hat bug 740630,
https://bugzilla.redhat.com/show_bug.cgi?id=740630

I reported it to the hwloc-devel list last November. It has been fixed in Red
Hat Enterprise Linux 6.2. The libpci package version containing the fix is
pciutils-3.1.4-11.el6.
I don't see any indication that the fix is scheduled to be backported to RHEL
6.1. If you need a fix for 6.1, please contact your support or sales
representative and request one.
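
If it helps, here is a rough sketch of checking whether an installed package
string is at or past the fixed version named above. It assumes the string looks
like the output of "rpm -q pciutils" (name-version-release, optionally with an
architecture suffix), and it uses a simplified comparison of the dotted version
and the leading release number, not RPM's full version-comparison rules. The
function names and the sample package strings in the usage note are made up for
illustration.

```python
import re

# Fixed version as named in this message.
FIXED = "pciutils-3.1.4-11.el6"


def parse_nvr(nvr):
    """Return (version tuple, leading release number) from 'name-version-release'.

    Simplified: assumes a purely numeric dotted version and a release that
    starts with digits. The package name itself is ignored.
    """
    _name, version, release = nvr.rsplit("-", 2)
    ver = tuple(int(part) for part in version.split("."))
    match = re.match(r"\d+", release)
    rel = int(match.group()) if match else 0
    return ver, rel


def has_fix(installed, fixed=FIXED):
    """True if 'installed' is at or past 'fixed' under the simplified comparison."""
    return parse_nvr(installed) >= parse_nvr(fixed)
```

For example, has_fix("pciutils-3.1.4-11.el6.x86_64") is true, while a
hypothetical older string such as "pciutils-3.1.4-9.el6.x86_64" comes back
false. Anything with an unusual version format would need rpm's own comparison
instead.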

--Guy

>
>
> ...
> Looking for PCI devices
>
> Scanning PCI buses...
> ...
> 0000:00:01.0 0604 8086:3408 Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1
> 0000:00:00.0 0600 8086:3405 Intel Corporation 5520/5500/X58 I/O Hub to ESI Port
> *** glibc detected *** /tmp/hwloc-1.4.2/bin/lstopo: double free or corruption (!prev): 0x0000000001bfed50 ***
>
>
> which gdb reported as occurring here:
>
>
> (gdb) bt
> #0 0x0000003905e32905 in raise () from /lib64/libc.so.6
> #1 0x0000003905e340e5 in abort () from /lib64/libc.so.6
> #2 0x0000003905e6f827 in __libc_message () from /lib64/libc.so.6
> #3 0x0000003905e75146 in malloc_printerr () from /lib64/libc.so.6
> #4 0x00000035ace06bdf in ?? () from /lib64/libpci.so.3
> #5 0x00000035ace02c9e in pci_free_dev () from /lib64/libpci.so.3
> #6 0x00000035ace029e0 in pci_cleanup () from /lib64/libpci.so.3
> #7 0x00007f14fe211e5f in hwloc_look_libpci (topology=0x156c130)
> at topology-libpci.c:751
> #8 0x00007f14fe1fccbd in hwloc_discover (topology=0x156c130)
> at topology.c:2299
> #9 0x00007f14fe1fdde2 in hwloc_topology_load (topology=0x156c130)
> at topology.c:2831
> #10 0x0000000000405112 in main (argc=1, argv=0x7fffac612d38) at lstopo.c:530
>
>
> Sure enough, when I fed the same command to valgrind, it told me
>
>
> # valgrind --tool=memcheck /tmp/hwloc-1.4.2/bin/lstopo --of xml
> ...
> 0000:00:01.0 0604 8086:3408 Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1
> 0000:00:00.0 0600 8086:3405 Intel Corporation 5520/5500/X58 I/O Hub to ESI Port
> ==6531== Invalid read of size 8
> ==6531== at 0x35ACE06BDF: ??? (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE02C9D: pci_free_dev (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE029DF: pci_cleanup (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x4C2BE5E: hwloc_look_libpci (topology-libpci.c:751)
> ==6531== by 0x4C16CBC: hwloc_discover (topology.c:2299)
> ==6531== by 0x4C17DE1: hwloc_topology_load (topology.c:2831)
> ==6531== by 0x405111: main (lstopo.c:530)
> ==6531== Address 0x4eec440 is 144 bytes inside a block of size 200 free'd
> ==6531== at 0x4A0595D: free (vg_replace_malloc.c:366)
> ==6531== by 0x35ACE06BDE: ??? (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE02C9D: pci_free_dev (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE029DF: pci_cleanup (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x4C2BE5E: hwloc_look_libpci (topology-libpci.c:751)
> ==6531== by 0x4C16CBC: hwloc_discover (topology.c:2299)
> ==6531== by 0x4C17DE1: hwloc_topology_load (topology.c:2831)
> ==6531== by 0x405111: main (lstopo.c:530)
> ==6531==
> ==6531== Invalid write of size 8
> ==6531== at 0x35ACE06BD0: ??? (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE02C9D: pci_free_dev (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE029DF: pci_cleanup (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x4C2BE5E: hwloc_look_libpci (topology-libpci.c:751)
> ==6531== by 0x4C16CBC: hwloc_discover (topology.c:2299)
> ==6531== by 0x4C17DE1: hwloc_topology_load (topology.c:2831)
> ==6531== by 0x405111: main (lstopo.c:530)
> ==6531== Address 0x4eec440 is 144 bytes inside a block of size 200 free'd
> ==6531== at 0x4A0595D: free (vg_replace_malloc.c:366)
> ==6531== by 0x35ACE06BDE: ??? (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE02C9D: pci_free_dev (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE029DF: pci_cleanup (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x4C2BE5E: hwloc_look_libpci (topology-libpci.c:751)
> ==6531== by 0x4C16CBC: hwloc_discover (topology.c:2299)
> ==6531== by 0x4C17DE1: hwloc_topology_load (topology.c:2831)
> ==6531== by 0x405111: main (lstopo.c:530)
> ==6531==
> ==6531== Invalid free() / delete / delete[]
> ==6531== at 0x4A0595D: free (vg_replace_malloc.c:366)
> ==6531== by 0x35ACE06BDE: ??? (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE02C9D: pci_free_dev (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE029DF: pci_cleanup (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x4C2BE5E: hwloc_look_libpci (topology-libpci.c:751)
> ==6531== by 0x4C16CBC: hwloc_discover (topology.c:2299)
> ==6531== by 0x4C17DE1: hwloc_topology_load (topology.c:2831)
> ==6531== by 0x405111: main (lstopo.c:530)
> ==6531== Address 0x4eec3b0 is 0 bytes inside a block of size 200 free'd
> ==6531== at 0x4A0595D: free (vg_replace_malloc.c:366)
> ==6531== by 0x35ACE06BDE: ??? (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE02C9D: pci_free_dev (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE029DF: pci_cleanup (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x4C2BE5E: hwloc_look_libpci (topology-libpci.c:751)
> ==6531== by 0x4C16CBC: hwloc_discover (topology.c:2299)
> ==6531== by 0x4C17DE1: hwloc_topology_load (topology.c:2831)
> ==6531== by 0x405111: main (lstopo.c:530)
>
>
> This seems much more like a libpci problem than an hwloc one
> but I don't recall seeing it mentioned before, nor did I see anything
> obvious looking for "pci core" in the report tracking system. Did I
> miss it?
>
> Carl