Hardware Locality Users' Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [hwloc-users] anyone seen problems with PCI on RHEL 6?
From: Guy Streeter (streeter_at_[hidden])
Date: 2012-07-03 10:26:35


On 07/02/2012 06:48 PM, Carl Smith wrote:
> I happened to run "lstopo --of xml" as root on a RHEL6.1 system
> and was surprised by a core dump:

This is almost certainly Red Hat bug 740630,
https://bugzilla.redhat.com/show_bug.cgi?id=740630

I reported it to the hwloc-devel list last November. It has been fixed in Red
Hat Enterprise Linux 6.2. The package version containing the libpci fix is
pciutils-3.1.4-11.el6.
I don't see any indication that the fix is scheduled for a back-port to RHEL
6.1. If you need a 6.1 fix, please contact your support or sales
representative and request one.
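
To check whether a given machine already has the fixed package, the rough
sketch below compares an installed version-release string against the one
named above. It is only an illustration: the helper names (parse_evr,
has_fix) are made up for this example, and the comparison is a simplified
numeric one, not rpm's real version-comparison algorithm. The installed
string would have to come from the package database, for example via
"rpm -q pciutils".

```python
import re

# Version-release of the fixed package, as named in this thread.
FIXED = "3.1.4-11.el6"


def parse_evr(evr):
    """Split 'version-release' into two tuples of integers.

    Simplification (an assumption of this sketch): everything from the
    '.el' dist tag onward is ignored, and only numeric fields are compared.
    """
    version, _, release = evr.partition("-")

    def nums(part):
        return tuple(int(n) for n in re.findall(r"\d+", part.split(".el")[0]))

    return nums(version), nums(release)


def has_fix(installed, fixed=FIXED):
    """Return True if 'installed' is at least as new as the fixed package."""
    return parse_evr(installed) >= parse_evr(fixed)


if __name__ == "__main__":
    # Example strings only; substitute the value reported on your system.
    for candidate in ("3.1.4-9.el6", "3.1.4-11.el6", "3.1.10-2.el6"):
        print(candidate, "has fix" if has_fix(candidate) else "needs update")
```

Anything that reports "needs update" is older than the package listed above
and may still hit the double free in pci_cleanup().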

--Guy

>
>
> ...
> Looking for PCI devices
>
> Scanning PCI buses...
> ...
> 0000:00:01.0 0604 8086:3408 Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1
> 0000:00:00.0 0600 8086:3405 Intel Corporation 5520/5500/X58 I/O Hub to ESI Port
> *** glibc detected *** /tmp/hwloc-1.4.2/bin/lstopo: double free or corruption (!prev): 0x0000000001bfed50 ***
>
>
> which gdb reported as occurring here:
>
>
> (gdb) bt
> #0 0x0000003905e32905 in raise () from /lib64/libc.so.6
> #1 0x0000003905e340e5 in abort () from /lib64/libc.so.6
> #2 0x0000003905e6f827 in __libc_message () from /lib64/libc.so.6
> #3 0x0000003905e75146 in malloc_printerr () from /lib64/libc.so.6
> #4 0x00000035ace06bdf in ?? () from /lib64/libpci.so.3
> #5 0x00000035ace02c9e in pci_free_dev () from /lib64/libpci.so.3
> #6 0x00000035ace029e0 in pci_cleanup () from /lib64/libpci.so.3
> #7 0x00007f14fe211e5f in hwloc_look_libpci (topology=0x156c130) at topology-libpci.c:751
> #8 0x00007f14fe1fccbd in hwloc_discover (topology=0x156c130) at topology.c:2299
> #9 0x00007f14fe1fdde2 in hwloc_topology_load (topology=0x156c130) at topology.c:2831
> #10 0x0000000000405112 in main (argc=1, argv=0x7fffac612d38) at lstopo.c:530
>
>
> Sure enough, when I fed the same command to valgrind, it told me
>
>
> # valgrind --tool=memcheck /tmp/hwloc-1.4.2/bin/lstopo --of xml
> ...
> 0000:00:01.0 0604 8086:3408 Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1
> 0000:00:00.0 0600 8086:3405 Intel Corporation 5520/5500/X58 I/O Hub to ESI Port
> ==6531== Invalid read of size 8
> ==6531== at 0x35ACE06BDF: ??? (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE02C9D: pci_free_dev (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE029DF: pci_cleanup (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x4C2BE5E: hwloc_look_libpci (topology-libpci.c:751)
> ==6531== by 0x4C16CBC: hwloc_discover (topology.c:2299)
> ==6531== by 0x4C17DE1: hwloc_topology_load (topology.c:2831)
> ==6531== by 0x405111: main (lstopo.c:530)
> ==6531== Address 0x4eec440 is 144 bytes inside a block of size 200 free'd
> ==6531== at 0x4A0595D: free (vg_replace_malloc.c:366)
> ==6531== by 0x35ACE06BDE: ??? (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE02C9D: pci_free_dev (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE029DF: pci_cleanup (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x4C2BE5E: hwloc_look_libpci (topology-libpci.c:751)
> ==6531== by 0x4C16CBC: hwloc_discover (topology.c:2299)
> ==6531== by 0x4C17DE1: hwloc_topology_load (topology.c:2831)
> ==6531== by 0x405111: main (lstopo.c:530)
> ==6531==
> ==6531== Invalid write of size 8
> ==6531== at 0x35ACE06BD0: ??? (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE02C9D: pci_free_dev (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE029DF: pci_cleanup (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x4C2BE5E: hwloc_look_libpci (topology-libpci.c:751)
> ==6531== by 0x4C16CBC: hwloc_discover (topology.c:2299)
> ==6531== by 0x4C17DE1: hwloc_topology_load (topology.c:2831)
> ==6531== by 0x405111: main (lstopo.c:530)
> ==6531== Address 0x4eec440 is 144 bytes inside a block of size 200 free'd
> ==6531== at 0x4A0595D: free (vg_replace_malloc.c:366)
> ==6531== by 0x35ACE06BDE: ??? (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE02C9D: pci_free_dev (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE029DF: pci_cleanup (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x4C2BE5E: hwloc_look_libpci (topology-libpci.c:751)
> ==6531== by 0x4C16CBC: hwloc_discover (topology.c:2299)
> ==6531== by 0x4C17DE1: hwloc_topology_load (topology.c:2831)
> ==6531== by 0x405111: main (lstopo.c:530)
> ==6531==
> ==6531== Invalid free() / delete / delete[]
> ==6531== at 0x4A0595D: free (vg_replace_malloc.c:366)
> ==6531== by 0x35ACE06BDE: ??? (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE02C9D: pci_free_dev (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE029DF: pci_cleanup (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x4C2BE5E: hwloc_look_libpci (topology-libpci.c:751)
> ==6531== by 0x4C16CBC: hwloc_discover (topology.c:2299)
> ==6531== by 0x4C17DE1: hwloc_topology_load (topology.c:2831)
> ==6531== by 0x405111: main (lstopo.c:530)
> ==6531== Address 0x4eec3b0 is 0 bytes inside a block of size 200 free'd
> ==6531== at 0x4A0595D: free (vg_replace_malloc.c:366)
> ==6531== by 0x35ACE06BDE: ??? (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE02C9D: pci_free_dev (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x35ACE029DF: pci_cleanup (in /lib64/libpci.so.3.1.4)
> ==6531== by 0x4C2BE5E: hwloc_look_libpci (topology-libpci.c:751)
> ==6531== by 0x4C16CBC: hwloc_discover (topology.c:2299)
> ==6531== by 0x4C17DE1: hwloc_topology_load (topology.c:2831)
> ==6531== by 0x405111: main (lstopo.c:530)
>
>
> This seems much more like a libpci problem than an hwloc one
> but I don't recall seeing it mentioned before, nor did I see anything
> obvious looking for "pci core" in the report tracking system. Did I
> miss it?
>
> Carl
> _______________________________________________
> hwloc-users mailing list
> hwloc-users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>