Hardware Locality Users' Mailing List Archives

Subject: Re: [hwloc-users] divide by zero error?
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2014-04-29 01:36:51


Please run "hwloc-gather-topology simics" and send the simics.tar.bz2
archive that it creates. However, I assume that the simulator returns
buggy x86 cpuid information, so we'll see whether we want to (and can
easily) work around the bug in hwloc, or just let the simics developers fix it.
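
To illustrate what a workaround could look like, here is a minimal sketch of a guarded version of the failing modulo from topology-x86.c:323. It is not the actual hwloc code or fix: the struct proc_info type and the assign_thread_id() helper are hypothetical, only the three field names come from the GDB session quoted below, and falling back to one thread per core when the count is zero is an assumption.

```c
/* Hypothetical, simplified stand-in for the per-processor info seen in the
 * backtrace. Only the fields that appear in the GDB session are modelled. */
struct proc_info {
    unsigned logprocid;      /* logical processor id */
    unsigned max_nbthreads;  /* threads per core, derived from cpuid */
    unsigned threadid;       /* result: thread index within the core */
};

/* Compute threadid without ever dividing by zero.
 * Returns 0 if the thread count was usable, -1 if it was zero and the
 * fallback (an assumption: treat the core as single-threaded) was applied. */
static int assign_thread_id(struct proc_info *infos)
{
    if (infos->max_nbthreads == 0) {
        /* Bogus cpuid data: avoid the SIGFPE and assume one thread. */
        infos->max_nbthreads = 1;
        infos->threadid = 0;
        return -1;
    }
    infos->threadid = infos->logprocid % infos->max_nbthreads;
    return 0;
}
```

With the values from the report (logprocid = 0, max_nbthreads = 0), this helper would return -1 and set threadid to 0 instead of raising SIGFPE. Whether silently falling back is preferable to rejecting the cpuid data altogether is a design choice this sketch does not settle.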
Brice

On 29/04/2014 01:15, Friedley, Andrew wrote:
> Hi,
>
> I ran into a problem when running OMPI v1.8.1 -- a divide by zero crash deep in the hwloc code called by OMPI. The system I'm running on is a simics x86_64 emulator with RHEL 6.3. I can reproduce the error by running lstopo from hwloc v1.9:
>
> [root_at_viper0 bin]# LD_LIBRARY_PATH=/root/hwloc/lib ./lstopo -v
> Floating point exception (core dumped)
>
>
> Hwloc v1.1rc6, already installed on the system, and a corresponding OMPI 1.6.5 build work with no problems:
>
> [root_at_viper0 bin]# lstopo --version
> lstopo 1.1rc6
> [root_at_viper0 bin]# lstopo -v
> Machine (P#0 local=2055580KB total=2055580KB DMIProductName=Bochs DMIProductVersion= DMIProductSerial= DMIChassisVendor=Bochs DMIChassisType=1 DMIChassisVersion= DMIChassisSerial= DMIChassisAssetTag= DMIBIOSVendor=Bochs DMIBIOSVersion=Bochs DMIBIOSDate=01/01/2007 DMIS)
>   Socket L#0 (P#0)
>     L3Cache L#0 (8192KB line=64)
>       L2Cache L#0 (256KB line=64)
>         L1Cache L#0 (32KB line=64)
>           Core L#0 (P#0)
>             PU L#0 (P#0)
> depth 0: 1 Machine (type #1)
> depth 1: 1 Socket (type #3)
> depth 2: 1 Cache (type #4)
> depth 3: 1 Cache (type #4)
> depth 4: 1 Cache (type #4)
> depth 5: 1 Core (type #5)
> depth 6: 1 PU (type #6)
>
>
> Here's the output from a GDB session on hwloc v1.9:
>
> [root_at_viper0 bin]# LD_LIBRARY_PATH=/root/hwloc/lib gdb ./lstopo
> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /root/hwloc/bin/lstopo...done.
> (gdb) r -v
> Starting program: /root/hwloc/bin/lstopo -v
> warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffd000
>
> Program received signal SIGFPE, Arithmetic exception.
> 0x00007ffff7df0558 in look_proc (infos=0x61b6a0, highest_cpuid=11, highest_ext_cpuid=<value optimized out>, features=<value optimized out>, cpuid_type=intel)
> at topology-x86.c:323
> 323 infos->threadid = infos->logprocid % infos->max_nbthreads;
> Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6.x86_64
> (gdb) bt
> #0 0x00007ffff7df0558 in look_proc (infos=0x61b6a0, highest_cpuid=11, highest_ext_cpuid=<value optimized out>, features=<value optimized out>,
> cpuid_type=intel) at topology-x86.c:323
> #1 0x00007ffff7df165a in look_procs (topology=0x619100, nbprocs=1, fulldiscovery=0) at topology-x86.c:741
> #2 hwloc_look_x86 (topology=0x619100, nbprocs=1, fulldiscovery=0) at topology-x86.c:886
> #3 0x00007ffff7df17f9 in hwloc_x86_discover (backend=<value optimized out>) at topology-x86.c:934
> #4 0x00007ffff7dd6568 in hwloc_discover (topology=0x619100) at topology.c:2452
> #5 hwloc_topology_load (topology=0x619100) at topology.c:2925
> #6 0x0000000000402cf0 in main (argc=<value optimized out>, argv=<value optimized out>) at lstopo.c:581
> (gdb) print infos->logprocid
> $1 = 0
> (gdb) print infos->max_nbthreads
> $2 = 0
>
>
> Any ideas? Any other info I should provide?
>
> Thanks,
>
> Andrew