Hardware Locality Users' Mailing List Archives

Subject: [hwloc-users] divide by zero error?
From: Friedley, Andrew (andrew.friedley_at_[hidden])
Date: 2014-04-28 19:15:53


Hi,

I ran into a problem when running OMPI v1.8.1: a divide-by-zero crash deep in the hwloc code called by OMPI. The system is RHEL 6.3 running on a Simics x86_64 emulator. I can reproduce the error by running lstopo from hwloc v1.9:

[root_at_viper0 bin]# LD_LIBRARY_PATH=/root/hwloc/lib ./lstopo -v
Floating point exception (core dumped)

hwloc v1.1rc6, which was already installed on the system, and a corresponding OMPI 1.6.5 build both work with no problems:

[root_at_viper0 bin]# lstopo --version
lstopo 1.1rc6
[root_at_viper0 bin]# lstopo -v
Machine (P#0 local=2055580KB total=2055580KB DMIProductName=Bochs DMIProductVersion= DMIProductSerial= DMIChassisVendor=Bochs DMIChassisType=1 DMIChassisVersion= DMIChassisSerial= DMIChassisAssetTag= DMIBIOSVendor=Bochs DMIBIOSVersion=Bochs DMIBIOSDate=01/01/2007 DMIS)
  Socket L#0 (P#0)
    L3Cache L#0 (8192KB line=64)
      L2Cache L#0 (256KB line=64)
        L1Cache L#0 (32KB line=64)
          Core L#0 (P#0)
            PU L#0 (P#0)
depth 0: 1 Machine (type #1)
 depth 1: 1 Socket (type #3)
  depth 2: 1 Cache (type #4)
   depth 3: 1 Cache (type #4)
    depth 4: 1 Cache (type #4)
     depth 5: 1 Core (type #5)
      depth 6: 1 PU (type #6)

Here's the output from a GDB session on hwloc v1.9:

[root_at_viper0 bin]# LD_LIBRARY_PATH=/root/hwloc/lib gdb ./lstopo
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/hwloc/bin/lstopo...done.
(gdb) r -v
Starting program: /root/hwloc/bin/lstopo -v
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffd000

Program received signal SIGFPE, Arithmetic exception.
0x00007ffff7df0558 in look_proc (infos=0x61b6a0, highest_cpuid=11, highest_ext_cpuid=<value optimized out>, features=<value optimized out>, cpuid_type=intel)
    at topology-x86.c:323
323 infos->threadid = infos->logprocid % infos->max_nbthreads;
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6.x86_64
(gdb) bt
#0 0x00007ffff7df0558 in look_proc (infos=0x61b6a0, highest_cpuid=11, highest_ext_cpuid=<value optimized out>, features=<value optimized out>,
    cpuid_type=intel) at topology-x86.c:323
#1 0x00007ffff7df165a in look_procs (topology=0x619100, nbprocs=1, fulldiscovery=0) at topology-x86.c:741
#2 hwloc_look_x86 (topology=0x619100, nbprocs=1, fulldiscovery=0) at topology-x86.c:886
#3 0x00007ffff7df17f9 in hwloc_x86_discover (backend=<value optimized out>) at topology-x86.c:934
#4 0x00007ffff7dd6568 in hwloc_discover (topology=0x619100) at topology.c:2452
#5 hwloc_topology_load (topology=0x619100) at topology.c:2925
#6 0x0000000000402cf0 in main (argc=<value optimized out>, argv=<value optimized out>) at lstopo.c:581
(gdb) print infos->logprocid
$1 = 0
(gdb) print infos->max_nbthreads
$2 = 0
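
With max_nbthreads at 0, the statement at topology-x86.c:323 is an integer modulo by zero, which on x86 traps and is delivered as SIGFPE. As a rough illustration only, the computation could be guarded along these lines. The struct and function names here (`procinfo_sketch`, `compute_threadid`) are hypothetical stand-ins for the two fields printed above, not hwloc's actual structure, and this is not meant as the real fix:

```c
/* Hypothetical stand-in for the fields seen in the gdb session;
 * the real hwloc structure has more members and may use other types. */
struct procinfo_sketch {
    unsigned logprocid;
    unsigned max_nbthreads;
    unsigned threadid;
};

/* Compute threadid only when the divisor is usable.
 * Returns 0 on success, -1 if max_nbthreads is 0 (threadid is left untouched). */
static int compute_threadid(struct procinfo_sketch *infos)
{
    if (infos->max_nbthreads == 0) {
        /* The emulated CPU reported no threads; skip instead of trapping. */
        return -1;
    }
    infos->threadid = infos->logprocid % infos->max_nbthreads;
    return 0;
}
```

A caller would check the return value and fall back to some other source of topology information when it is -1. Whether skipping is the right behaviour here, or whether the zero thread count itself points at something the emulator reports incorrectly, is a question for the hwloc developers.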

Any ideas? Any other info I should provide?

Thanks,

Andrew