Hardware Locality Users' Mailing List Archives

Subject: Re: [hwloc-users] divide by zero error?
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2014-06-08 03:38:07


I added a --disable-cpuid configure option; it will be in hwloc v1.10.
Brice

On 06/05/2014 00:44, Friedley, Andrew wrote:
> Actually, is there any way to make HWLOC_COMPONENTS=-x86 the default or otherwise disable or compile without the x86 backend, so that I get that behavior by default?
>
> Thanks,
>
> Andrew
>
>> -----Original Message-----
>> From: Brice Goglin [mailto:Brice.Goglin_at_[hidden]]
>> Sent: Monday, May 5, 2014 1:03 PM
>> To: Friedley, Andrew
>> Subject: Re: [hwloc-users] divide by zero error?
>>
>> Thanks.
>> The simulator returns buggy cpuid information. It may be possible to
>> work around this specific problem, but I am afraid there could be others.
>> I think you should just disable the hwloc x86 backend by setting
>> HWLOC_COMPONENTS=-x86 in the environment. Does this look like an
>> acceptable workaround?
>> Brice
>>
>>
>>
>> On 05/05/2014 20:21, Friedley, Andrew wrote:
>>> Back from vacation -- Is this what you're after?
>>>
>>> [root_at_viper0 bin]# ./lstopo
>>>
>>>
>>> * Topology extraction from /proc/cpuinfo *
>>>
>>> processor 0
>>> found 1 cpu topologies, cpuset 0x00000001
>>> os socket 0 has cpuset 0x00000001
>>> os core 0 has cpuset 0x00000001
>>> thread 0 has cpuset 0x00000001
>>> cache depth 0 has cpuset 0x00000001
>>> cache depth 0 has cpuset 0x00000001
>>> cache depth 1 has cpuset 0x00000001
>>> cache depth 2 has cpuset 0x00000001
>>> found DMIProductName 'Bochs'
>>> found DMIProductVersion ''
>>> found DMIProductSerial ''
>>> found DMIChassisVendor 'Bochs'
>>> found DMIChassisType '1'
>>> found DMIChassisVersion ''
>>> found DMIChassisSerial ''
>>> found DMIChassisAssetTag ''
>>> found DMIBIOSVendor 'Bochs'
>>> found DMIBIOSVersion 'Bochs'
>>> found DMIBIOSDate '01/01/2007'
>>> found DMISysVendor 'Bochs'
>>> Machine#0(local=2055580KB total=0KB DMIProductName=Bochs DMIProductVersion= DMIProductSerial= DMIChassisVendor=Bochs DMIChassisType=1 DMI) cpuset 0xf...f complete 0x00000001 online 0xf...f allowed 0xf...f nodeset 0x0 completeN 0x0 allowedN 0xf...f
>>> Socket#0(CPUVendor=GenuineIntel CPUFamilyNumber=6 CPUModelNumber=26 CPUModel="Intel(R) Core(TM) i7 CPU @ 2.00GHz") cpuset 0x00000001
>>> L3Cache(size=8192KB linesize=64 ways=16) cpuset 0x00000001
>>> L2Cache(size=256KB linesize=64 ways=8) cpuset 0x00000001
>>> L1dCache(size=32KB linesize=64 ways=8) cpuset 0x00000001
>>> L1iCache(size=32KB linesize=64 ways=4) cpuset 0x00000001
>>> Core#0 cpuset 0x00000001
>>> PU#0 cpuset 0x00000001
>>> Backend x86 forcing a reconnect of levels
>>> --- Socket level has number 1
>>>
>>> --- Cache level depth 3 has number 2
>>>
>>> --- Cache level depth 2 has number 3
>>>
>>> --- Cache level depth 1 has number 4
>>>
>>> --- Cache level depth 1 has number 5
>>>
>>> --- Core level has number 6
>>>
>>> --- PU level has number 7
>>>
>>> highest cpuid b, cpuid type 0
>>> highest extended cpuid 80000008
>>> possible CPUs are 0x00000001
>>> binding to CPU0
>>> APIC ID 0x00 max_log_proc 1
>>> phys 0 thread 0
>>> cache 0 type 1
>>> cache 1 type 2
>>> cache 2 type 3
>>> cache 3 type 3
>>> cache 4 type 0
>>> cache 0 type 1 L1 t2 c8 linesize 64 linepart 1 ways 8 sets 64, size 32KB
>>> thus 0 threads
>>> Floating point exception (core dumped)
>>>
>>>> -----Original Message-----
>>>> From: Brice Goglin [mailto:Brice.Goglin_at_[hidden]]
>>>> Sent: Wednesday, April 30, 2014 2:30 AM
>>>> To: Friedley, Andrew
>>>> Subject: Re: [hwloc-users] divide by zero error?
>>>>
>>>> Thanks.
>>>> The Linux backend works well, so the bug is indeed only in the x86 backend.
>>>> Could you rebuild with --enable-debug and send the entire
>>>> stdout+stderr output of lstopo?
>>>>
>>>> Thanks
>>>> Brice
>>>>
>>>>
>>>>
>>>> On 29/04/2014 17:01, Friedley, Andrew wrote:
>>>>> Attached, off list.
>>>>>
>>>>> Andrew
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: hwloc-users [mailto:hwloc-users-bounces_at_[hidden]] On Behalf Of Brice Goglin
>>>>>> Sent: Monday, April 28, 2014 10:37 PM
>>>>>> To: hwloc-users_at_[hidden]
>>>>>> Subject: Re: [hwloc-users] divide by zero error?
>>>>>>
>>>>>> Please run "hwloc-gather-topology simics" and send the resulting
>>>>>> simics.tar.bz2 that it will create. However, I assume that the
>>>>>> simulator returns buggy x86 cpuid information, so we'll see whether we
>>>>>> can easily work around the bug or should just let the Simics developers fix it.
>>>>>> Brice
>>>>>>
>>>>>>
>>>>>> On 29/04/2014 01:15, Friedley, Andrew wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I ran into a problem when running OMPI v1.8.1: a divide-by-zero crash
>>>>>>> deep in the hwloc code called by OMPI. The system I'm running is a
>>>>>>> Simics x86_64 emulator with RHEL 6.3. I can reproduce the error by
>>>>>>> running lstopo from hwloc v1.9:
>>>>>>> [root_at_viper0 bin]# LD_LIBRARY_PATH=/root/hwloc/lib ./lstopo -v
>>>>>>> Floating point exception (core dumped)
>>>>>>>
>>>>>>>
>>>>>>> hwloc v1.1rc6, already installed on the system, and a corresponding
>>>>>>> OMPI 1.6.5 build work with no problems:
>>>>>>> [root_at_viper0 bin]# lstopo --version
>>>>>>> lstopo 1.1rc6
>>>>>>> [root_at_viper0 bin]# lstopo -v
>>>>>>> Machine (P#0 local=2055580KB total=2055580KB DMIProductName=Bochs DMIProductVersion= DMIProductSerial= DMIChassisVendor=Bochs DMIChassisType=1 DMIChassisVersion= DMIChassisSerial= DMIChassisAssetTag= DMIBIOSVendor=Bochs DMIBIOSVersion=Bochs DMIBIOSDate=01/01/2007 DMIS)
>>>>>>> Socket L#0 (P#0)
>>>>>>> L3Cache L#0 (8192KB line=64)
>>>>>>> L2Cache L#0 (256KB line=64)
>>>>>>> L1Cache L#0 (32KB line=64)
>>>>>>> Core L#0 (P#0)
>>>>>>> PU L#0 (P#0)
>>>>>>> depth 0: 1 Machine (type #1)
>>>>>>> depth 1: 1 Socket (type #3)
>>>>>>> depth 2: 1 Cache (type #4)
>>>>>>> depth 3: 1 Cache (type #4)
>>>>>>> depth 4: 1 Cache (type #4)
>>>>>>> depth 5: 1 Core (type #5)
>>>>>>> depth 6: 1 PU (type #6)
>>>>>>>
>>>>>>>
>>>>>>> Here's the output from a GDB session on hwloc v1.9:
>>>>>>>
>>>>>>> [root_at_viper0 bin]# LD_LIBRARY_PATH=/root/hwloc/lib gdb ./lstopo
>>>>>>> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
>>>>>>> Copyright (C) 2010 Free Software Foundation, Inc.
>>>>>>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>>>>>>> This is free software: you are free to change and redistribute it.
>>>>>>> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>>>>>>> and "show warranty" for details.
>>>>>>> This GDB was configured as "x86_64-redhat-linux-gnu".
>>>>>>> For bug reporting instructions, please see:
>>>>>>> <http://www.gnu.org/software/gdb/bugs/>...
>>>>>>> Reading symbols from /root/hwloc/bin/lstopo...done.
>>>>>>> (gdb) r -v
>>>>>>> Starting program: /root/hwloc/bin/lstopo -v
>>>>>>> warning: no loadable sections found in added symbol-file
>>>>>>> system-supplied DSO at 0x7ffff7ffd000
>>>>>>>
>>>>>>> Program received signal SIGFPE, Arithmetic exception.
>>>>>>> 0x00007ffff7df0558 in look_proc (infos=0x61b6a0, highest_cpuid=11, highest_ext_cpuid=<value optimized out>, features=<value optimized out>, cpuid_type=intel)
>>>>>>> at topology-x86.c:323
>>>>>>> 323 infos->threadid = infos->logprocid % infos->max_nbthreads;
>>>>>>> Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6.x86_64
>>>>>>> (gdb) bt
>>>>>>> #0 0x00007ffff7df0558 in look_proc (infos=0x61b6a0, highest_cpuid=11, highest_ext_cpuid=<value optimized out>, features=<value optimized out>, cpuid_type=intel) at topology-x86.c:323
>>>>>>> #1 0x00007ffff7df165a in look_procs (topology=0x619100, nbprocs=1, fulldiscovery=0) at topology-x86.c:741
>>>>>>> #2 hwloc_look_x86 (topology=0x619100, nbprocs=1, fulldiscovery=0) at topology-x86.c:886
>>>>>>> #3 0x00007ffff7df17f9 in hwloc_x86_discover (backend=<value optimized out>) at topology-x86.c:934
>>>>>>> #4 0x00007ffff7dd6568 in hwloc_discover (topology=0x619100) at topology.c:2452
>>>>>>> #5 hwloc_topology_load (topology=0x619100) at topology.c:2925
>>>>>>> #6 0x0000000000402cf0 in main (argc=<value optimized out>, argv=<value optimized out>) at lstopo.c:581
>>>>>>> (gdb) print infos->logprocid
>>>>>>> $1 = 0
>>>>>>> (gdb) print infos->max_nbthreads
>>>>>>> $2 = 0
>>>>>>>
>>>>>>>
>>>>>>> Any ideas? Any other info I should provide?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Andrew
>>>>>>> _______________________________________________
>>>>>>> hwloc-users mailing list
>>>>>>> hwloc-users_at_[hidden]
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>>>>>> _______________________________________________
>>>>>> hwloc-users mailing list
>>>>>> hwloc-users_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
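
For illustration: the statement reported at topology-x86.c:323 computes
infos->logprocid % infos->max_nbthreads, and gdb shows max_nbthreads as 0,
which is what raises SIGFPE. Below is a minimal sketch of a defensive guard
around that kind of modulo. The helper name thread_id_or_zero and the choice
to fall back to thread 0 are assumptions made for this example only; it is
not hwloc code and not necessarily how hwloc handles the case.

```c
#include <stdio.h>

/* Hypothetical helper (not part of hwloc): index of a logical processor
 * within its core. If the thread count derived from cpuid is 0, the data
 * is inconsistent, so skip the modulo and fall back to thread 0 instead
 * of dividing by zero. */
unsigned thread_id_or_zero(unsigned logprocid, unsigned max_nbthreads)
{
    if (max_nbthreads == 0) {
        fprintf(stderr,
                "warning: cpuid reports 0 threads per core, assuming thread 0\n");
        return 0;
    }
    return logprocid % max_nbthreads;
}
```

With the values from the gdb session (logprocid 0, max_nbthreads 0), this
helper returns 0 rather than trapping. A guard like this only covers one
statement; as noted in the thread, other values derived from the same cpuid
data may be wrong too, which is why disabling the x86 backend was suggested.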