Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] xml file load incompatibilities
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2013-09-20 19:06:15


Try setting HWLOC_DEBUG_CHECK=1 in your environment; it enables many
assertions at the end of hwloc_topology_load().
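
If changing the launch environment is inconvenient, the variable can also be set from inside the process before the topology is loaded. The following is only a minimal sketch: the helper names enable_topology_checks() and topology_checks_enabled() are hypothetical and not part of hwloc, and it assumes a POSIX system where setenv() is available. It does not call hwloc itself.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* request the setenv() declaration */
#endif
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not part of hwloc): set HWLOC_DEBUG_CHECK=1 for the
 * current process. Meant to be called before hwloc_topology_load().
 * Returns 0 on success, -1 if setenv() fails. */
int enable_topology_checks(void)
{
    if (setenv("HWLOC_DEBUG_CHECK", "1", 1) != 0)
        return -1;
    /* Postcondition: the variable is now visible to later getenv() calls. */
    assert(getenv("HWLOC_DEBUG_CHECK") != NULL);
    return 0;
}

/* Hypothetical helper: returns 1 if HWLOC_DEBUG_CHECK is present in the
 * environment, 0 otherwise. */
int topology_checks_enabled(void)
{
    return getenv("HWLOC_DEBUG_CHECK") != NULL;
}
```

An application would call enable_topology_checks() once, early, and then load the topology as usual. From a shell, prefixing the command with HWLOC_DEBUG_CHECK=1 has the same effect.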

Brice

On 21/09/2013 01:03, Ralph Castain wrote:
> I didn't try loading it with lstopo - just tried the OMPI trunk. It
> loads okay, but segfaults when you try to find an object by depth
>
> #0 0x00000001005fe5dc in opal_hwloc172_hwloc_get_obj_by_depth
> (topology=Cannot access memory at address 0xfffffffffffffff7
> ) at traversal.c:623
> #1 0x0000000100b6dfaa in opal_hwloc172_hwloc_get_root_obj
> (topology=Cannot access memory at address 0xfffffffffffffff7
> ) at rmaps_rr_mappers.c:747
> #2 0x0000000100b6e139 in orte_rmaps_rr_byslot (jdata=Cannot access
> memory at address 0xffffffffffffff77
> ) at rmaps_rr_mappers.c:774
> #3 0x0000000100b6d6da in orte_rmaps_rr_map (jdata=Cannot access
> memory at address 0xffffffffffffff17
> ) at rmaps_rr.c:211
> #4 0x0000000100353098 in orte_rmaps_base_map_job (fd=Cannot access
> memory at address 0xfffffffffffffe7b
> ) at base/rmaps_base_map_job.c:320
> #5 0x00000001005ce28c in event_process_active_single_queue
> (base=Cannot access memory at address 0xffffffffffffffe7
> ) at event.c:1367
> #6 0x00000001005ce500 in event_process_active (base=Cannot access
> memory at address 0xffffffffffffffe7
> ) at event.c:1437
> #7 0x00000001005ceb71 in opal_libevent2021_event_base_loop
> (base=Cannot access memory at address 0xffffffffffffffb7
> ) at event.c:1645
> #8 0x00000001002c5158 in orterun (argc=Cannot access memory at
> address 0xfffffffffffffd1b
> ) at orterun.c:3039
> #9 0x00000001002c32a4 in main (argc=Cannot access memory at address
> 0xfffffffffffffffb
> ) at main.c:14
>
> Looks to me like memory may be getting hosed
>
>
> On Sep 20, 2013, at 2:59 PM, Brice Goglin <Brice.Goglin_at_[hidden]> wrote:
>
>> I can't see any segfault. Where does the segfault occur for you? In
>> OMPI only (or lstopo too)? When loading or when using the topology?
>>
>> I tried lstopo on that file with and without HWLOC_NO_LIBXML_IMPORT=1
>> (in case the bug is in one of XML backends), looks ok.
>>
>> Brice
>>
>>
>>
>>
>>
>> On 20/09/2013 23:53, Ralph Castain wrote:
>>> Here are the two files I tried - not from the same machine. The foo.xml works, the topo.xml segfaults
>>>
>>>
>>>
>>>
>>> One of our users reported it from their machine, but I don't have their topo file.
>>>
>>> On Sep 20, 2013, at 2:41 PM, Brice Goglin <Brice.Goglin_at_[hidden]> wrote:
>>>
>>>> Hello,
>>>> I don't see any reason for such an incompatibility. But there are
>>>> many combinations, and we can't test everything.
>>>> I can't reproduce that on my machines. Can you send the XML output of
>>>> both versions on one of your machines?
>>>> Brice
>>>>
>>>>
>>>>
>>>> On 20/09/2013 23:32, Ralph Castain wrote:
>>>>> Hi folks
>>>>>
>>>>> I've run across a rather strange behavior. We have two branches in OMPI - the devel trunk (using hwloc v1.7.2) and our feature release series (using hwloc 1.5.2). I have found the following:
>>>>>
>>>>> * the feature series can correctly load an xml file generated by lstopo of versions 1.5 or greater
>>>>>
>>>>> * the devel series can correctly load an xml file generated by lstopo of versions 1.7 or greater, but not files generated by prior versions. In the latter case, I segfault as soon as I try to use the loaded topology.
>>>>>
>>>>> Any ideas why the discrepancy? Can I at least detect the version used to create a file when loading it so I can error out instead of segfaulting?
>>>>>
>>>>> Ralph
>>>>>
>>>>> _______________________________________________
>>>>> hwloc-devel mailing list
>>>>> hwloc-devel_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-devel