Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] Fwd: BGQ empty topology with MPI
From: Daniel Ibanez (dan.a.ibanez_at_[hidden])
Date: 2012-03-24 18:04:07


The fundamental difference is in the if statement at

src/topology-linux.c:3251

When this if statement is true, hwloc_setup_pu_level
finds the PU objects.
When it is false, discovery fails with an empty topology.
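To make the two outcomes concrete, here is a minimal standalone C sketch of that kind of branch. It is not hwloc's code. The function names (`count_sysfs_cpus`, `fallback_nbprocessors`, `detect_pus`) are hypothetical, and it assumes the condition essentially asks whether the sysfs CPU directory is accessible. It shows how a /sys that is readable but yields no usable CPU entries produces zero PUs, while the fallback path would still have reported a count.

```c
#define _GNU_SOURCE
#include <ctype.h>
#include <dirent.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: count entries named "cpu<digits>" in a
   sysfs-style directory. Returns -1 if the directory cannot be opened. */
static int count_sysfs_cpus(const char *dirpath)
{
    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        const char *name = ent->d_name;
        /* Skip "cpufreq", "cpuidle", "online", "." and "..". */
        if (strncmp(name, "cpu", 3) != 0 || name[3] == '\0')
            continue;
        const char *p = name + 3;
        while (isdigit((unsigned char)*p))
            p++;
        if (*p == '\0')
            count++;
    }
    closedir(dir);
    return count;
}

/* Hypothetical fallback: ask the C library for the online CPU count. */
static int fallback_nbprocessors(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 0;
}

/* Model of the branch (an assumption, not hwloc's logic): fall back only
   when sysfs is NOT accessible. If sysfs is accessible but yields no CPU
   entries, the result is 0, i.e. a Machine object with no PUs. */
static int detect_pus(const char *sysfs_cpu_dir)
{
    if (access(sysfs_cpu_dir, R_OK | X_OK) != 0)
        return fallback_nbprocessors();
    int n = count_sysfs_cpus(sysfs_cpu_dir);
    return n > 0 ? n : 0;
}
```

Calling `detect_pus("/sys/devices/system/cpu")` once in a plain run and once inside the MPI job would show which branch each environment takes; the path is an assumption about where the relevant sysfs entries live.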

I also checked HWLOC_LINUX_USE_CPUINFO:
it is not detected by the job even when I set it from the front end.

That means the difference is whether hwloc can access
the various /sys/devices and /sys/bus files.

Additional printfs confirm that with MPI in the code,
hwloc_accessat succeeds on the various /sys/ directories,
but the overall procedure for getting PUs from these fails.
Without MPI, access to /sys/ directories fails but
the fallback hwloc_setup_pu_level works.
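A small probe like the following could be compiled into the application and called once after `MPI_Init` and once in a build without MPI, so that both access results and the environment-variable visibility are printed side by side. It is a sketch only: the helper names are hypothetical, and `/sys/devices/system/cpu` and `/proc/cpuinfo` are assumed paths of interest (the thread only names /sys/devices and /sys/bus). Note that a successful `access()` only proves the path is readable, not that its contents describe the CPUs correctly, which is exactly the failure reported here.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Report whether this process can read a path.
   Returns 0 on success, otherwise the errno value set by access(). */
static int probe_path(const char *path)
{
    int err = 0;
    if (access(path, R_OK) != 0)
        err = errno; /* save errno before printf can change it */
    printf("%-28s %s\n", path, err == 0 ? "readable" : strerror(err));
    return err;
}

/* Report whether an environment variable reached this process.
   Returns 1 if it is set, 0 otherwise. */
static int probe_env(const char *name)
{
    const char *value = getenv(name);
    printf("%-28s %s\n", name, value ? value : "(not set)");
    return value != NULL;
}

/* Probe the paths and variable discussed in this thread. */
static void probe_all(void)
{
    static const char *const paths[] = {
        "/sys/devices/system/cpu",
        "/sys/bus",
        "/proc/cpuinfo",
    };
    for (size_t i = 0; i < sizeof paths / sizeof paths[0]; i++)
        probe_path(paths[i]);
    probe_env("HWLOC_LINUX_USE_CPUINFO");
}
```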

Due to the unstable state of the machine, I'm having trouble
submitting more tests to find out what goes wrong when using the /sys information.

On Thu, Mar 22, 2012 at 6:47 PM, Daniel Ibanez <dan.a.ibanez_at_[hidden]> wrote:

> I've compiled this test, but the machine is on hold for their own testing.
> I should be able to run in two days and report the results.
>
>
> On Thu, Mar 22, 2012 at 6:36 PM, Brice Goglin <Brice.Goglin_at_[hidden]> wrote:
>
>> On 22/03/2012 23:33, Daniel Ibanez wrote:
>> > I've run this test before (didn't keep the results, but I can run it again).
>> > I got debug output and compared it with the output from a hwloc test
>> > executable, and I noticed that my program did not show any PU objects
>> > being discovered.
>> > In my program the first discovered topology is just a Machine object,
>> > but in the hwloc program it's a Machine object and 64 PU objects.
>> > Something went wrong in PU detection...
>>
>> If I am reading your output correctly, all PUs are created by
>> setup_pu_level() depending on the return value of
>> hwloc_fallback_nbprocessors() defined in src/topology.c. Any chance you
>> could add some printfs there to understand what's going on?
>> hwloc_fallback_nbprocessors() would likely return 64 when things work
>> and 0 otherwise here.
>>
>> Brice
>>
>>
>
>
> --
>
> Dan Ibanez
>

-- 
Dan Ibanez
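Following Brice's suggestion to instrument hwloc_fallback_nbprocessors(), the sketch below shows the kind of printf that would distinguish "64" from "0". It is a rough stand-in, not hwloc's implementation: it assumes the fallback relies on `sysconf()`, and the function name `debug_fallback_nbprocessors` is hypothetical. Printing both the online and configured counts makes a zero or a mismatch visible in the job output.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for a fallback processor count (NOT hwloc's code).
   Prints both sysconf() values to stderr so a 0 or -1 is easy to spot. */
static int debug_fallback_nbprocessors(void)
{
    long online = sysconf(_SC_NPROCESSORS_ONLN);
    long configured = sysconf(_SC_NPROCESSORS_CONF);
    fprintf(stderr, "fallback: _SC_NPROCESSORS_ONLN=%ld _SC_NPROCESSORS_CONF=%ld\n",
            online, configured);
    /* sysconf() returns -1 on error; treat that as "unknown" (0). */
    return online > 0 ? (int)online : 0;
}
```

If the real function returns 64 in the standalone hwloc test and 0 inside the MPI program, that would confirm that PU creation depends on this value, as suggested above.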