Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] Fwd: BGQ empty topology with MPI
From: Daniel Ibanez (dan.a.ibanez_at_[hidden])
Date: 2012-03-24 18:04:07


The fundamental difference is at

src/topology-linux.c:3251

When this if statement is true, hwloc_setup_pu_level
finds the PU objects.
When it is false, PU detection fails and the topology is empty.

I checked the HWLOC_LINUX_USE_CPUINFO environment variable,
and it is not detected even when I set it from the front end.
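
One way to confirm whether the variable actually reaches the process is to print it from the program itself. The following is only a minimal sketch using standard C getenv(); the helper names env_is_set and report_env are hypothetical and are not part of hwloc.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of hwloc): returns 1 if the named
 * environment variable is visible to this process, 0 otherwise. */
static int env_is_set(const char *name)
{
    assert(name != NULL);
    return getenv(name) != NULL;
}

/* Print what this process sees for the given variable. */
static void report_env(const char *name)
{
    const char *value = getenv(name);
    fprintf(stderr, "%s is %s%s\n", name,
            value ? "set to " : "not set",
            value ? value : "");
}
```

Calling report_env("HWLOC_LINUX_USE_CPUINFO") early in main(), before hwloc_topology_load(), would show whether the setting made on the front end is inherited by the running job.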

That means the difference is whether hwloc can access
the various /sys/devices and /sys/bus files.

Additional printfs confirm that with MPI in the code,
hwloc_accessat succeeds on the various /sys/ directories,
but the overall procedure for getting PUs from these fails.
Without MPI, access to /sys/ directories fails but
the fallback hwloc_setup_pu_level works.
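
To compare the two situations outside of hwloc, a small standalone probe along these lines might help. It is a sketch under assumptions: it uses plain POSIX access() and sysconf() rather than hwloc's own hwloc_accessat, the two /sys paths are picked for illustration only, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if path is readable by this process, 0 otherwise.
 * Plain POSIX access(), not hwloc's hwloc_accessat. */
static int path_readable(const char *path)
{
    assert(path != NULL);
    return access(path, R_OK) == 0;
}

/* Number of online processors as reported by the C library.
 * Illustrative stand-in only, not hwloc's own fallback code. */
static long online_processors(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Print both results so runs with and without MPI can be compared. */
static void report_pu_sources(void)
{
    /* Assumed paths, chosen for illustration. */
    static const char *const paths[] = {
        "/sys/devices/system/cpu",
        "/sys/bus/cpu/devices",
    };
    size_t i;

    for (i = 0; i < sizeof(paths) / sizeof(paths[0]); i++)
        fprintf(stderr, "%s: %s\n", paths[i],
                path_readable(paths[i]) ? "readable" : "not readable");
    fprintf(stderr, "online processors: %ld\n", online_processors());
}
```

Calling report_pu_sources() once in a build with MPI and once in a build without it would show whether the /sys visibility really differs between the two, independently of hwloc's own detection logic.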

Due to the unstable nature of the machine, I'm having trouble
submitting more tests to see what goes wrong when using the /sys information.

On Thu, Mar 22, 2012 at 6:47 PM, Daniel Ibanez <dan.a.ibanez_at_[hidden]> wrote:

> I've compiled this test, but the machine is on hold for their own testing.
> I should be able to run in two days and report the results.
>
>
> On Thu, Mar 22, 2012 at 6:36 PM, Brice Goglin <Brice.Goglin_at_[hidden]> wrote:
>
>> On 22/03/2012 23:33, Daniel Ibanez wrote:
>> > I've run this test before (didn't keep the results but can run it
>> > again).
>> > I got debug output and compared it with the output from a hwloc test
>> > executable,
>> > and I noticed that my program did not show that any PU objects were
>> > discovered.
>> > In my program the first discovered topology is just a Machine object,
>> > but in the hwloc program it's a Machine object and 64 PU objects.
>> > Something went wrong in PU detection...
>>
>> If I am reading your output correctly, all PUs are created by
>> setup_pu_level() depending on the return value of
>> hwloc_fallback_nbprocessors() defined in src/topology.c. Any chance you
>> could add some printfs there to understand what's going on?
>> hwloc_fallback_nbprocessors() would likely return 64 when things work
>> and 0 otherwise here.
>>
>> Brice
>>
>>
>
>
> --
>
> Dan Ibanez
>

-- 
Dan Ibanez