Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] Fwd: BGQ empty topology with MPI
From: Daniel Ibanez (dan.a.ibanez_at_[hidden])
Date: 2012-03-28 14:22:51


The machine is back in working order.
I tried this patch and it works great: I get cpus and my whole program runs
as expected.
I'm now looking into what failed in look_sysfscpu.
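
For anyone who wants to compare what a process can see with and without MPI, here is a minimal sketch of a standalone check. It is not hwloc code: the helper names are made up for illustration, and I am assuming that sysconf(_SC_NPROCESSORS_ONLN) (a common extension, available with glibc) is a reasonable stand-in for what hwloc_fallback_nbprocessors reports.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if this process can read and traverse
 * `path`, else 0. Only the two directory names below come from the patch. */
static int sysfs_dir_usable(const char *path)
{
    return access(path, R_OK | X_OK) == 0;
}

/* Assumed stand-in for hwloc_fallback_nbprocessors(): ask the OS how many
 * processors are online and never report fewer than one. */
static long fallback_nbprocessors(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Print which candidate directories are usable and the fallback count.
 * Returns the number of usable directories (0, 1 or 2). */
static int report_sysfs_cpu_dirs(void)
{
    static const char *const dirs[] = {
        "/sys/bus/cpu/devices",
        "/sys/devices/system/cpu",
    };
    int usable = 0;

    for (size_t i = 0; i < sizeof dirs / sizeof dirs[0]; i++) {
        int ok = sysfs_dir_usable(dirs[i]);
        printf("%-26s %s\n", dirs[i], ok ? "usable" : "not accessible");
        usable += ok;
    }
    printf("fallback processor count: %ld\n", fallback_nbprocessors());
    return usable;
}
```

Calling report_sysfs_cpu_dirs() from main() once in a plain binary and once after MPI_Init should show whether the access checks really differ between the two cases. Note that a directory being accessible says nothing about whether its contents parse, which is the part I still need to look at.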

On Sun, Mar 25, 2012 at 2:43 AM, Brice Goglin <Brice.Goglin_at_[hidden]> wrote:

> On 24/03/2012 23:04, Daniel Ibanez wrote:
> > The fundamental difference is in
> >
> > src/topology-linux.c:3251
> >
> > when this if statement is true, hwloc_setup_pu_level
> > finds the PU objects.
> > When it is false, it fails with empty topology.
> >
> > I checked HWLOC_LINUX_USE_CPUINFO,
> > and it is not detected even when I set it from the front end.
> >
> > That means the difference is whether hwloc can access
> > the various /sys/devices and /sys/bus files.
> >
> > Additional printfs confirm that with MPI in the code,
> > hwloc_accessat succeeds on the various /sys/ directories,
> > but the overall procedure for getting PUs from these fails.
> > Without MPI, access to /sys/ directories fails but
> > the fallback hwloc_setup_pu_level works.
>
> If I understand correctly, in the MPI case, look_sysfscpu() ends up
> being called. It is called on two paths because /sys/devices/system/cpu
> may be renamed in the future, so it's expected that one call succeeds
> and the other fails. Can you check whether both fail?
> Or just try the attached patch, which adds a fallback for this case.
>
> But it'd be good to understand what's going on in /sys on this machine.
> And I still don't understand why MPI changes things here.
>
> Brice
>
> --- src/topology-linux.c (revision 4420)
> +++ src/topology-linux.c (working copy)
> @@ -3270,7 +3270,15 @@
>        if (numprocs <= 0)
>          Lprocs = NULL;
>        if (look_sysfscpu(topology, "/sys/bus/cpu/devices", Lprocs, numprocs) < 0)
> -        look_sysfscpu(topology, "/sys/devices/system/cpu", Lprocs, numprocs);
> +        if (look_sysfscpu(topology, "/sys/devices/system/cpu", Lprocs, numprocs) < 0) {
> +          /* sysfs but we failed to read cpu topology, fallback */
> +          if (topology->is_thissystem)
> +            hwloc_setup_pu_level(topology, hwloc_fallback_nbprocessors(topology));
> +          else
> +            /* fsys-root but not this system, no way, assume there's just 1
> +             * processor :/ */
> +            hwloc_setup_pu_level(topology, 1);
> +        }
>        if (Lprocs)
>          hwloc_linux_free_cpuinfo(Lprocs, numprocs);
>      }
>
>
>
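
To make the before/after behaviour of the patch above explicit, here is a rough model of the decision logic as I understand it from this thread. It is a simplified sketch, not the hwloc source: the struct, its fields and pus_found() are hypothetical, and the real code has more paths (for example /proc/cpuinfo) that are left out.

```c
#include <stdbool.h>

/* Hypothetical summary of what discovery observed on one run. */
struct probe_result {
    bool sysfs_accessible;   /* did the access checks on /sys succeed? */
    bool bus_cpu_parsed;     /* look_sysfscpu("/sys/bus/cpu/devices") >= 0 */
    bool devices_cpu_parsed; /* look_sysfscpu("/sys/devices/system/cpu") >= 0 */
    bool is_thissystem;      /* mirrors topology->is_thissystem */
    int  sysfs_pus;          /* PUs found when a sysfs parse succeeds */
    int  fallback_pus;       /* assumed hwloc_fallback_nbprocessors() result */
};

/* Number of PUs the topology ends up with, with or without the patch. */
static int pus_found(const struct probe_result *r, bool patched)
{
    if (!r->sysfs_accessible)
        return r->fallback_pus;      /* existing fallback, the non-MPI case */

    if (r->bus_cpu_parsed || r->devices_cpu_parsed)
        return r->sysfs_pus;         /* normal sysfs discovery */

    if (!patched)
        return 0;                    /* both parses failed: empty topology */

    /* Patched: fall back instead of leaving the topology empty. */
    return r->is_thissystem ? r->fallback_pus : 1;
}
```

In this model the MPI case is sysfs_accessible set with both parse flags clear, which yields zero PUs unpatched and the fallback count once patched. The non-MPI case never reaches the sysfs branch, which would explain why it always worked.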

-- 
Dan Ibanez