Hardware Locality Development Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [hwloc-devel] Fwd: BGQ empty topology with MPI
From: Daniel Ibanez (dan.a.ibanez_at_[hidden])
Date: 2012-03-28 14:22:51


The machine is back in working order.
I tried this patch and it works great: I get cpus and my whole program runs
as expected.
I'm now looking into what failed in look_sysfscpu.

On Sun, Mar 25, 2012 at 2:43 AM, Brice Goglin <Brice.Goglin_at_[hidden]> wrote:

> On 24/03/2012 23:04, Daniel Ibanez wrote:
> > The fundamental difference is in
> >
> > src/topology-linux.c:3251
> >
> > when this if statement is true, hwloc_setup_pu_level
> > finds the PU objects.
> > When it is false, it fails with empty topology.
> >
> > I checked HWLOC_LINUX_USE_CPUINFO,
> > and it is not detected even when I set it from the front end.
> >
> > That means the difference is whether hwloc can access
> > the various /sys/devices and /sys/bus files.
> >
> > Additional printfs confirm that with MPI in the code,
> > hwloc_accessat succeeds on the various /sys/ directories,
> > but the overall procedure for getting PUs from these fails.
> > Without MPI, access to /sys/ directories fails but
> > the fallback hwloc_setup_pu_level works.
>
> If I understand correctly, in the MPI case, look_sysfscpu() ends up
> being called. It is called on two paths because /sys/devices/system/cpu
> may be renamed in the future, so it's expected that one call succeeds
> and the other fails. Can you check whether both fail?
> Or just try the attached patch, which adds a fallback for this case.
>
> But it'd be good to understand what's going on in /sys on this machine.
> And I still don't understand why MPI changes things here.
>
> Brice
>
> --- src/topology-linux.c (revision 4420)
> +++ src/topology-linux.c (working copy)
> @@ -3270,7 +3270,15 @@
>        if (numprocs <= 0)
>          Lprocs = NULL;
>        if (look_sysfscpu(topology, "/sys/bus/cpu/devices", Lprocs, numprocs) < 0)
> -        look_sysfscpu(topology, "/sys/devices/system/cpu", Lprocs, numprocs);
> +        if (look_sysfscpu(topology, "/sys/devices/system/cpu", Lprocs, numprocs) < 0) {
> +          /* sysfs but we failed to read cpu topology, fallback */
> +          if (topology->is_thissystem)
> +            hwloc_setup_pu_level(topology, hwloc_fallback_nbprocessors(topology));
> +          else
> +            /* fsys-root but not this system, no way, assume there's just 1
> +             * processor :/ */
> +            hwloc_setup_pu_level(topology, 1);
> +        }
>        if (Lprocs)
>          hwloc_linux_free_cpuinfo(Lprocs, numprocs);
>      }
>
>
>
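
For readers following the patch above, here is a minimal standalone sketch of the same fallback chain: try each sysfs CPU directory in turn, and if none yields any CPUs, fall back to a plain processor count. This is not hwloc code. The helper names (count_sysfs_cpus, fallback_nbprocessors, detect_pu_count) are hypothetical, counting "cpu<N>" directory entries is only a rough stand-in for what look_sysfscpu() does, and using sysconf(_SC_NPROCESSORS_ONLN) for the fallback is an assumption about what a reasonable fallback looks like.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for look_sysfscpu(): count entries named "cpu<N>"
 * in dir. Returns -1 if the directory cannot be opened or holds no CPUs. */
static int count_sysfs_cpus(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        /* Accept cpu0, cpu1, ...; skip cpufreq, cpuidle and the like. */
        if (strncmp(e->d_name, "cpu", 3) == 0 &&
            isdigit((unsigned char)e->d_name[3]))
            n++;
    }
    closedir(d);
    return n > 0 ? n : -1;
}

/* Assumed fallback: ask the C library for the online processor count,
 * and never report fewer than one processor. */
static int fallback_nbprocessors(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;
}

/* Try each sysfs path in order; if all of them fail, use the fallback
 * instead of ending up with an empty topology. */
static int detect_pu_count(const char *const *paths, size_t npaths)
{
    assert(paths != NULL || npaths == 0);
    for (size_t i = 0; i < npaths; i++) {
        int n = count_sysfs_cpus(paths[i]);
        if (n > 0)
            return n;
    }
    return fallback_nbprocessors();
}
```

Calling detect_pu_count() with the two paths from the patch, "/sys/bus/cpu/devices" and "/sys/devices/system/cpu", mirrors the order in which the patched code tries them. The point of the design is that a readable but unusable /sys no longer leaves the caller with zero PUs.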

-- 
Dan Ibanez