Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] Fwd: BGQ empty topology with MPI
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2012-03-25 02:43:42

On 24/03/2012 23:04, Daniel Ibanez wrote:
> The fundamental difference is in
> src/topology-linux.c:3251
> when this if statement is true, hwloc_setup_pu_level
> finds the PU objects.
> When it is false, it fails with empty topology.
> and it is not detected even when I set it from the front end.
> That means the difference is whether hwloc can access
> the various /sys/devices and /sys/bus files.
> Additional printfs confirm that with MPI in the code,
> hwloc_accessat succeeds on the various /sys/ directories,
> but the overall procedure for getting PUs from these fails.
> Without MPI, access to /sys/ directories fails but
> the fallback hwloc_setup_pu_level works.

If I understand correctly, in the MPI case, look_sysfscpu() ends up
being called. It is called twice because /sys/devices/system/cpu may be
renamed in the future, so it's expected that one call succeeds and the
other fails. Can you check whether both fail?
Or just try the attached patch, which adds a fallback for this case.

But it'd be good to understand what's going on in /sys on this machine.
And I still don't understand why MPI changes things here.


--- src/topology-linux.c (revision 4420)
+++ src/topology-linux.c (working copy)
@@ -3270,7 +3270,15 @@
       if (numprocs <= 0)
         Lprocs = NULL;
       if (look_sysfscpu(topology, "/sys/bus/cpu/devices", Lprocs, numprocs) < 0)
-       look_sysfscpu(topology, "/sys/devices/system/cpu", Lprocs, numprocs);
+       if (look_sysfscpu(topology, "/sys/devices/system/cpu", Lprocs, numprocs) < 0) {
+         /* sysfs but we failed to read cpu topology, fallback */
+         if (topology->is_thissystem)
+           hwloc_setup_pu_level(topology, hwloc_fallback_nbprocessors(topology));
+         else
+           /* fsys-root but not this system, no way, assume there's just 1
+            * processor :/ */
+           hwloc_setup_pu_level(topology, 1);
+       }
       if (Lprocs)
         hwloc_linux_free_cpuinfo(Lprocs, numprocs);