Hardware Locality Users' Mailing List Archives

Subject: Re: [hwloc-users] hwloc on Blue Gene/Q?
From: Erik Schnetter (schnetter_at_[hidden])
Date: 2013-02-10 19:52:23


Brice

I tried using this tarball. Things didn't work. (This particular run used 2
MPI processes with 32 OpenMP threads each.)

In my application, I first output the topology in a tree structure. (I do
this in my application instead of via one of hwloc's tools because I don't
want to call out to shell code.) Then I output thread bindings, then modify
the thread bindings, then output them again.
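For comparison, here is a minimal sketch of the same print/modify/print sequence that uses the Linux calls `sched_getaffinity` and `sched_setaffinity` directly instead of going through hwloc (whose equivalents are `hwloc_get_cpubind`/`hwloc_set_cpubind` with `HWLOC_CPUBIND_THREAD`). The helpers `print_thread_binding` and `pin_to_first_allowed_cpu` are hypothetical names. The sketch assumes Linux with glibc and CPU numbers below `CPU_SETSIZE` (1024). This thread does not establish whether the BG/Q compute-node kernel supports these calls.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper: print the CPUs the calling thread may run on and
 * return how many there are (-1 on error). Assumes Linux + glibc, where
 * pid 0 means "the calling thread" for sched_getaffinity. */
static int print_thread_binding(const char *label)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof set, &set) != 0) {
        perror("sched_getaffinity");
        return -1;
    }
    printf("%s:", label);
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            printf(" %d", cpu);
    printf("\n");
    return CPU_COUNT(&set);
}

/* Hypothetical helper: rebind the calling thread to the first CPU in its
 * current binding and return that CPU number, or -1 on error. */
static int pin_to_first_allowed_cpu(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;

    int first = -1;
    for (int cpu = 0; cpu < CPU_SETSIZE && first < 0; cpu++)
        if (CPU_ISSET(cpu, &set))
            first = cpu;
    if (first < 0)
        return -1;

    /* Modify the binding: allow only that one CPU. */
    CPU_ZERO(&set);
    CPU_SET(first, &set);
    if (sched_setaffinity(0, sizeof set, &set) != 0)
        return -1;

    /* Read the binding back: it should now contain exactly `first`. */
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    assert(CPU_COUNT(&set) == 1 && CPU_ISSET(first, &set));
    return first;
}
```

Calling `print_thread_binding("before")`, then `pin_to_first_allowed_cpu()`, then `print_thread_binding("after")` mirrors the output/modify/output steps. Because both calls act on the calling thread only, an OpenMP program would run them inside the parallel region so that each thread handles its own binding.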

(1) The topology I find consists of 32 PUs and nothing else. I would have
expected to find two cache levels, 16 cores, and 64 PUs.

(2) When outputting the thread bindings, the application crashed. The
lightweight core file says this was signal 6 (SIGABRT), raised in a routine
called ".raise".

I'd be happy to help debug this. How?

-erik

On Sat, Feb 9, 2013 at 5:46 PM, Brice Goglin <Brice.Goglin_at_[hidden]> wrote:

> The new "bgq" branch now contains proper topology for BG/Q nodes
> (including cores and caches, except the prefetching cache) as well as
> support for set/get binding of the current thread or of another thread. No
> process-wide binding since I don't know how to iterate over all threads of
> a process.
>
> A tarball is available at:
>
> https://ci.inria.fr/hwloc/job/hwloc-zcustom-tarball/lastSuccessfulBuild/artifact/hwloc-1.7a1r5312.tar.gz
> (this is our new regression testing tool; I hope the tarball won't
> disappear too soon)
>
> I don't expect many more features, so this branch will likely go into
> trunk very soon. But if you can look at it, that would be great.
>
>
> Brice
>
>
>
> On 08/01/2013 18:06, Erik Schnetter wrote:
>
> I am trying to use hwloc on a Blue Gene/Q. Building and installing worked
> fine, and it reports the system configuration correctly as well (i.e. it
> shows all PUs). However, when I try to query the thread/core bindings,
> hwloc crashes with an error in libc's free(). This happens with both 1.6
> and 1.6.1rc1.
>
> The error apparently occurs in CPU_FREE called from
> hwloc_linux_find_kernel_nr_cpus.
>
> Does this ring a bell with anyone? I know this is not enough information
> to debug things, but do you have any pointers for things to look at?
>
> I remember reading somewhere that the last bit in a cpu_set_t cannot be
> used. A Blue Gene/Q has 64 PUs, and may be using 64-bit integers to hold
> cpu_set_t data. Could this be an issue?
>
> My goal is to examine and experiment with thread/core bindings with
> OpenMP to improve performance.
>
> -erik
>
> --
> Erik Schnetter <schnetter_at_[hidden]>
> http://www.perimeterinstitute.ca/personal/eschnetter/
>
>
> _______________________________________________
> hwloc-users mailing list
> hwloc-users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>
>
>
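On Brice's point that process-wide binding is missing because he does not know how to iterate over all threads of a process: on a standard Linux kernel, one way to enumerate a process's threads is to list `/proc/<pid>/task`, where each numeric entry is a thread ID (TID) that can be passed to `sched_setaffinity()` individually. The sketch below does this for the calling process; `list_thread_ids` is a hypothetical name, and it is an open assumption whether the BG/Q compute-node kernel provides `/proc/self/task` at all.

```c
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: collect the thread IDs of the calling process by
 * listing /proc/self/task (assumes a Linux-style /proc). Stores up to `max`
 * TIDs in `tids` and returns the total number found, or -1 if the
 * directory cannot be opened. */
static int list_thread_ids(pid_t *tids, int max)
{
    assert(tids != NULL || max == 0);

    DIR *dir = opendir("/proc/self/task");
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        char *end;
        long tid = strtol(entry->d_name, &end, 10);
        /* Skip "." and "..": only purely numeric names are TIDs. */
        if (end == entry->d_name || *end != '\0' || tid <= 0)
            continue;
        if (count < max)
            tids[count] = (pid_t)tid;
        count++;
    }
    closedir(dir);
    return count;
}
```

Binding every listed TID in turn is inherently racy: threads created after the directory was read are missed, so a caller may need to repeat the listing until it stops changing.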
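The original crash is reported in `CPU_FREE` called from `hwloc_linux_find_kernel_nr_cpus`. A common way to implement such a probe is to retry `sched_getaffinity()` with a growing `CPU_ALLOC`-ed set until the kernel stops rejecting the buffer as too small (`EINVAL`). The sketch below shows that technique assuming Linux and glibc. It is not hwloc's actual code, `probe_kernel_nr_cpus` is a hypothetical name, and the starting guess of 64 CPUs is arbitrary. It illustrates two invariants worth checking when `CPU_FREE` aborts: each set is freed exactly once, and the size passed to `sched_getaffinity()` is the `CPU_ALLOC_SIZE` of that same allocation.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper: return the number of CPU bits in the smallest
 * dynamically allocated set that sched_getaffinity() accepts, or -1 on
 * error. Assumes Linux + glibc. */
static int probe_kernel_nr_cpus(void)
{
    int nr_cpus = 64; /* arbitrary starting guess, doubled on each retry */

    while (nr_cpus <= (1 << 20)) {
        cpu_set_t *set = CPU_ALLOC(nr_cpus);
        if (set == NULL)
            return -1;
        size_t setsize = CPU_ALLOC_SIZE(nr_cpus);
        /* CPU_ALLOC_SIZE rounds up to whole words, never down. */
        assert(setsize * 8 >= (size_t)nr_cpus);

        int err = sched_getaffinity(0, setsize, set);
        int saved_errno = errno;
        CPU_FREE(set); /* free this allocation exactly once, before retrying */

        if (err == 0)
            return (int)(setsize * 8); /* bits available in the buffer */
        if (saved_errno != EINVAL)
            return -1; /* a failure unrelated to the buffer size */
        nr_cpus *= 2; /* kernel mask is larger than our buffer: grow it */
    }
    return -1;
}
```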
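On the question of whether the last bit of a `cpu_set_t` is unusable: with glibc's dynamically sized CPU sets, a 64-CPU set occupies `CPU_ALLOC_SIZE(64)`, which is 8 bytes (one 64-bit word on LP64 systems). The highest CPU, bit 63, can be set and read back with the `_S` macros like any other bit. The sketch below checks exactly that; `set_last_cpu` is a hypothetical helper. It exercises only glibc's user-space macros and says nothing about how the BG/Q kernel or hwloc itself treat bit 63.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper: allocate a dynamic CPU set for `ncpus` CPUs, set only
 * the highest-numbered CPU, and return how many CPUs the macros then report
 * as set (expected 1), or -1 on allocation failure. The allocation size from
 * CPU_ALLOC_SIZE is stored in *bytes_out. */
static int set_last_cpu(int ncpus, size_t *bytes_out)
{
    assert(ncpus > 0 && bytes_out != NULL);

    cpu_set_t *set = CPU_ALLOC(ncpus);
    if (set == NULL)
        return -1;
    size_t size = CPU_ALLOC_SIZE(ncpus);
    *bytes_out = size;

    CPU_ZERO_S(size, set);
    CPU_SET_S(ncpus - 1, size, set); /* for ncpus == 64 this is bit 63 */
    int count = CPU_ISSET_S(ncpus - 1, size, set) ? CPU_COUNT_S(size, set) : 0;

    CPU_FREE(set);
    return count;
}
```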

-- 
Erik Schnetter <schnetter_at_[hidden]>
http://www.perimeterinstitute.ca/personal/eschnetter/