Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] hwloc-1.3.1 assertion failures on Linux/POWER7
From: Paul H. Hargrove (PHHargrove_at_[hidden])
Date: 2012-02-01 18:12:42


On 2/1/2012 5:20 AM, Brice Goglin wrote:
> On 01/02/2012 03:49, Christopher Samuel wrote:
>> With XLC and 1.3.1 and 1.4 I get plenty of warnings (compile logs for
>> both attached) whilst compiling and then 4 failures in make check
>> (accompanied with segmentation faults):
>>
>> samuel_at_tambo:~/HWLOC/hwloc-1.3.1> grep -B1 FAIL: log
>> /bin/sh: line 1: 5267 Segmentation fault ${dir}$tst
>> FAIL: hwloc_bind
>> /bin/sh: line 1: 5285 Segmentation fault ${dir}$tst
>> FAIL: hwloc_get_last_cpu_location
>> /bin/sh: line 1: 5335 Segmentation fault ${dir}$tst
>> FAIL: hwloc_is_thissystem
>> /bin/sh: line 1: 5481 Segmentation fault ${dir}$tst
>> FAIL: glibc-sched
> All these tests involved binding, which is likely broken (see below).
>
>
> "/vlsci/VLSCI/samuel/HWLOC/hwloc-1.3.1/include/hwloc.h", line 1203.28:
> 1506-1385 (W) The attribute "pure" is not a valid type attribute.
> CC traversal.lo
>
> Attribute pure is before the function name. I'll move it after; XLC
> doesn't seem to warn in that case.
>
>
> "distances.c", line 62.42: 1506-404 (W) restrict can only qualify a
> pointer type.
> "distances.c", line 84.50: 1506-404 (W) restrict can only qualify a
> pointer type.
> "distances.c", line 226.40: 1506-404 (W) restrict can only qualify a
> pointer type.
>
> XLC may be wrong here, topology_t is typedef'ed to a pointer...

I've seen this sort of thing before, where a compiler ignores the
"pointerness" of a type when it is hidden "inside" a typedef.
Since this is only a warning, and a missing "restrict" should not impact
correctness, I vote to ignore this.
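
For reference, C99 does allow "restrict" on a typedef name that denotes a pointer type, which is why the warning looks spurious. A minimal sketch of the pattern, where demo_topology, demo_topology_t and demo_count are hypothetical stand-ins and not hwloc's actual declarations:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real types; only the shape matters. */
struct demo_topology { int nb_objs; };
typedef struct demo_topology *demo_topology_t;

/* restrict is legal here because demo_topology_t is a pointer type,
 * even though the '*' is hidden inside the typedef. */
static int demo_count(demo_topology_t restrict topo)
{
    assert(topo != NULL);
    return topo->nb_objs;
}
```

Dropping the qualifier from such a parameter only removes an optimization hint, so the function behaves the same either way.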

>
>
> "topology-linux.c", line 303.33: 1506-280 (W) Function argument
> assignment between types "unsigned int" and "struct {...}*" is not allowed.
> "topology-linux.c", line 303.27: 1506-098 (E) Missing argument(s).
> "topology-linux.c", line 391.32: 1506-280 (W) Function argument
> assignment between types "unsigned int" and "struct {...}*" is not allowed.
> "topology-linux.c", line 391.26: 1506-098 (E) Missing argument(s).
> "topology-linux.c", line 715.40: 1506-280 (W) Function argument
> assignment between types "unsigned int" and "struct {...}*" is not allowed.
> "topology-linux.c", line 715.34: 1506-098 (E) Missing argument(s).
> "topology-linux.c", line 807.40: 1506-280 (W) Function argument
> assignment between types "unsigned int" and "struct {...}*" is not allowed.
> "topology-linux.c", line 807.34: 1506-098 (E) Missing argument(s).
>
> This looks very bad. It means something screwed the already very complex
> sched_setaffinity detection code.
> Does XLC redefine its own sched_setaffinity functions? Can you find the
> relevant header file and send it?
> PGI had similar problems at some point. That's very annoying.
> This explains why binding tests broke.

I cannot find any XLC-specific declarations of sched_setaffinity or
cpu_set_t in the headers under the /opt/apps/ibm tree on this machine:
> $ find /opt/apps/ibm -name \*.h|xargs grep affi
> find: `/opt/apps/ibm/vac/11.1/lap/license': Permission denied
> find: `/opt/apps/ibm/essl/5.1/lap/license': Permission denied
> find: `/opt/apps/ibm/xlf/13.1/lap/license': Permission denied
> /opt/apps/ibm/xlsmp/2.1/include/omp.h: ibm_sched_affinity= 1000/*
> AFFINITY scheduling type. This is an IBM extension. */
> $ find /opt/apps/ibm -name \*.h|xargs grep cpu_set_t
> find: `/opt/apps/ibm/vac/11.1/lap/license': Permission denied
> find: `/opt/apps/ibm/essl/5.1/lap/license': Permission denied
> find: `/opt/apps/ibm/xlf/13.1/lap/license': Permission denied

The generated config.h contains:
> #define HWLOC_HAVE_OLD_SCHED_SETAFFINITY 1
> #define HWLOC_HAVE_SCHED_SETAFFINITY 1

The "OLD" sched_setaffinity is the 2-argument version, but
/usr/include/sched.h contains the 3-argument version:
> extern int sched_setaffinity (__pid_t __pid, size_t __cpusetsize,
> __const cpu_set_t *__cpuset) __THROW;

So, it would appear that configure has wrongly set
"HWLOC_HAVE_OLD_SCHED_SETAFFINITY".
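
To illustrate why a wrongly set macro produces "Missing argument(s)", here is a minimal sketch of a config macro selecting between the two call forms. Everything prefixed demo_ or DEMO_ is hypothetical and only mimics the shape of the real API; it is not hwloc's code:

```c
#include <stddef.h>

/* Hypothetical stand-in for cpu_set_t. */
typedef struct { unsigned long bits; } demo_cpu_set_t;

/* Stand-in with the 3-argument shape found in /usr/include/sched.h. */
static int demo_setaffinity(int pid, size_t size, const demo_cpu_set_t *set)
{
    (void)pid;
    return (set != NULL && size == sizeof(*set)) ? 0 : -1;
}

/* Set to 1 to reproduce the failure: the 2-argument call below would then
 * be compiled against the 3-argument prototype above and be rejected. */
#define DEMO_HAVE_OLD_SCHED_SETAFFINITY 0

static int demo_bind(int pid, const demo_cpu_set_t *set)
{
#if DEMO_HAVE_OLD_SCHED_SETAFFINITY
    return demo_setaffinity(pid, set);               /* old 2-argument form */
#else
    return demo_setaffinity(pid, sizeof(*set), set); /* 3-argument form */
#endif
}
```

With the macro at 0 the call matches the prototype. Flipping it to 1 gives the same kind of argument-count mismatch that XLC reports in topology-linux.c.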

Examining config.log, I find:
> configure:9046: checking for old prototype of sched_setaffinity
> configure:9064: xlc -c conftest.c >&5
> "conftest.c", line 82.19: 1506-236 (W) Macro name _GNU_SOURCE has been
> redefined.
> "conftest.c", line 82.19: 1506-358 (I) "_GNU_SOURCE" is defined on
> line 25 of conftest.c.
> "conftest.c", line 89.23: 1506-280 (W) Function argument assignment
> between types "unsigned long" and "void*" is not allowed.
> "conftest.c", line 89.19: 1506-098 (E) Missing argument(s).
> configure:9064: $? = 0
> configure:9068: result: yes

This is WRONG.
The compiler has reported an error, "(E) Missing argument(s)", and yet
exited with $? = 0, so configure concluded that the 2-argument call
compiled successfully.
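
One possible workaround, sketched here purely as an illustration and not as what configure does, is to stop trusting the exit status alone and also scan the captured compiler output for the "(E)" marker seen in the log above. The function names are hypothetical, and other severity markers may need the same treatment:

```c
#include <string.h>

/* Returns 1 if the captured compiler output contains an "(E)" diagnostic
 * in the format shown in config.log, else 0. */
static int demo_log_has_error(const char *log)
{
    return strstr(log, " (E) ") != NULL;
}

/* A compile check passes only if the compiler exited with 0 AND
 * printed no error-level diagnostic. */
static int demo_check_passed(int exit_status, const char *log)
{
    return exit_status == 0 && !demo_log_has_error(log);
}
```

Under this rule the conftest run above would count as a failure despite $? = 0, while output containing only "(W)" warnings would still pass.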

I am looking at xlc docs to see if there is some compiler flag to be set.

-Paul

-- 
Paul H. Hargrove                          PHHargrove_at_[hidden]
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900