Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] hwloc-1.3.1 assertion failures on Linux/POWER7
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2012-02-01 18:15:49


On 02/02/2012 00:12, Paul H. Hargrove wrote:
>
>
> On 2/1/2012 5:20 AM, Brice Goglin wrote:
>> On 01/02/2012 03:49, Christopher Samuel wrote:
>>> With XLC and 1.3.1 and 1.4 I get plenty of warnings (compile logs for
>>> both attached) whilst compiling and then 4 failures in make check
>>> (accompanied by segmentation faults):
>>>
>>> samuel_at_tambo:~/HWLOC/hwloc-1.3.1> grep -B1 FAIL: log
>>> /bin/sh: line 1: 5267 Segmentation fault ${dir}$tst
>>> FAIL: hwloc_bind
>>> /bin/sh: line 1: 5285 Segmentation fault ${dir}$tst
>>> FAIL: hwloc_get_last_cpu_location
>>> /bin/sh: line 1: 5335 Segmentation fault ${dir}$tst
>>> FAIL: hwloc_is_thissystem
>>> /bin/sh: line 1: 5481 Segmentation fault ${dir}$tst
>>> FAIL: glibc-sched
>> All these tests involve binding, which is likely broken (see below).
>>
>>
>> "/vlsci/VLSCI/samuel/HWLOC/hwloc-1.3.1/include/hwloc.h", line 1203.28:
>> 1506-1385 (W) The attribute "pure" is not a valid type attribute.
>> CC traversal.lo
>>
>> The pure attribute is placed before the function name. I'll move it
>> after; XLC doesn't seem to warn in that case.
>>
>>
>> "distances.c", line 62.42: 1506-404 (W) restrict can only qualify a
>> pointer type.
>> "distances.c", line 84.50: 1506-404 (W) restrict can only qualify a
>> pointer type.
>> "distances.c", line 226.40: 1506-404 (W) restrict can only qualify a
>> pointer type.
>>
>> XLC may be wrong here, topology_t is typedef'ed to a pointer...
>
>
> I've seen this sort of thing before where "pointerness" was ignored
> when "inside" the typedef.
> Since this is only a warning, and a missing "restrict" should not
> impact correctness, I vote to ignore this.
>
>
>>
>>
>> "topology-linux.c", line 303.33: 1506-280 (W) Function argument
>> assignment between types "unsigned int" and "struct {...}*" is not
>> allowed.
>> "topology-linux.c", line 303.27: 1506-098 (E) Missing argument(s).
>> "topology-linux.c", line 391.32: 1506-280 (W) Function argument
>> assignment between types "unsigned int" and "struct {...}*" is not
>> allowed.
>> "topology-linux.c", line 391.26: 1506-098 (E) Missing argument(s).
>> "topology-linux.c", line 715.40: 1506-280 (W) Function argument
>> assignment between types "unsigned int" and "struct {...}*" is not
>> allowed.
>> "topology-linux.c", line 715.34: 1506-098 (E) Missing argument(s).
>> "topology-linux.c", line 807.40: 1506-280 (W) Function argument
>> assignment between types "unsigned int" and "struct {...}*" is not
>> allowed.
>> "topology-linux.c", line 807.34: 1506-098 (E) Missing argument(s).
>>
>> This looks very bad. It means something screwed up the already very
>> complex sched_setaffinity detection code.
>> Does XLC redefine its own sched_setaffinity functions? Can you find the
>> relevant header file and send it?
>> PGI had similar problems at some point. That's very annoying.
>> This explains why binding tests broke.
>
> I cannot find any instances within the /opt/apps/ibm tree on this
> machine:
>> $ find /opt/apps/ibm -name \*.h|xargs grep affi
>> find: `/opt/apps/ibm/vac/11.1/lap/license': Permission denied
>> find: `/opt/apps/ibm/essl/5.1/lap/license': Permission denied
>> find: `/opt/apps/ibm/xlf/13.1/lap/license': Permission denied
>> /opt/apps/ibm/xlsmp/2.1/include/omp.h: ibm_sched_affinity= 1000/*
>> AFFINITY scheduling type. This is an IBM extension. */
>> $ find /opt/apps/ibm -name \*.h|xargs grep cpu_set_t
>> find: `/opt/apps/ibm/vac/11.1/lap/license': Permission denied
>> find: `/opt/apps/ibm/essl/5.1/lap/license': Permission denied
>> find: `/opt/apps/ibm/xlf/13.1/lap/license': Permission denied
>
>
> The generated config.h contains:
>> #define HWLOC_HAVE_OLD_SCHED_SETAFFINITY 1
>> #define HWLOC_HAVE_SCHED_SETAFFINITY 1
>
> The "OLD" sched_setaffinity is the 2-argument version, but
> /usr/include/sched.h contains the 3-argument version:
>> extern int sched_setaffinity (__pid_t __pid, size_t __cpusetsize,
>> __const cpu_set_t *__cpuset) __THROW;
>
> So, it would appear that configure has wrongly set
> "HWLOC_HAVE_OLD_SCHED_SETAFFINITY".
>
> Examining config.log I find
>> configure:9046: checking for old prototype of sched_setaffinity
>> configure:9064: xlc -c conftest.c >&5
>> "conftest.c", line 82.19: 1506-236 (W) Macro name _GNU_SOURCE has
>> been redefined.
>> "conftest.c", line 82.19: 1506-358 (I) "_GNU_SOURCE" is defined on
>> line 25 of conftest.c.
>> "conftest.c", line 89.23: 1506-280 (W) Function argument assignment
>> between types "unsigned long" and "void*" is not allowed.
>> "conftest.c", line 89.19: 1506-098 (E) Missing argument(s).
>> configure:9064: $? = 0
>> configure:9068: result: yes
>
> This is WRONG.
> The compiler has reported an error: "(E) Missing argument(s)" and yet
> exited with $? = 0
>
> I am looking at xlc docs to see if there is some compiler flag to be set.

Thanks for the debugging. This makes my last mail to Christopher useless,
then :)

Brice
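
On the "restrict can only qualify a pointer type" warnings quoted above: a
minimal C99 sketch, not hwloc code, of restrict applied through a typedef
that names a pointer type. The names topo_sketch, topo_sketch_t and
sum_counts are hypothetical stand-ins; hwloc's own typedef and functions are
not reproduced here. C99 permits restrict on any pointer-to-object type,
including one reached through a typedef, which is consistent with the
suspicion that XLC is wrong to warn.

```c
/* Sketch only: hypothetical names, not taken from hwloc. */

/* Stand-in for an opaque topology structure. */
struct topo_sketch {
    unsigned nb_objects;
};

/* The typedef itself names a pointer type, as described in the thread. */
typedef struct topo_sketch *topo_sketch_t;

/* restrict qualifies the pointer type hidden behind the typedef. This is
 * valid C99; a compiler that only accepts restrict next to a literal '*'
 * in the declarator would warn here. */
unsigned sum_counts(topo_sketch_t restrict a, topo_sketch_t restrict b)
{
    return a->nb_objects + b->nb_objects;
}
```

Calling sum_counts with two distinct objects behaves the same with or
without the qualifier. restrict is only an aliasing promise to the
optimizer, so dropping it should not affect correctness.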