Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] hwloc-1.4 "gmake check" failure on Solaris-10/SPARC/gccfss
From: Paul H. Hargrove (PHHargrove_at_[hidden])
Date: 2012-02-01 14:46:22


I think that bug report does apply, but the fix they suggest (even after
adding the missing "return") does NOT work.
I added the following 4 lines to the bottom of hwloc-1.4/src/misc.c:

#if 1 /* XXX: replace '1' with a probe for gccfss */
#include <string.h>
int __ffssi2 (int x) { return ffs(x); }
#endif

I then reconfigured and rebuilt hwloc in a fresh directory (just to be safe).
Now "make check" no longer reports undefined symbols, but I get test
FAILures instead:
> PASS: test-hwloc-assembler.sh
> FAIL: test-hwloc-calc.sh
> FAIL: test-hwloc-distances.sh
> FAIL: test-hwloc-distrib.sh
> FAIL: test-hwloc-ls.sh
> ========================================================
> 4 of 5 tests failed
> Please report to http://www.open-mpi.org/projects/hwloc/
> ========================================================

Same result on 1.3.1 except, of course, there are fewer tests:
> FAIL: test-hwloc-calc.sh
> FAIL: test-hwloc-distrib.sh
> ========================================================
> 2 of 2 tests failed
> Please report to http://www.open-mpi.org/community/help/
> ========================================================

I found the failing tests leave core files behind.
Looking at the hwloc-calc failure for instance, dbx tells me:
> t@1 (l@1) program terminated by signal SEGV (no mapping at the fault
> address)
> 0xff3866a0: __ffssi2 : save %sp, -96, %sp

Without going any further, that appears to be a stack overflow ("no
mapping", and the only address register involved is %sp).
The "where" command in dbx confirms that the "fix" I applied just
results in infinite recursion: presumably gccfss turns the ffs() call
inside my __ffssi2() back into a call to __ffssi2() itself.

Editing [BUILD_DIR]/include/private/autogen/config.h to remove all
*HAVE_*FFS definitions did NOT help.
This appears to be because include/private/misc.h looks for __GNUC__
first and doesn't consider HAVE_FFS or HWLOC_HAVE_FFS.
If I also hack out that portion of misc.h THEN I can pass all the tests!

So, in short: when building w/ this compiler, hwloc needs to behave as
if the system lacks ffs().
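
To illustrate what "behave as if the system lacks ffs()" could look like,
here is a minimal sketch, not hwloc's actual code. The names demo_ffs,
demo_ffs_portable and DEMO_FFS_BUILTIN_BROKEN are hypothetical; the idea is
only that an override macro is consulted before __GNUC__, so that a
configure-time probe can force the portable path:

```c
/* Hypothetical macro: a configure-time probe would define it for gccfss. */
/* #define DEMO_FFS_BUILTIN_BROKEN 1 */

/*
 * Portable fallback: returns the 1-based index of the least significant
 * set bit of x, or 0 if x is 0 (same contract as ffs()).
 * It calls neither ffs() nor __builtin_ffs().
 */
static int demo_ffs_portable(int x)
{
    unsigned int u = (unsigned int) x;
    int pos = 1;

    if (u == 0)
        return 0;
    while ((u & 1u) == 0) {
        u >>= 1;
        pos++;
    }
    return pos;
}

/* The override is checked BEFORE __GNUC__, unlike the ordering described above. */
#if defined(__GNUC__) && !defined(DEMO_FFS_BUILTIN_BROKEN)
#  define demo_ffs(x) __builtin_ffs(x)
#else
#  define demo_ffs(x) demo_ffs_portable(x)
#endif
```

Whether gccfss leaves the loop alone rather than lowering it to __ffssi2
again is an assumption that would need to be verified on the affected system.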

Making that happen is non-trivial because there are no preprocessor
symbols defined by gccfss that would allow compile-time #if checks to
distinguish gccfss from "vanilla" gcc. The only difference is in the
string value of __VERSION__, which one could check at configure time.
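
A configure-time probe along those lines might be sketched as below. The
helper names are hypothetical, and I am assuming that __VERSION__ carries
the same "(gccfss)" tag that "gcc --version" prints; that should be checked
on the affected system before relying on it:

```c
#include <string.h>

/* Returns 1 if a compiler version string carries the "gccfss" tag, else 0. */
static int demo_version_is_gccfss(const char *version)
{
    return version != NULL && strstr(version, "gccfss") != NULL;
}

/*
 * Version string of the compiler building this file.
 * Returns an empty string when __VERSION__ is not predefined.
 */
static const char *demo_compiler_version(void)
{
#ifdef __VERSION__
    return __VERSION__;
#else
    return "";
#endif
}
```

A small test program run by configure could call
demo_version_is_gccfss(demo_compiler_version()) from main() and report the
result through its exit status, and configure could then define a macro for
the headers to test.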

Of course documenting that hwloc doesn't support this broken compiler is
another option.

-Paul

On 2/1/2012 4:22 AM, Brice Goglin wrote:
> Does this bug report apply?
> https://forums.oracle.com/forums/thread.jspa?threadID=1997328
> Brice
>
>
> On 01/02/2012 03:51, Paul H. Hargrove wrote:
>> The problem I described below is ALSO present in hwloc-1.4
>> -Paul
>>
>> On 1/31/2012 4:57 PM, Paul H. Hargrove wrote:
>>> This report is the flip-side of the problem w/ Solaris Studio
>>> compilers on x86-64.
>>> With Solaris-10 on SPARC, I find I *can* build hwloc-1.3.1 w/ SS12.x,
>>> but instead am failing w/ gcc.
>>>
>>> Keep in mind that /usr/bin/gcc on this system is one from Sun, not
>>> the FSF:
>>>> -bash-3.00$ which gcc
>>>> /usr/bin/gcc
>>>> -bash-3.00$ gcc --version
>>>> sparc-sun-solaris2.10-gcc (GCC) 4.0.4 (gccfss)
>>>> Copyright (C) 2006 Free Software Foundation, Inc.
>>>> This is free software; see the source for copying conditions. There
>>>> is NO
>>>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
>>>> PURPOSE.
>>> The key bit there is "(gccfss)" = "GCC for SPARC Systems"
>>>
>>> The problem is a load-time missing symbol when I "gmake check":
>>>> $ gmake check V=1
>>>> Making check in src
>>>> [...]
>>>> gmake[2]: Entering directory
>>>> `/home/hargrove/OMPI/hwloc-1.3.1-solaris10-sparcT2-gccfss404/BLD/utils'
>>>> ld.so.1: hwloc-calc: fatal: relocation error: file
>>>> /home/hargrove/OMPI/hwloc-1.3.1-solaris10-sparcT2-gccfss404/BLD/src/.libs/libhwloc.so.4:
>>>> symbol __ffssi2: referenced symbol not found
>>>> FAIL: test-hwloc-calc.sh
>>>> ld.so.1: hwloc-distrib: fatal: relocation error: file
>>>> /home/hargrove/OMPI/hwloc-1.3.1-solaris10-sparcT2-gccfss404/BLD/src/.libs/libhwloc.so.4:
>>>> symbol __ffssi2: referenced symbol not found
>>>> FAIL: test-hwloc-distrib.sh
>>>> ========================================================
>>>> 2 of 2 tests failed
>>>> Please report to http://www.open-mpi.org/community/help/
>>>> ========================================================
>>> Again I am sorry I didn't get a chance to discover this in 1.3.1rc2.
>>>
>>> -Paul
>>>

-- 
Paul H. Hargrove                          PHHargrove_at_[hidden]
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900