Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] hwloc-1.4 assertion failures on Linux/POWER7
From: Paul H. Hargrove (PHHargrove_at_[hidden])
Date: 2012-01-31 22:29:14


This node is an IBM "Power 750 Express server", described in detail at
http://www.redbooks.ibm.com/redpapers/pdfs/redp4638.pdf

Notably it is a quad-socket chassis which can take 6-core or 8-core
processors.
However, lstopo is reporting 8 sockets of 4 cores each.
This discrepancy led me to recall the following from an email sent to
me by a colleague:
> A surprise
> to me is that the login nodes provide the appearance of having 32
> cpu's, but those are in fact only 8 cores with 4 hyper-threads,
> and they are in fact VM's on top of one socket of a compute node.

So, I am not really certain what I should expect lstopo to report.
I suppose it is accurately reporting to me the virtual node's configuration.
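
As a quick sanity check on the arithmetic, here is a minimal Python sketch that counts the "processor" entries in /proc/cpuinfo-style text and compares the two layouts mentioned above. The helper names (count_processors, logical_cpus) are hypothetical, the sample text is abbreviated to two entries, and treating lstopo's "8 sockets of 4 cores" as one thread per core is an assumption, not something the lstopo output confirms.

```python
# Sketch: count logical CPUs in /proc/cpuinfo-style text and compare
# the socket/core/thread layouts that would add up to the same total.

def count_processors(cpuinfo_text):
    """Count lines whose key (text before the first ':') is 'processor'."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key, sep, _value = line.partition(":")
        if sep and key.strip() == "processor":
            count += 1
    return count

def logical_cpus(sockets, cores_per_socket, threads_per_core):
    """Number of logical CPUs implied by a given layout."""
    return sockets * cores_per_socket * threads_per_core

# Abbreviated sample in the format shown in the report (2 entries, not 32).
sample = """processor : 0
cpu : POWER7 (architected), altivec supported
clock : 3550.000000MHz

processor : 1
cpu : POWER7 (architected), altivec supported
clock : 3550.000000MHz

timebase : 512000000
platform : pSeries
"""

# Layout as lstopo reports it (threads_per_core=1 is an assumption).
lstopo_view = logical_cpus(sockets=8, cores_per_socket=4, threads_per_core=1)
# Layout as described by the colleague: one socket, 8 cores, 4 threads each.
colleague_view = logical_cpus(sockets=1, cores_per_socket=8, threads_per_core=4)
```

Both layouts yield the same number of logical CPUs, so the processor count alone cannot distinguish them. On a live system, count_processors(open("/proc/cpuinfo").read()) would give the total to compare against.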

I bring this up because it may very well be related to the assertion
failures.
My guess here being that some part of hwloc has seen past the "virtual"
to see the "physical" and the assertion failure reflects the resulting
inconsistency. But that is just a guess. Let me know how I might help
debug this failure.
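
One way to look for such an inconsistency by hand is to compare CPU sets obtained from two different sources. The Python sketch below parses the kernel's cpulist notation (for example, the contents of /sys/devices/system/node/node0/cpulist) into sets and reports the difference. It is only an illustration of the kind of equality check the failing assertion performs, not hwloc's or libnuma's actual code, and the function names are hypothetical.

```python
# Sketch: parse kernel "cpulist" strings such as "0-3,8-11" into sets of
# CPU numbers and report which CPUs appear in only one of two sets.

def parse_cpulist(text):
    """Convert a cpulist string like '0-3,8' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or stray commas
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def diff_cpusets(a, b):
    """Return (CPUs only in a, CPUs only in b), each as a sorted list."""
    return sorted(a - b), sorted(b - a)
```

For example, diff_cpusets(parse_cpulist("0-31"), parse_cpulist("0-7")) lists CPUs 8 through 31 as present only in the first set, which is the sort of mismatch a "virtual" versus "physical" view could produce.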

-Paul

On 1/31/2012 7:12 PM, Paul H. Hargrove wrote:
> The problem I reported below also exists in hwloc-1.4.1.
> Additionally, I can reproduce the SEGVs with xlc which Chris Samuel
> reported in
> http://www.open-mpi.org/community/lists/hwloc-devel/2012/01/2738.php
>
> -Paul
>
> On 1/31/2012 5:56 PM, Paul H. Hargrove wrote:
>> When running "make check" in hwloc-1.3.1 on a Linux/POWER7 system I see:
>>> lt-linux-libnuma:
>>> /users/phh1/OMPI/hwloc-1.3.1-linux-ppc64-gcc//hwloc-1.3.1/tests/linux-libnuma.c:53:
>>> main: Assertion `hwloc_bitmap_isequal(set, set2)' failed.
>>> /bin/sh: line 5: 21415 Aborted ${dir}$tst
>>> FAIL: linux-libnuma
>>
>> I've reproduced that failure with 4 different compilers (3 gcc's and
>> an xlc).
>> The xlc-built hwloc-1.3.1 also fails an additional test:
>>> lt-glibc-sched:
>>> /users/phh1/OMPI/hwloc-1.3.1-linux-ppc64-xlc-11.1//hwloc-1.3.1/tests/glibc-sched.c:43:
>>> main: Assertion `!err' failed.
>>> /bin/sh: line 5: 7077 Aborted ${dir}$tst
>>> FAIL: glibc-sched
>>
>>
>> The contents of /proc/cpuinfo are:
>>> processor : 0
>>> cpu : POWER7 (architected), altivec supported
>>> clock : 3550.000000MHz
>>> revision : 2.0 (pvr 003f 0200)
>>>
>>> [30 more of the same]
>>>
>>> processor : 31
>>> cpu : POWER7 (architected), altivec supported
>>> clock : 3550.000000MHz
>>> revision : 2.0 (pvr 003f 0200)
>>>
>>> timebase : 512000000
>>> platform : pSeries
>>> model : IBM,8233-E8B
>>> machine : CHRP IBM,8233-E8B
>>
>> Let me know of any other h/w or s/w info I can report.
>>
>> -Paul
>>
>

-- 
Paul H. Hargrove                          PHHargrove_at_[hidden]
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900