Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] hwloc-1.4 assertion failures on Linux/POWER7
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2012-02-01 09:13:15


Can you run hwloc-gather-topology and send the resulting tarball and
output?
We've seen some powerpc machines where the old kernel didn't say much
about the topology, so your 8 cores with 4 threads each appeared as 32
objects without much detail about their organization. I assume you can't
upgrade the kernel. Which kernel is this?
Yes, the virtual node setup could also make the reported topology less
accurate. What kind of "virtualization" is this?
Thanks
Brice

On 01/02/2012 04:29, Paul H. Hargrove wrote:
> This node is an IBM "Power 750 Express server", described in detail at
> http://www.redbooks.ibm.com/redpapers/pdfs/redp4638.pdf
>
> Notably it is a quad-socket chassis which can take 6-core or 8-core
> processors.
> However, lstopo is reporting 8 sockets of 4 cores each.
> This discrepancy led me to recall the following from an email sent to
> me by a colleague:
>> A surprise
>> to me is that the login nodes provide the appearance of having 32
>> cpu's, but those are in fact only 8 cores with 4 hyper-threads,
>> and they are in fact VM's on top of one socket of a compute node.
>
> So, I am not really certain what I should expect lstopo to report.
> I suppose it is accurately reporting to me the virtual node's
> configuration.
>
> I bring this up because it may very well be related to the assertion
> failures.
> My guess here being that some part of hwloc has seen past the
> "virtual" to see the "physical" and the assertion failure reflects the
> resulting inconsistency. But that is just a guess. Let me know how I
> might help debug this failure.
>
> -Paul
>
> On 1/31/2012 7:12 PM, Paul H. Hargrove wrote:
>> The problem I reported below also exists in hwloc-1.4.1.
>> Additionally, I can reproduce the SEGVs with xlc which Chris Samuel
>> reported in
>> http://www.open-mpi.org/community/lists/hwloc-devel/2012/01/2738.php
>>
>> -Paul
>>
>> On 1/31/2012 5:56 PM, Paul H. Hargrove wrote:
>>> When running "make check" in hwloc-1.3.1 on a Linux/POWER7 system I
>>> see:
>>>> lt-linux-libnuma:
>>>> /users/phh1/OMPI/hwloc-1.3.1-linux-ppc64-gcc//hwloc-1.3.1/tests/linux-libnuma.c:53:
>>>> main: Assertion `hwloc_bitmap_isequal(set, set2)' failed.
>>>> /bin/sh: line 5: 21415 Aborted ${dir}$tst
>>>> FAIL: linux-libnuma
>>>
>>> I've reproduced that failure with 4 different compilers (3 gcc's and
>>> an xlc).
>>> The xlc-built hwloc-1.3.1 also fails an additional test:
>>>> lt-glibc-sched:
>>>> /users/phh1/OMPI/hwloc-1.3.1-linux-ppc64-xlc-11.1//hwloc-1.3.1/tests/glibc-sched.c:43:
>>>> main: Assertion `!err' failed.
>>>> /bin/sh: line 5: 7077 Aborted ${dir}$tst
>>>> FAIL: glibc-sched
>>>
>>>
>>> The contents of /proc/cpuinfo are:
>>>> processor : 0
>>>> cpu : POWER7 (architected), altivec supported
>>>> clock : 3550.000000MHz
>>>> revision : 2.0 (pvr 003f 0200)
>>>>
>>>> [30 more of the same]
>>>>
>>>> processor : 31
>>>> cpu : POWER7 (architected), altivec supported
>>>> clock : 3550.000000MHz
>>>> revision : 2.0 (pvr 003f 0200)
>>>>
>>>> timebase : 512000000
>>>> platform : pSeries
>>>> model : IBM,8233-E8B
>>>> machine : CHRP IBM,8233-E8B
>>>
>>> Let me know of any other h/w or s/w info I can report.
>>>
>>> -Paul
>>>
>>
>
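
To illustrate why 32 entries in /proc/cpuinfo alone cannot settle the
layout, here is a minimal Python sketch. It is not part of hwloc; the
helper names count_processors and plausible_layouts are hypothetical, and
the sample text is synthesized from the excerpt quoted above. It counts
the "processor" entries and lists the socket/core splits that fit a given
number of threads per core.

```python
import re


def count_processors(cpuinfo_text):
    """Count 'processor : N' entries in /proc/cpuinfo-style text."""
    pattern = r"^processor\s*:\s*\d+\s*$"
    return len(re.findall(pattern, cpuinfo_text, flags=re.MULTILINE))


def plausible_layouts(n_logical, threads_per_core):
    """Return (sockets, cores_per_socket) pairs that fit the logical CPU count."""
    if n_logical % threads_per_core:
        return []
    cores = n_logical // threads_per_core
    return [(s, cores // s) for s in range(1, cores + 1) if cores % s == 0]


# Synthetic stand-in for the 32 entries reported in the thread (assumption:
# every entry has the same shape as the two that were quoted).
sample = "\n".join(
    f"processor : {i}\ncpu : POWER7 (architected), altivec supported\n"
    for i in range(32)
)

n = count_processors(sample)
print("logical processors:", n)
print("with 4 threads per core:", plausible_layouts(n, 4))
print("with 1 thread per core:", plausible_layouts(n, 1))
```

With 4 threads per core, one socket of 8 cores is among the candidates;
with 1 thread per core, 8 sockets of 4 cores is as well. Both fit the same
processor count, so the kernel has to expose more topology detail for a
tool to tell them apart.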