Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] hwloc on PPC64
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2010-07-12 01:31:43


On 12/07/2010 00:08, Jirka Hladky wrote:
> $./lstopo --xml /tmp/2010-Jul-10_22h14m_results/2.6.32-44.el6.ppc64_OS-indexing.xml a.txt
> Segmentation fault (core dumped)
>

This was a crash in the drawing code (in the merge() function) that Samuel
fixed in trunk r2234 (and backported to 1.0.x). The log doesn't mention
a crash, so it looks like we were lucky... The gdb log says:

Program received signal SIGSEGV, Segmentation fault.
0x00000000004059a9 in merge (disp=0x61e0c0, x=858993504, y=4, or=6, andnot=0, r=239, g=223, b=222) at ../../trunk/utils/lstopo-text.c:490
490 character current = disp->cells[y][x].c;
(gdb) where
#0 0x00000000004059a9 in merge (disp=0x61e0c0, x=858993504, y=4, or=6, andnot=0, r=239, g=223, b=222) at ../../trunk/utils/lstopo-text.c:490
#1 0x0000000000405b2e in text_box (output=0x61e0c0, r=239, g=223, b=222, depth=98, x1=48, width=858993457, y1=4, height=3) at ../../trunk/utils/lstopo-text.c:511
#2 0x000000000040ba24 in node_draw (topology=0x616010, methods=0x615620, logical=1, level=0x61ace0, output=0x61e0c0, depth=99, x=230, retwidth=0x7fffffffe1f4, y=30,
    retheight=0x7fffffffe1f0) at ../../trunk/utils/lstopo-draw.c:493
#3 0x000000000040f0c4 in system_draw (topology=0x616010, methods=0x615620, logical=1, level=0x617000, output=0x61e0c0, depth=100, x=0, retwidth=0x7fffffffe43c, y=0,
    retheight=0x7fffffffe438) at ../../trunk/utils/lstopo-draw.c:594
#4 0x0000000000411117 in fig (topology=0x616010, methods=0x615620, logical=1, level=0x617000, output=0x61e0c0, depth=100, x=0, y=0) at ../../trunk/utils/lstopo-draw.c:661
#5 0x000000000041150d in output_draw (methods=0x615620, logical=1, topology=0x616010, output=0x61e0c0) at ../../trunk/utils/lstopo-draw.c:756
#6 0x0000000000406299 in output_text (topology=0x616010, filename=0x7fffffffea12 "-.txt", logical=1, verbose_mode=1) at ../../trunk/utils/lstopo-text.c:662
#7 0x0000000000403f13 in main (argc=1, argv=0x7fffffffe6c0) at ../../trunk/utils/lstopo.c:393
(gdb) list
485
486 /* output bars, merging with existing bars: `andnot' are removed, `or' are added */
487 static void
488 merge(struct display *disp, int x, int y, int or, int andnot, int r, int g, int b)
489 {
490 character current = disp->cells[y][x].c;
491 int directions = (to_directions(disp, current) & ~andnot) | or;
492 put(disp, x, y, from_directions(disp, directions), -1, -1, -1, r, g, b);
493 }
494
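
For illustration, here is a minimal sketch of the kind of bounds check that
would turn this out-of-range access into a harmless no-op. Note that in the
trace, x=858993504 equals x1 + width - 1 from the text_box frame
(48 + 858993457 - 1), so the bogus value appears to come from the width
handed to text_box. The struct layout and the names display_init,
display_free and cell_at are hypothetical stand-ins, not the lstopo code,
and this is not necessarily what r2234 does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for lstopo's text display types. */
struct cell { int c; };
struct display {
  struct cell **cells; /* indexed as cells[y][x] */
  int width;
  int height;
};

/* Release every row that was allocated, then the row table itself. */
static void display_free(struct display *disp)
{
  int y;
  for (y = 0; y < disp->height; y++)
    free(disp->cells[y]);
  free(disp->cells);
  disp->cells = NULL;
  disp->height = 0;
}

/* Allocate a zero-filled width x height grid; 0 on success, -1 on failure. */
static int display_init(struct display *disp, int width, int height)
{
  int y;
  assert(disp != NULL && width > 0 && height > 0);
  disp->width = width;
  disp->height = 0; /* counts the rows allocated so far */
  disp->cells = calloc((size_t)height, sizeof(*disp->cells));
  if (!disp->cells)
    return -1;
  for (y = 0; y < height; y++) {
    disp->cells[y] = calloc((size_t)width, sizeof(**disp->cells));
    if (!disp->cells[y]) {
      display_free(disp);
      return -1;
    }
    disp->height = y + 1;
  }
  return 0;
}

/* Return the cell at (x, y), or NULL when the coordinates are off the grid. */
static struct cell *cell_at(struct display *disp, int x, int y)
{
  assert(disp != NULL);
  if (x < 0 || y < 0 || x >= disp->width || y >= disp->height)
    return NULL;
  return &disp->cells[y][x];
}
```

A caller shaped like merge() could then return early when cell_at() yields
NULL instead of dereferencing cells[y][x] directly. That only hides the
symptom, though: the garbage width still has to be fixed where it is computed.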

> Please notice that hwloc-distrib is
> also not working correctly - check CPU_AFFINITY/0008.log for example.
>

The problem is that one of the NUMA nodes has an empty cpuset (it could
be a BIOS bug, by the way). hwloc-distrib should probably ignore such
objects and not distribute among them.
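
To make the idea concrete, here is a rough sketch of a distribution loop
that skips objects whose cpuset is empty. It is a toy round-robin over plain
bitmasks, not hwloc-distrib's actual algorithm and not the hwloc API; the
struct node type and the distribute() function are hypothetical names made
up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a topology object: one bit per CPU. */
struct node { unsigned long cpuset; };

/*
 * Assign each of ntasks tasks to a node index, round-robin, skipping nodes
 * whose cpuset is empty. Returns the number of usable nodes; when that is 0,
 * nothing is written to assignment.
 */
static size_t distribute(const struct node *nodes, size_t nnodes,
                         size_t *assignment, size_t ntasks)
{
  size_t usable = 0, i, t = 0;

  assert(nodes != NULL || nnodes == 0);
  assert(assignment != NULL || ntasks == 0);

  /* Count the nodes that actually contain CPUs. */
  for (i = 0; i < nnodes; i++)
    if (nodes[i].cpuset != 0)
      usable++;
  if (usable == 0)
    return 0;

  /* Walk the nodes cyclically, handing one task to each non-empty node. */
  i = 0;
  while (t < ntasks) {
    if (nodes[i].cpuset != 0)
      assignment[t++] = i;
    i = (i + 1) % nnodes;
  }
  return usable;
}
```

With three nodes where the middle one is empty, four tasks would land on
nodes 0, 2, 0, 2, and the empty node never receives anything.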

Brice