
Hardware Locality Development Mailing List Archives


Subject: Re: [hwloc-devel] Bug report: hwloc topology broken when restricted to cpusets
From: Bernd Kallies (kallies_at_[hidden])
Date: 2010-07-13 05:56:36


Thanks for the quick reply.

I expect that the tree-traversing functions of the hwloc API can safely
be used on any topology returned by hwloc_topology_load. If they crash,
the topology is broken, and this should not happen.
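Why a broken tree turns into a crash can be illustrated with a toy model. This is not hwloc code: `struct toy_obj`, `toy_attach` and `toy_common_ancestor` are hypothetical names. The sketch assumes that a common-ancestor lookup such as hwloc_get_common_ancestor_obj walks parent links upward by depth, in the way shown here. In a consistent tree the walk always ends at a shared object. If an object's parent chain stops short of the root, a naive walk dereferences NULL. The sketch returns NULL in that case instead. Whether this is the actual cause of the segfault reported below is not established here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object, NOT hwloc's struct hwloc_obj: just enough
 * fields (depth and parent link) to model an upward tree walk. */
struct toy_obj {
    const char *name;
    unsigned depth;          /* 0 for the root (Machine) */
    struct toy_obj *parent;  /* NULL only for the root */
};

/* Link child below parent and derive its depth from the parent's. */
static void toy_attach(struct toy_obj *child, struct toy_obj *parent)
{
    assert(parent != NULL);
    child->parent = parent;
    child->depth = parent->depth + 1;
}

/* Depth-based common-ancestor walk: lift the deeper object first,
 * then lift both together until they meet. In a consistent tree the
 * walk ends at a shared object. A naive version would dereference
 * NULL when a parent link is missing; this one returns NULL instead. */
static struct toy_obj *toy_common_ancestor(struct toy_obj *a, struct toy_obj *b)
{
    while (a != b) {
        if (a == NULL || b == NULL)
            return NULL;  /* ran off the top: the tree is broken */
        if (a->depth > b->depth)
            a = a->parent;
        else if (b->depth > a->depth)
            b = b->parent;
        else {
            a = a->parent;
            b = b->parent;
        }
    }
    return a;
}
```

For example, with two NUMA nodes attached to a Machine object, two PUs under the first node and one under the second, the walk returns the first NUMA node for the two sibling PUs and the Machine for PUs on different nodes. A PU object that claims depth 2 but has no parent yields NULL.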

One has to walk the topology tree, e.g. when trying to make a good guess
for pinning maps based on some distance approach. For this it is
important for performance to work with a reduced topology that contains
only the levels needed to calculate processor distances. That's why I
tend to use the hwloc_topology_ignore_all_keep_structure approach.
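A distance approach of this kind might look like the following minimal sketch. It assumes that each PU has already been summarized by the logical index of its ancestor at each level kept in the reduced topology, listed outermost level first. `struct pu_path`, `NLEVELS`, `pu_distance` and `closest_pu` are hypothetical names, not hwloc API, and the metric is only one possible choice. Here the distance is the number of kept levels below the deepest shared ancestor: PUs on the same core score 0, and PUs that share nothing below the root score `NLEVELS`.

```c
#include <assert.h>

/* Hypothetical number of levels kept in the reduced topology,
 * e.g. NUMA node and core. */
#define NLEVELS 2

/* Hypothetical flat view of one PU: its ancestor's index at each kept
 * level, outermost level first. */
struct pu_path {
    int idx[NLEVELS];
};

/* Number of kept levels below the deepest ancestor shared by a and b:
 * 0 = same core, NLEVELS = nothing shared below the root. */
static int pu_distance(const struct pu_path *a, const struct pu_path *b)
{
    int shared = 0;
    while (shared < NLEVELS && a->idx[shared] == b->idx[shared])
        shared++;
    return NLEVELS - shared;
}

/* Index of the PU nearest to pus[from], excluding itself; ties go to
 * the lowest index. Returns -1 if there is no other PU. */
static int closest_pu(const struct pu_path *pus, int n, int from)
{
    assert(from >= 0 && from < n);
    int best = -1;
    int best_d = NLEVELS + 1;  /* larger than any real distance */
    for (int i = 0; i < n; i++) {
        if (i == from)
            continue;
        int d = pu_distance(&pus[from], &pus[i]);
        if (d < best_d) {
            best_d = d;
            best = i;
        }
    }
    return best;
}
```

With the five PUs of the cpuset used in this thread written as {NUMA node, core} pairs {0,0}, {0,1}, {0,2}, {0,3} and {1,4}, closest_pu picks PU 1 for PU 0. For PU 4 every other PU is equally far away, so the lowest index, PU 0, is returned.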

Sincerely BK

On Tue, 2010-07-13 at 11:46 +0200, Brice Goglin wrote:
> On 13/07/2010 11:22, Bernd Kallies wrote:
> >> /bin/echo 0-4 > /dev/cpuset/mycpuset/cpus
> >> /bin/echo 0-1 > /dev/cpuset/mycpuset/mems
> >> /bin/echo $$ > /dev/cpuset/mycpuset/tasks
> >> /sw/local/packages/hwloc-1.0.1/bin/lstopo
> >>
> > Machine (142GB)
> >   NUMANode #0 (phys=0 71GB) + Socket #0 + L3 #0 (8192KB)
> >     L2 #0 (256KB) + L1 #0 (32KB) + Core #0 + PU #0 (phys=0)
> >     L2 #1 (256KB) + L1 #1 (32KB) + Core #1 + PU #1 (phys=1)
> >     L2 #2 (256KB) + L1 #2 (32KB) + Core #2 + PU #2 (phys=2)
> >     L2 #3 (256KB) + L1 #3 (32KB) + Core #3 + PU #3 (phys=3)
> >   NUMANode #1 (phys=1 71GB) + Socket #1 + L3 #1 (8192KB) + L2 #4 (256KB) + L1 #4 (32KB) + Core #4 + PU #4 (phys=4)
> >
> >> /sw/local/packages/hwloc-1.0.1/bin/lstopo --merge
> >>
> > Machine
> > L3 #0 (8192KB)
> > PU #0 (phys=0)
> > PU #1 (phys=1)
> > PU #2 (phys=2)
> > PU #3 (phys=3)
> > PU #4 (phys=4)
> >
>
> This looks good to me. When --merge is given, we only keep the most
> important objects to simplify the output. PU is considered the most
> important object type, since that's where you bind processes in the end.
> That's why
>
> NUMANode #1 (phys=1 71GB) + Socket #1 + L3 #1 (8192KB) + L2 #4 (256KB) + L1 #4 (32KB) + Core #4 + PU #4 (phys=4)
>
> is replaced by
>
> PU #4 (phys=4)
>
>
> What would you like instead?
>
> If you don't want to lose any information, just don't use --merge.
>
>
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <hwloc.h>
> >
> > int main(void) {
> >     int npu, i, j;
> >     hwloc_topology_t topology;
> >     hwloc_obj_t *pu, parent;
> >
> >     /* Allocate and initialize topology object. */
> >     hwloc_topology_init(&topology);
> >     /* Ignore all objects that do not bring any structure. */
> >     hwloc_topology_ignore_all_keep_structure(topology);
> >     /* Perform the topology detection. */
> >     hwloc_topology_load(topology);
> >     /* Collect all HWLOC_OBJ_PU: the first one, then its closest objects. */
> >     npu = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_PU);
> >     pu = (hwloc_obj_t *)malloc(npu * sizeof(hwloc_obj_t));
> >     pu[0] = hwloc_get_next_obj_by_type(topology, HWLOC_OBJ_PU, NULL);
> >     hwloc_get_closest_objs(topology, pu[0], &pu[1], npu - 1);
> >     /* Determine the common parent of each pair of PUs. */
> >     for (i = 0; i < npu - 1; i++) {
> >         for (j = i + 1; j < npu; j++) {
> >             parent = hwloc_get_common_ancestor_obj(topology, pu[i], pu[j]);
> >             printf("%2d %2d common parent type %d\n", i, j, parent->type);
> >         }
> >     }
> >     free(pu);
> >     hwloc_topology_destroy(topology);
> >     return 0;
> > }
> >
> >> gcc -I/sw/local/packages/hwloc-1.0.1/include \
> >      -L/sw/local/packages/hwloc-1.0.1/lib \
> >      -Wl,-rpath,/sw/local/packages/hwloc-1.0.1/lib -lhwloc test.c
> >
> >> ./a.out
> >>
> > 0 1 common parent type 4
> > 0 2 common parent type 4
> > 0 3 common parent type 4
> > Segmentation fault
> >
>
> I'll debug this, thanks.
>
> Brice
>

-- 
Dr. Bernd Kallies
Konrad-Zuse-Zentrum für Informationstechnik Berlin
Takustr. 7
14195 Berlin
Tel: +49-30-84185-270
Fax: +49-30-84185-311
e-mail: kallies_at_[hidden]