Hardware Locality Users' Mailing List Archives

Subject: Re: [hwloc-users] [OMPI users] SIGSEGV in opal_hwlock152_hwlock_bitmap_or.A // Bug in 'hwlock" ?
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2013-11-01 10:37:15


Sorry, I missed the mail on OMPI-users.

This hwloc looks veeeeeeeeeeeery old. We haven't used Misc objects in place
of Groups since we switched from 0.9 to 1.0. You should regenerate the
XML file with a hwloc version that came out after the big bang (or
better, after the asteroid killed the dinosaurs). Please resend that XML
from a recent hwloc so that we can get a better clue of the problem.
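
To make the Misc-versus-Group remark concrete, here is a minimal sketch that counts the object types found in an XML export. It assumes the file uses nested `<object type="...">` elements, as hwloc 1.x exports do; the helper name `count_object_types` and the sample topology are hypothetical and are not taken from the reported file. Misc objects can also appear legitimately in newer exports, so the final check is only a hint, not a version test.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_object_types(xml_text):
    """Count <object> elements by their 'type' attribute."""
    root = ET.fromstring(xml_text)
    return Counter(obj.get("type", "?") for obj in root.iter("object"))


# Hypothetical hand-written sample, NOT the contents of the reported XML file.
SAMPLE = """<topology>
  <object type="Machine">
    <object type="Misc">
      <object type="Socket">
        <object type="Core"><object type="PU"/></object>
      </object>
    </object>
  </object>
</topology>"""

counts = count_object_types(SAMPLE)
for name, n in sorted(counts.items()):
    print(name, n)

# Heuristic only: Misc objects where Groups would be expected.
if counts["Misc"] and not counts["Group"]:
    print("Misc but no Group objects: possibly exported by a very old hwloc")
```

For a real export, the file contents could be read with `open(path).read()` and passed to the same function.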

Assuming there's a bug in OMPI's hwloc, I would suggest downloading
hwloc 1.5.3 and running make check on that machine. And try again with
hwloc 1.7.2 in case that's already fixed.

thanks
Brice

Le 01/11/2013 15:24, Jeff Squyres (jsquyres) a écrit :
> Paul Kapinos originally reported this issue on the OMPI users list.
>
> He is showing a stack trace from OMPI-1.7.3, which uses hwloc 1.5.2
> (note that OMPI 1.7.4 will use hwloc 1.7.2).
>
> I tried to read the xml file he provided with the git hwloc master
> HEAD, and it fails:
>
> -----
> $ ./utils/lstopo -i lstopo_linuxitvc00.xml
> ignoring depth attribute for object type without depth
> ignoring depth attribute for object type without depth
> XML component discovery failed.
> hwloc_topology_load() failed (Invalid argument).
> -----
>
> Any idea what's happening here?
>
> BTW, I can apply the fix to both the OMPI SVN trunk and v1.7 branch
> (since OMPI v1.7 is now up to hwloc 1.7.2).
>
>
>
> On Oct 31, 2013, at 1:28 PM, Paul Kapinos <kapinos_at_[hidden]>
> wrote:
>
> > Hello all,
> >
> > using 1.7.x (1.7.2 and 1.7.3 tested), we get a SIGSEGV from somewhere
> > deep inside the 'hwloc' library - see the attached screenshot.
> >
> > Because the error is tied to just one single node, which in turn is
> > a rather special one (see the output of 'lstopo -'), it smells like
> > an error in the 'hwloc' library.
> >
> > Is there a way to disable hwloc or to debug it somehow?
> > (besides building a debug version of hwloc and OpenMPI)
> >
> > Best
> >
> > Paul
> >
> >
> >
> >
> >
> >
> >
> > --
> > Dipl.-Inform. Paul Kapinos - High Performance Computing,
> > RWTH Aachen University, Center for Computing and Communication
> > Seffenter Weg 23, D 52074 Aachen (Germany)
> > Tel: +49 241/80-24915
> >
> > <lstopo_linuxitvc00.txt><opal_hwlock_SIGSEGV.png><lstopo_linuxitvc00.xml>
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
> _______________________________________________
> hwloc-users mailing list
> hwloc-users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users