
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] MPI_Init segfault on Ubuntu 8.04 version 1.2.7~rc2
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-07-28 15:18:53


If you are not using iWARP or InfiniBand networking, try configuring
Open MPI with --without-memory-manager and see if that solves your
problem. Issues like this can come up, especially in C++ codes, when
the application (or a supporting library) has its own memory manager
that conflicts with Open MPI's. We *only* use an internal memory
manager for optimizing benchmark performance on iWARP and IB
networks, so if you're not using iWARP or IB, and/or your application
doesn't repeatedly re-use the same buffers for MPI_SEND/MPI_RECV, then
you don't need our memory manager.
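
In case it helps, here's a rough sketch of what that looks like when
building from a source tarball (the tarball name, version, and prefix
below are just placeholders for whatever you're actually using; if
you're rebuilding a distro package instead, the same flag goes into
that package's configure rules):

  # Rebuild with Open MPI's internal memory manager disabled
  # (adjust the tarball name/version and --prefix for your setup)
  tar xzf openmpi-1.2.7rc2.tar.gz
  cd openmpi-1.2.7rc2
  ./configure --prefix=/opt/openmpi --without-memory-manager
  make all
  make install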

To be 100% clear: OMPI's internal memory manager is only used for the
"mpi_leave_pinned" behavior. OMPI runs fine without it, but will
definitely see degraded performance in apps that continually re-use
the same buffers for MPI_SEND/MPI_RECV (i.e., benchmarks).
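
(For completeness: if you *are* on iWARP/IB and want the leave-pinned
optimization, mpi_leave_pinned is an MCA parameter that can be set at
run time, e.g. on the mpirun command line; the -np count and
executable name below are just placeholders.)

  # Only relevant on iWARP/IB networks: enable the leave-pinned
  # optimization for a benchmark-style run
  mpirun --mca mpi_leave_pinned 1 -np 4 ./my_benchmark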

FYI: for exactly these kinds of reasons, we're changing how we do
mpi_leave_pinned in the upcoming v1.3 series, so hopefully you won't
run into these issues there.

On Jul 24, 2008, at 2:39 PM, Adam C Powell IV wrote:

> Greetings,
>
> I'm seeing a segfault in a code on Ubuntu 8.04 with gcc 4.2. I
> recompiled the Debian lenny openmpi 1.2.7~rc2 package on Ubuntu, and
> compiled the Debian lenny petsc and libmesh packages against that.
>
> Everything works just fine in Debian lenny (gcc 4.3), but in Ubuntu
> hardy it fails during MPI_Init:
>
> [Thread debugging using libthread_db enabled]
> [New Thread 0x7faceea6f6f0 (LWP 5376)]
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7faceea6f6f0 (LWP 5376)]
> 0x00007faceb265b8b in _int_malloc () from /usr/lib/libopen-pal.so.0
> (gdb) backtrace
> #0 0x00007faceb265b8b in _int_malloc () from /usr/lib/libopen-pal.so.0
> #1 0x00007faceb266e58 in malloc () from /usr/lib/libopen-pal.so.0
> #2 0x00007faceb248bfb in opal_class_initialize () from /usr/lib/libopen-pal.so.0
> #3 0x00007faceb25ce2b in opal_malloc_init () from /usr/lib/libopen-pal.so.0
> #4 0x00007faceb249d97 in opal_init_util () from /usr/lib/libopen-pal.so.0
> #5 0x00007faceb249e76 in opal_init () from /usr/lib/libopen-pal.so.0
> #6 0x00007faced05a723 in ompi_mpi_init () from /usr/lib/libmpi.so.0
> #7 0x00007faced07c106 in PMPI_Init () from /usr/lib/libmpi.so.0
> #8 0x00007facee144d92 in libMesh::init () from /usr/lib/libmesh.so.0.6.2
> #9 0x0000000000411f61 in main ()
>
> libMesh::init() just has an assertion and command line check before
> MPI_Init, so I think it's safe to conclude this is an OpenMPI problem.
>
> How can I help to test and fix this?
>
> This might be related to Vincent Rotival's problem in
> http://www.open-mpi.org/community/lists/users/2008/04/5427.php
> or maybe
> http://www.open-mpi.org/community/lists/users/2008/05/5668.php .
> On the latter, I'm building the Debian package, which should have
> the LDFLAGS="" fix. Hmm, nope, no LDFLAGS anywhere in the
> .diff.gz... The OpenMPI top-level Makefile has
> "LDFLAGS = -export-dynamic -Wl,-Bsymbolic-functions"
>
> -Adam
> --
> GPG fingerprint: D54D 1AEE B11C CE9B A02B C5DD 526F 01E8 564E E4B6
>
> Engineering consulting with open source tools
> http://www.opennovation.com/
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Cisco Systems