If you are not using iWARP or InfiniBand networking, try configuring
Open MPI --without-memory-manager and see if that solves your
problem. Issues like this can come up, especially in C++ codes, when
the application (or its supporting libraries) has its own memory
manager that conflicts with Open MPI's. We *only* use an internal
memory manager to optimize benchmark performance on iWARP and IB
networks, so if you're not using iWARP or IB, and/or your application
doesn't repeatedly re-use the same buffers for MPI_SEND/MPI_RECV, then
you don't need our memory manager.
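For reference, rebuilding from source with that option looks something
like the following (the prefix and -j value are just examples; adjust
for your environment):

```shell
# Rebuild Open MPI with the internal memory manager disabled.
# The install prefix below is an assumption; use whatever you normally use.
./configure --prefix=/usr/local/openmpi --without-memory-manager
make -j4
make install
```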
To be 100% clear: OMPI's internal memory manager is only used for the
"mpi_leave_pinned" behavior. OMPI runs fine without it, but will
definitely see degraded performance in apps that continually re-use
the same buffers for MPI_SEND/MPI_RECV (i.e., benchmarks).
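For completeness, the leave-pinned behavior itself is toggled by an MCA
parameter at run time, e.g. to enable it for a benchmark run (the
application name here is a placeholder):

```shell
# Enable leave-pinned for a run that re-uses the same send/recv buffers
# (e.g., a bandwidth benchmark); ./my_benchmark is a hypothetical app.
mpirun --mca mpi_leave_pinned 1 -np 2 ./my_benchmark
```

Note this only controls the pinning behavior; it does not remove the
memory manager itself, which is compiled in (or not) at configure time.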
FYI: for these kinds of reasons, we're changing how we do
mpi_leave_pinned in the upcoming v1.3 series, so these issues
hopefully won't come up anymore.
On Jul 24, 2008, at 2:39 PM, Adam C Powell IV wrote:
> I'm seeing a segfault in a code on Ubuntu 8.04 with gcc 4.2. I
> recompiled the Debian lenny openmpi 1.2.7~rc2 package on Ubuntu, and
> compiled the Debian lenny petsc and libmesh packages against that.
> Everything works just fine in Debian lenny (gcc 4.3), but in Ubuntu
> hardy it fails during MPI_Init:
> [Thread debugging using libthread_db enabled]
> [New Thread 0x7faceea6f6f0 (LWP 5376)]
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7faceea6f6f0 (LWP 5376)]
> 0x00007faceb265b8b in _int_malloc () from /usr/lib/libopen-pal.so.0
> (gdb) backtrace
> #0 0x00007faceb265b8b in _int_malloc () from /usr/lib/libopen-
> #1 0x00007faceb266e58 in malloc () from /usr/lib/libopen-pal.so.0
> #2 0x00007faceb248bfb in opal_class_initialize ()
> from /usr/lib/libopen-pal.so.0
> #3 0x00007faceb25ce2b in opal_malloc_init () from /usr/lib/libopen-
> #4 0x00007faceb249d97 in opal_init_util () from /usr/lib/libopen-
> #5 0x00007faceb249e76 in opal_init () from /usr/lib/libopen-pal.so.0
> #6 0x00007faced05a723 in ompi_mpi_init () from /usr/lib/libmpi.so.0
> #7 0x00007faced07c106 in PMPI_Init () from /usr/lib/libmpi.so.0
> #8 0x00007facee144d92 in libMesh::init () from /usr/lib/libmesh.so.
> #9 0x0000000000411f61 in main ()
> libMesh::init() just has an assertion and command line check before
> MPI_Init, so I think it's safe to conclude this is an OpenMPI problem.
> How can I help to test and fix this?
> This might be related to Vincent Rotival's problem in
> http://www.open-mpi.org/community/lists/users/2008/04/5427.php or
> http://www.open-mpi.org/community/lists/users/2008/05/5668.php . On the
> latter, I'm building the Debian package, which should have the
> LDFLAGS="" fix. Hmm, nope, no LDFLAGS anywhere in the .diff.gz...
> The Open MPI top-level Makefile has
> "LDFLAGS = -export-dynamic -Wl,-Bsymbolic-functions"