Open MPI Development Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-08-27 22:12:12

On Aug 24, 2007, at 11:05 PM, Josh Aune wrote:

>> Hmm. If you compile Open MPI with no memory manager, then it
>> *shouldn't* be Open MPI's fault (unless there's a leak in the mvapi
>> BTL...?). Verify that you did not actually compile Open MPI with a
>> memory manager by running "ompi_info | grep ptmalloc2" -- it should
>> come up empty.
> I am sure. I have multiple builds that I switch between. One of the
> apps doesn't work unless I configure with --without-memory-manager
> (see my post to -users about realloc(), with sample code).


> I noticed that there are a few ./configure --debug type switches, even
> some dealing with memory. Could those be useful for gathering further
> data? What features do those provide and how do I use them?

If you use --enable-mem-debug, it forces all internal calls to
malloc(), free(), and calloc() to go through our own internal
functions, but those mainly just check that we don't pass bad
parameters such as NULL. I suppose you could add some memory
profiling there, but that would probably get pretty sticky. :-(
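For illustration only, here is a minimal C sketch of the kind of wrapper described above: allocation calls routed through checking functions that flag suspicious parameters. The names debug_malloc, debug_calloc, debug_free, debug_outstanding and the DEBUG_* macros are hypothetical and are NOT Open MPI's internal API. The live-allocation counter is an assumption, added to show where simple memory profiling could hook in.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical debug allocator; NOT Open MPI's actual internal API.
   Each call records the caller's file/line for diagnostics. */
static size_t debug_live_allocs = 0;  /* assumed profiling hook: live block count */

void *debug_malloc(size_t size, const char *file, int line)
{
    if (size == 0) {
        /* Zero-byte requests are legal but usually indicate a bug. */
        fprintf(stderr, "%s:%d: warning: malloc(0)\n", file, line);
    }
    void *p = malloc(size);
    if (p != NULL) {
        debug_live_allocs++;
    }
    return p;
}

void *debug_calloc(size_t nmemb, size_t size, const char *file, int line)
{
    if (nmemb == 0 || size == 0) {
        fprintf(stderr, "%s:%d: warning: calloc(%zu, %zu)\n",
                file, line, nmemb, size);
    }
    void *p = calloc(nmemb, size);
    if (p != NULL) {
        debug_live_allocs++;
    }
    return p;
}

void debug_free(void *p, const char *file, int line)
{
    if (p == NULL) {
        /* free(NULL) is a no-op in C, but flag it as a likely logic error. */
        fprintf(stderr, "%s:%d: warning: free(NULL)\n", file, line);
        return;
    }
    /* Freeing with no live blocks left catches some double frees and
       pointers that did not come from these wrappers. */
    assert(debug_live_allocs > 0);
    debug_live_allocs--;
    free(p);
}

/* Number of blocks allocated through the wrappers and not yet freed. */
size_t debug_outstanding(void)
{
    return debug_live_allocs;
}

/* Route calls through the checkers, tagging each with its call site. */
#define DEBUG_MALLOC(size)        debug_malloc((size), __FILE__, __LINE__)
#define DEBUG_CALLOC(nmemb, size) debug_calloc((nmemb), (size), __FILE__, __LINE__)
#define DEBUG_FREE(p)             debug_free((p), __FILE__, __LINE__)
```

A leak would show up as debug_outstanding() climbing steadily if printed periodically during a long run; this only sees allocations made through the wrappers, not those made by the application or the network stack.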

>> The fact that you can run this under TCP without memory leaking would
>> seem to indicate that it's not the app that's leaking memory, but
>> rather either the MPI or the network stack.
> I should clarify here, this is effectively true. The app crashes from
> a segfault after running over tcp for several hours, but it gets much
> farther into the run than the vapi btl does.

Yuck. :-( I assume there's no easy way to track this down -- do you
get a corefile? Can you see where the app died? Are there any
obvious array indexes going out of bounds, etc.? Is it in
MPI or in the application?
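On the corefile question: a common reason no core appears is a core-size limit of 0. A minimal POSIX sketch, assuming the app can call a helper early in main() (enable_core_dumps and core_dumps_allowed are hypothetical names), raises the soft RLIMIT_CORE limit to the hard limit with getrlimit()/setrlimit(). If the hard limit is itself 0, or the system's core-file naming policy sends cores elsewhere, this will not help; that has to be changed by an administrator or in the launcher's environment.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Raise this process's soft core-file size limit to its hard limit so
   that a segfault can leave a core file behind.
   Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }
    /* Sanity check: the soft limit never exceeds the hard limit. An
       unprivileged process may raise its soft limit up to the hard one. */
    assert(rl.rlim_cur <= rl.rlim_max);

    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }
    return 0;
}

/* Nonzero if the current soft limit allows a core file to be written. */
int core_dumps_allowed(void)
{
    struct rlimit rl;
    return getrlimit(RLIMIT_CORE, &rl) == 0 && rl.rlim_cur != 0;
}
```

If a core does get written, loading it with "gdb ./app core" and typing "bt" shows the backtrace, which usually answers whether the crash is inside MPI or in the application.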

Jeff Squyres
Cisco Systems