On Aug 24, 2007, at 11:05 PM, Josh Aune wrote:
>> Hmm. If you compile Open MPI with no memory manager, then it
>> *shouldn't* be Open MPI's fault (unless there's a leak in the mvapi
>> BTL...?). Verify that you did not actually compile Open MPI with a
>> memory manager by running "ompi_info | grep ptmalloc2" -- it should
>> come up empty.
> I am sure. I have multiple builds that I switch between. One of the
> apps doesn't work unless I configure with --without-memory-manager
> (see post to -users about realloc(), with sample code).
> I noticed that there are a few ./configure --debug type switches, even
> some dealing with memory. Could those be useful for gathering further
> data? What features do those provide and how do I use them?
If you use --enable-mem-debug, it forces all internal calls to
malloc(), free(), and calloc() to go through our own internal
functions, but those mainly just check that we don't pass bad
parameters such as NULL, etc. I suppose you could put in some memory
profiling or something, but that would probably get pretty sticky. :-(
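
As a rough illustration of what such memory profiling might look like,
here is a minimal sketch of counting wrappers around malloc() and
free(). All of the prof_* names are hypothetical and are not part of
Open MPI; the sketch assumes a C11 compiler (for max_align_t) and is
not thread safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical profiling wrappers; none of these names exist in Open MPI. */

/* Header stored in front of every block so prof_free() knows its size. */
typedef union {
    size_t size;        /* bytes requested by the caller */
    max_align_t align;  /* keeps the user pointer suitably aligned */
} prof_header_t;

static size_t prof_live_bytes = 0;   /* allocated and not yet freed */
static size_t prof_live_blocks = 0;  /* number of outstanding blocks */
static size_t prof_peak_bytes = 0;   /* high-water mark of live bytes */

void *prof_malloc(size_t size)
{
    prof_header_t *h;

    if (size > SIZE_MAX - sizeof(*h)) {
        return NULL;  /* header + payload would overflow size_t */
    }
    h = malloc(sizeof(*h) + size);
    if (h == NULL) {
        return NULL;
    }
    h->size = size;
    prof_live_bytes += size;
    prof_live_blocks++;
    if (prof_live_bytes > prof_peak_bytes) {
        prof_peak_bytes = prof_live_bytes;
    }
    return h + 1;  /* user memory starts right after the header */
}

void prof_free(void *ptr)
{
    prof_header_t *h;

    if (ptr == NULL) {
        return;
    }
    h = (prof_header_t *)ptr - 1;
    /* A failure here means a double free or a foreign pointer. */
    assert(prof_live_blocks > 0 && prof_live_bytes >= h->size);
    prof_live_bytes -= h->size;
    prof_live_blocks--;
    free(h);
}

/* Print the current counters; live bytes that only grow suggest a leak. */
void prof_report(FILE *out)
{
    fprintf(out, "live: %zu bytes in %zu blocks (peak: %zu bytes)\n",
            prof_live_bytes, prof_live_blocks, prof_peak_bytes);
}
```

Calling prof_report(stderr) periodically and watching whether the live
byte count keeps climbing gives a crude leak indicator. It only sees
allocations routed through the wrappers, which is part of why this
gets sticky in practice.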
>> The fact that you can run this under TCP without memory leaking would
>> seem to indicate that it's not the app that's leaking memory, but
>> rather either the MPI or the network stack.
> I should clarify here, this is effectively true. The app crashes from
> a segfault after running over tcp for several hours, but it gets much
> farther into the run than the vapi btl does.
Yuck. :-( I assume there's no easy way to track this down -- do you
get a corefile? Can you see where the app died -- are there any
obvious array indexes going out of bounds, etc.? Is it in MPI or in
the application?
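
If no corefile shows up, one possible cause is a core-file size limit
of zero. Below is a minimal sketch, assuming a POSIX system, of raising
that limit from inside the application; enable_core_dumps() is a
hypothetical helper, not an Open MPI function.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper, not part of Open MPI: raise the soft core-file
 * size limit to the hard limit so that a crash can leave a corefile.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }
    rl.rlim_cur = rl.rlim_max;  /* the soft limit may go up to the hard limit */
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }
    return 0;
}
```

The idea would be to call this early in main(), before MPI_Init(). If
the hard limit itself is zero, this does not help, and where the
corefile lands depends on the system configuration.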