Aurelien and Brian,
Thanks for the suggestions. I reran the tests with --without-memory-manager
and got the following on 2 of 5000 runs:
*** glibc detected *** corrupted double-linked list: 0xf704dff8 ***
on one and
*** glibc detected *** malloc(): memory corruption: 0xeda00c70 ***
on the other.
So it looks like we are overrunning our allocated space somewhere. I am now
attempting to redo the run under valgrind.
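For illustration only (this is not code from the Open MPI tree), here is a minimal C sketch of the kind of bug I suspect: a write that runs one byte past the end of an allocation. The helper names guarded_alloc, guard_intact and demo_overrun, and the guard size and fill byte, are all hypothetical. The sketch puts a guard zone after the usable bytes so the stray write lands in memory we own and can be detected deterministically, rather than silently damaging the allocator's bookkeeping the way a real overrun would.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 16      /* hypothetical guard length, in bytes */
#define GUARD_BYTE 0xA5    /* hypothetical fill pattern for the guard */

/* Hypothetical helper: allocate n usable bytes followed by a guard zone. */
static unsigned char *guarded_alloc(size_t n)
{
    unsigned char *p = malloc(n + GUARD_SIZE);
    if (p != NULL)
        memset(p + n, GUARD_BYTE, GUARD_SIZE);
    return p;
}

/* Returns 1 if the guard zone after the n usable bytes is intact, else 0. */
static int guard_intact(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < GUARD_SIZE; i++)
        if (p[n + i] != GUARD_BYTE)
            return 0;
    return 1;
}

/* Simulates an off-by-one overrun; returns 1 if the overrun was detected. */
static int demo_overrun(void)
{
    size_t n = 32;
    unsigned char *buf = guarded_alloc(n);
    assert(buf != NULL);

    memset(buf, 0, n);        /* in bounds: the guard stays untouched */
    int ok_before = guard_intact(buf, n);

    memset(buf, 0, n + 1);    /* one byte too many: lands in the guard */
    int ok_after = guard_intact(buf, n);

    free(buf);
    return ok_before && !ok_after;
}
```

Calling demo_overrun() from a main() should report the overrun as detected. Without a guard zone, that extra byte would land in whatever follows the block on the heap, which is consistent with glibc only noticing the damage later, inside malloc or free.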
On Thursday 20 September 2007 09:59:14 pm Brian Barrett wrote:
> On Sep 20, 2007, at 7:02 AM, Tim Prins wrote:
> > In our nightly runs with the trunk I have started seeing cases where
> > we appear to be segfaulting within/below malloc. Below is a typical
> > output.
> > Note that this appears to only happen on the trunk, when we use
> > openib, and are in 32 bit mode. It seems to happen randomly at a very
> > low frequency (59 out of about 60,000 32 bit openib runs).
> > This could be a problem with our machine, and has showed up since I
> > started testing 32bit ofed 10 days ago.
> > Anyways, just curious if anyone had any ideas.
> As someone else said, this usually points to a duplicate free or the
> like in malloc. You might want to try compiling with
> --without-memory-manager, as the ptmalloc2 in glibc frequently is more
> verbose about where errors occurred than is the one in Open MPI.