On Tue, 2008-07-08 at 18:01 -0700, Tom Riddle wrote:
> Thanks Ashley, after going through your suggestions we tried our test
> with valgrind 3.3.0 and with glibc-devel-2.5-18.el5_1.1, both exhibit
> the same results. A simple non-MPI test prog however returns expected
> responses, so valgrind itself look ok. We then checked that the same
> (shared) libc gets linked in both the MPI and non-MPI cases, adding
> -pthread to the cc command line yields the same result, the only
> difference it appears is the open mpi libraries.
> Now mpicc links against libopen-pal which defines malloc for it's own
> purposes. The big difference seems to be that libopen-pal.so is
> providing its own malloc replacement
This will be the problem, I've tested on a openmpi (1.2.6) machine here
and I see exactly the same behaviour as you. I re-compiled the
application without libopen-pal and valgrind works as expected. To do
this I used mpicc -show to see what compile line it was using and ran
the command myself without the -lopen-pal option. This clearly isn't a
acceptable long-term solution but might help you in the short term.
I'm a MPI expert but work on a different MPI to openmpi normally, I have
however done a lot of work with Valgrind on different platforms so pick
up questions related to it here. I think this problem is going to need
input from one of the openmpi guys...
The problem seems to be the presence of malloc() and free() functions in
the libopen-pal library is preventing valgrind from intercepting these
functions in glibc and hence dramatically reducing the benefit which