On Mon, 02 Nov 2009 13:22:11 +0100
Mondrian Nuessle <nuessle_at_[hidden]> wrote:
> If I turn on mpi_leave_pinned (and thus the registration cache is
> actually used), I see occasional memory corruption issues, for example
> when I call MPI_Allreduce often.
> Debugging with valgrind did not lead to any clues, since OMPI refuses
> to run in that case. If I turn off mpi_leave_pinned, everything seems
> to be fine.
> I tested on version 1.3.3 and 1.3.4rc1.
> Do you have any suggestions how to investigate this situation?
Have you got OMPI_ENABLE_DEBUG defined? The symptoms you are seeing
sound like what might happen if debug is off and you trigger an issue
I posted about here related to thread safety of mpool.
If OMPI_ENABLE_DEBUG is defined, the code aborts when pthread_mutex_lock
returns EDEADLK (see opal_mutex_lock); if it is not defined, the code
proceeds without holding the lock, which could cause memory corruption.
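
To see the EDEADLK case outside of Open MPI, here is a minimal sketch in
plain pthreads. It is not the opal_mutex_lock code; the helper name
relock_result is made up for illustration, and it assumes an
error-checking mutex (PTHREAD_MUTEX_ERRORCHECK), which is the mutex type
that reports a relock by the owning thread instead of hanging.

```c
#define _XOPEN_SOURCE 700   /* expose PTHREAD_MUTEX_ERRORCHECK */
#include <errno.h>
#include <pthread.h>

/*
 * Hypothetical helper (not part of Open MPI): lock an error-checking
 * mutex twice from the same thread and return the result of the second
 * pthread_mutex_lock call. Returns -1 if the setup itself fails.
 */
int relock_result(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    if (pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    rc = pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        return -1;

    /* First lock: the calling thread becomes the owner. */
    if (pthread_mutex_lock(&m) != 0) {
        pthread_mutex_destroy(&m);
        return -1;
    }

    /* Second lock by the owner: an error-checking mutex reports an
     * error here instead of deadlocking. The lock count does not
     * increase, so the thread still holds the mutex exactly once. */
    rc = pthread_mutex_lock(&m);

    pthread_mutex_unlock(&m);   /* release the single hold */
    pthread_mutex_destroy(&m);
    return rc;
}
```

A caller that compares the return value against EDEADLK can stop at that
point. A caller that ignores the return value carries on as if the second
lock had been newly acquired, and any code that later assumes the data is
protected can be wrong.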