Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Bogus memcpy or bogus valgrind record
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-04-29 15:29:49

On Apr 22, 2009, at 7:35 PM, François PELLEGRINI wrote:

> I have had no answers regarding the trouble (OpenMPI bug ?)
> I evidenced when combining OpenMPI and valgrind.

Sorry for the delay in getting back to you; there are so many mails
and only so many hours in the day... :-(

> I tried it with a newer version of OpenMPI, and the problems
> persist, with new, even more worrying, error messages being
> displayed :
> ==32142== Warning: client syscall munmap tried to modify addresses
> (but this happens for all the programs I tried)
> The original error messages, which are still here, were the
> following :
> ==32143== Source and destination overlap in memcpy(0x4A73DA8, 0x4A73DB0, 16)
> ==32143== at 0x40236C9: memcpy (mc_replace_strmem.c:402)
> ==32143== by 0x407C9DC: ompi_ddt_copy_content_same_ddt (dt_copy.c:171)
> ==32143== by 0x512EA61: ompi_coll_tuned_allgather_intra_bruck (coll_tuned_allgather.c:193)
> ==32143== by 0x5126D90: ompi_coll_tuned_allgather_intra_dec_fixed (coll_tuned_decision_fixed.c:562)
> ==32143== by 0x408986A: PMPI_Allgather (pallgather.c:101)
> ==32143== by 0x80487D7: main (in /tmp/brol)
> I do not get these "memcpy" messages when running on 2 processors.
> I therefore assume it is a rounding problem wrt the number of procs.

Good. This is possibly related to a post from last night:

Some of the valgrind warnings are unavoidable, unfortunately -- e.g.,
those within system calls. Note that you *can* avoid the valgrind
warnings in PLPA (the Linux paffinity component) if you configure
OMPI --with-valgrind. This will programmatically tell valgrind that the
memory access that PLPA is doing "is ok" (i.e., it's specifically
intended to be an error for long/complicated reasons).

But I'm able to replicate your error (but shouldn't the 2nd buffer be
the 1st + size (not 2)?) -- let me dig into it a bit... we definitely
shouldn't be getting invalid writes in the convertor, etc.

I've filed ticket #1903 about this issue:

Jeff Squyres
Cisco Systems