On Apr 22, 2009, at 7:35 PM, François PELLEGRINI wrote:
> I have had no answers regarding the trouble (OpenMPI bug ?)
> I evidenced when combining OpenMPI and valgrind.
>
Sorry for the delay in getting back to you; there are so many mails
and only so many hours in the day... :-(
> I tried it with a newer version of OpenMPI, and the problems
> persist, with new, even more worrying, error messages being
> displayed :
>
> ==32142== Warning: client syscall munmap tried to modify addresses 0xFFFFFFFF-0xFFE
>
> (but this happens for all the programs I tried)
>
> The original error messages, which are still here, were the
> following :
>
> ==32143== Source and destination overlap in memcpy(0x4A73DA8, 0x4A73DB0, 16)
> ==32143== at 0x40236C9: memcpy (mc_replace_strmem.c:402)
> ==32143== by 0x407C9DC: ompi_ddt_copy_content_same_ddt (dt_copy.c:171)
> ==32143== by 0x512EA61: ompi_coll_tuned_allgather_intra_bruck (coll_tuned_allgather.c:193)
> ==32143== by 0x5126D90: ompi_coll_tuned_allgather_intra_dec_fixed (coll_tuned_decision_fixed.c:562)
> ==32143== by 0x408986A: PMPI_Allgather (pallgather.c:101)
> ==32143== by 0x80487D7: main (in /tmp/brol)
>
> I do not get these "memcpy" messages when running on 2 processors.
> I therefore assume it is a rounding problem wrt the number of procs.
>
Good. This is possibly related to a post from last night:
http://www.open-mpi.org/community/lists/users/2009/04/9138.php.
Some of the valgrind warnings are unavoidable, unfortunately -- e.g.,
those within system calls. Note that you *can* avoid the valgrind
warnings in PLPA (the Linux paffinity component) if you configure
OMPI --with-valgrind. This will programmatically tell valgrind that the
memory access that PLPA is doing "is ok" (i.e., it's specifically
intended to be an error for long/complicated reasons).
But I'm able to replicate your error (but shouldn't the 2nd buffer be
the 1st + size (not 2)?) -- let me dig into it a bit... we definitely
shouldn't be getting invalid writes in the convertor, etc.
I've filed ticket #1903 about this issue:
https://svn.open-mpi.org/trac/ompi/ticket/1903
--
Jeff Squyres
Cisco Systems