Hello all,
I have had no answers regarding the problem (an OpenMPI bug?)
that I reported when combining OpenMPI and valgrind.
I tried again with a newer version of OpenMPI, and the problems
persist, with a new, even more worrying error message being displayed:
==32142== Warning: client syscall munmap tried to modify addresses 0xFFFFFFFF-0xFFE
(but this happens for all the programs I tried)
The original error messages, which are still here, were the
following :
==32143== Source and destination overlap in memcpy(0x4A73DA8, 0x4A73DB0, 16)
==32143==    at 0x40236C9: memcpy (mc_replace_strmem.c:402)
==32143==    by 0x407C9DC: ompi_ddt_copy_content_same_ddt (dt_copy.c:171)
==32143==    by 0x512EA61: ompi_coll_tuned_allgather_intra_bruck (coll_tuned_allgather.c:193)
==32143==    by 0x5126D90: ompi_coll_tuned_allgather_intra_dec_fixed (coll_tuned_decision_fixed.c:562)
==32143==    by 0x408986A: PMPI_Allgather (pallgather.c:101)
==32143==    by 0x80487D7: main (in /tmp/brol)
I do not get these "memcpy" messages when running on 2 processes.
I therefore assume it is a rounding problem with respect to the
number of processes.
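To make the "memcpy" report concrete, here is a minimal sketch in C of the
overlap arithmetic. It assumes the three values valgrind prints are the
destination address, the source address and the length in bytes, in that
order. The helpers ranges_overlap and overlap_bytes are hypothetical names
introduced for illustration; they are not part of OpenMPI or valgrind.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the byte ranges [a, a+len) and
 * [b, b+len) share at least one byte, 0 otherwise. */
static int ranges_overlap(uintptr_t a, uintptr_t b, size_t len)
{
    uintptr_t gap = (a < b) ? (b - a) : (a - b);

    return (len != 0) && (gap < len);
}

/* Hypothetical helper: number of bytes shared by the two ranges
 * (0 if they are disjoint). */
static size_t overlap_bytes(uintptr_t a, uintptr_t b, size_t len)
{
    uintptr_t gap = (a < b) ? (b - a) : (a - b);

    return (gap < len) ? (size_t) (len - gap) : 0;
}
```

With the values from the message, ranges_overlap (0x4A73DA8, 0x4A73DB0, 16)
is true: the two addresses are 8 bytes apart while 16 bytes are copied, so
8 bytes are shared. On this 32-bit build, 8 bytes would correspond to one
block of 2 ints, assuming a 4-byte int.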
1) The program
==============
The program "brol.c" I am running is very simple :
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int
main (
int                 argc,
char *              argv[])
{
  int                 procglbnbr;
  int                 proclocnum;
  int *               dataloctab;
  int *               dataglbtab;

  if (MPI_Init (&argc, &argv) != MPI_SUCCESS)
    exit (1);

  MPI_Comm_size (MPI_COMM_WORLD, &procglbnbr);
  MPI_Comm_rank (MPI_COMM_WORLD, &proclocnum);

  dataloctab = malloc (2 * (procglbnbr + 1) * sizeof (int));
  dataglbtab = dataloctab + 2;

  dataloctab[0] =
  dataloctab[1] = proclocnum;

  if (MPI_Allgather (dataloctab, 2, MPI_INT,
                     dataglbtab, 2, MPI_INT, MPI_COMM_WORLD) != MPI_SUCCESS)
    exit (1);

  MPI_Finalize ();

  return (0);
}
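
The buffer layout that brol.c hands to MPI_Allgather can be checked without
MPI. The following is a minimal sketch that uses only the index arithmetic
from the listing above. The function send_recv_disjoint is a hypothetical
helper, and its nprocs parameter stands in for procglbnbr.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the layout of brol.c, counted in ints:
 *   allocation : indices [0, 2 * (nprocs + 1))
 *   send block : indices [0, 2)               (dataloctab)
 *   recv area  : indices [2, 2 + 2 * nprocs)  (dataglbtab)
 * Returns 1 if the receive area fits inside the allocation and does
 * not overlap the send block, 0 otherwise. */
static int send_recv_disjoint(size_t nprocs)
{
    size_t alloc_len  = 2 * (nprocs + 1);
    size_t send_end   = 2;
    size_t recv_begin = 2;
    size_t recv_end   = recv_begin + 2 * nprocs;

    return (recv_begin >= send_end) && (recv_end <= alloc_len);
}
```

The check holds for 2 and for 3 processes alike, so the send and receive
arguments passed to MPI_Allgather do not overlap each other in either case.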
2) Configuration
================
I compile it with : "mpicc brol.c -o brol"
I run it with : "mpirun -np 3 valgrind ./brol"
As noted above, I do not get the "memcpy" messages when running on
2 processes, hence my assumption that it is a rounding problem.
ompi_info says :
Package: Open MPI pelegrin_at_brol Distribution
Open MPI: 1.3.2rc1r21037
Open MPI SVN revision: r21037
Open MPI release date: Unreleased developer copy
Open RTE: 1.3.2rc1r21037
Open RTE SVN revision: r21037
Open RTE release date: Unreleased developer copy
OPAL: 1.3.2rc1r21037
OPAL SVN revision: r21037
OPAL release date: Unreleased developer copy
Ident string: 1.3.2rc1r21037
Prefix: /usr/local
Configured architecture: i686-pc-linux-gnu
Configure host: brol
Configured by: pelegrin
Configured on: Sun Apr 19 20:53:17 CEST 2009
Configure host: brol
Built by: pelegrin
Built on: Sun Apr 19 21:05:30 CEST 2009
Built host: brol
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: yes
Fortran90 bindings size: small
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fortran77 compiler: gfortran
Fortran77 compiler abs: /usr/bin/gfortran
Fortran90 compiler: gfortran
Fortran90 compiler abs: /usr/bin/gfortran
C profiling: yes
C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
C++ exceptions: no
Thread support: posix (mpi: yes, progress: no)
Sparse Groups: no
Internal debug support: yes
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: yes
libltdl support: yes
Heterogeneous support: no
mpirun default --prefix: no
MPI I/O support: yes
MPI_WTIME support: gettimeofday
Symbol visibility support: yes
FT Checkpoint support: no (checkpoint thread: no)
MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.3.2)
MCA memchecker: valgrind (MCA v2.0, API v2.0, Component v1.3.2)
MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.3.2)
MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.3.2)
MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.3.2)
MCA carto: file (MCA v2.0, API v2.0, Component v1.3.2)
MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.3.2)
MCA timer: linux (MCA v2.0, API v2.0, Component v1.3.2)
MCA installdirs: env (MCA v2.0, API v2.0, Component v1.3.2)
MCA installdirs: config (MCA v2.0, API v2.0, Component v1.3.2)
MCA dpm: orte (MCA v2.0, API v2.0, Component v1.3.2)
MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.3.2)
MCA allocator: basic (MCA v2.0, API v2.0, Component v1.3.2)
MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: basic (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: inter (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: self (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: sm (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: sync (MCA v2.0, API v2.0, Component v1.3.2)
MCA coll: tuned (MCA v2.0, API v2.0, Component v1.3.2)
MCA io: romio (MCA v2.0, API v2.0, Component v1.3.2)
MCA mpool: fake (MCA v2.0, API v2.0, Component v1.3.2)
MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.3.2)
MCA mpool: sm (MCA v2.0, API v2.0, Component v1.3.2)
MCA pml: cm (MCA v2.0, API v2.0, Component v1.3.2)
MCA pml: csum (MCA v2.0, API v2.0, Component v1.3.2)
MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.3.2)
MCA pml: v (MCA v2.0, API v2.0, Component v1.3.2)
MCA bml: r2 (MCA v2.0, API v2.0, Component v1.3.2)
MCA rcache: vma (MCA v2.0, API v2.0, Component v1.3.2)
MCA btl: self (MCA v2.0, API v2.0, Component v1.3.2)
MCA btl: sm (MCA v2.0, API v2.0, Component v1.3.2)
MCA btl: tcp (MCA v2.0, API v2.0, Component v1.3.2)
MCA topo: unity (MCA v2.0, API v2.0, Component v1.3.2)
MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.3.2)
MCA osc: rdma (MCA v2.0, API v2.0, Component v1.3.2)
MCA iof: hnp (MCA v2.0, API v2.0, Component v1.3.2)
MCA iof: orted (MCA v2.0, API v2.0, Component v1.3.2)
MCA iof: tool (MCA v2.0, API v2.0, Component v1.3.2)
MCA oob: tcp (MCA v2.0, API v2.0, Component v1.3.2)
MCA odls: default (MCA v2.0, API v2.0, Component v1.3.2)
MCA ras: slurm (MCA v2.0, API v2.0, Component v1.3.2)
MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.3.2)
MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.3.2)
MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.3.2)
MCA rml: oob (MCA v2.0, API v2.0, Component v1.3.2)
MCA routed: binomial (MCA v2.0, API v2.0, Component v1.3.2)
MCA routed: direct (MCA v2.0, API v2.0, Component v1.3.2)
MCA routed: linear (MCA v2.0, API v2.0, Component v1.3.2)
MCA plm: rsh (MCA v2.0, API v2.0, Component v1.3.2)
MCA plm: slurm (MCA v2.0, API v2.0, Component v1.3.2)
MCA filem: rsh (MCA v2.0, API v2.0, Component v1.3.2)
MCA errmgr: default (MCA v2.0, API v2.0, Component v1.3.2)
MCA ess: env (MCA v2.0, API v2.0, Component v1.3.2)
MCA ess: hnp (MCA v2.0, API v2.0, Component v1.3.2)
MCA ess: singleton (MCA v2.0, API v2.0, Component v1.3.2)
MCA ess: slurm (MCA v2.0, API v2.0, Component v1.3.2)
MCA ess: tool (MCA v2.0, API v2.0, Component v1.3.2)
MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.3.2)
MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.3.2)
I configured OpenMPI with :
./configure --enable-debug --enable-mem-debug --enable-mpi-threads \
            --enable-memchecker --with-valgrind=/usr
3) Messages
===========
In addition to the "memcpy" message, I also get a bunch
of strange messages. Some excerpts :
==32141== Conditional jump or move depends on uninitialised value(s)
==32141==    at 0x5005A03: mca_mpool_sm_alloc (mpool_sm_module.c:79)
==32141==    by 0x40393E8: ompi_free_list_grow (ompi_free_list.c:198)
==32141==    by 0x403926D: ompi_free_list_init_ex_new (ompi_free_list.c:163)
==32141==    by 0x506CEFE: ompi_free_list_init_new (ompi_free_list.h:169)
==32141==    by 0x506CD67: sm_btl_first_time_init (btl_sm.c:333)
==32141==    by 0x506D1E2: mca_btl_sm_add_procs (btl_sm.c:484)
==32141==    by 0x5062433: mca_bml_r2_add_procs (bml_r2.c:206)
==32141==    by 0x50427AE: mca_pml_ob1_add_procs (pml_ob1.c:308)
==32141==    by 0x4067F0E: ompi_mpi_init (ompi_mpi_init.c:667)
==32141==    by 0x40A4242: PMPI_Init (pinit.c:80)
==32141==    by 0x8048733: main (in /tmp/brol)

==32141== Conditional jump or move depends on uninitialised value(s)
==32141==    at 0x5005A03: mca_mpool_sm_alloc (mpool_sm_module.c:79)
==32141==    by 0x506D4D7: sm_fifo_init (btl_sm.h:213)
==32141==    by 0x506D2D0: mca_btl_sm_add_procs (btl_sm.c:510)
==32141==    by 0x5062433: mca_bml_r2_add_procs (bml_r2.c:206)
==32141==    by 0x50427AE: mca_pml_ob1_add_procs (pml_ob1.c:308)
==32141==    by 0x4067F0E: ompi_mpi_init (ompi_mpi_init.c:667)
==32141==    by 0x40A4242: PMPI_Init (pinit.c:80)
==32141==    by 0x8048733: main (in /tmp/brol)
Thanks in advance,
f.p.