
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Memchecker report on v1.3b2 (includes potential bug reports)
From: Shiqing Fan (fan_at_[hidden])
Date: 2008-11-19 09:56:46


Dear François,

Thanks a lot for your report; it's really a great help to us. :-)

For the issues:

1) When you get "Conditional jump" errors, it normally means that some
uninitialized (or undefined) values were used. The parameters passed
into PMPI_Init_thread might contain uninitialized values, which could
cause errors (even seg-faults) later. I need some time to run your
application and check where exactly these values are. I'll post another
email when it's done.
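
To illustrate the class of error (not your code; a made-up minimal
example), this is the kind of pattern that produces such a report:

  /* cond_jump.c -- made-up example of the error class described above.
   * Build and run: gcc -g cond_jump.c && valgrind ./a.out */
  #include <stdio.h>
  #include <stdlib.h>

  int main (void)
  {
    int *flags = malloc (4 * sizeof (int)); /* contents undefined */

    if (flags[0] > 0)                       /* conditional jump depends  */
      printf ("positive\n");                /* on an uninitialised value */

    free (flags);
    return 0;
  }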

2) You're right. We shouldn't check the buffer when the request is
completed and released. I'll fix that.
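
For reference, a minimal sketch of the triggering pattern (assumed from
your trace below; buffer sizes and tags are arbitrary): an MPI_Waitall
over a former recv request, now MPI_REQUEST_NULL, plus a matched isend.

  /* waitall_mix.c -- hedged sketch; run with: mpirun -np 2 ./waitall_mix */
  #include <mpi.h>

  int main (int argc, char **argv)
  {
    int rank, peer, sbuf[4] = { 0, 1, 2, 3 }, rbuf[4];
    MPI_Request reqs[2];

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    peer = (rank + 1) % 2;

    MPI_Irecv (rbuf, 4, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend (sbuf, 4, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[1]);

    /* Complete the recv first; MPI_Wait resets reqs[0] to
     * MPI_REQUEST_NULL, as in your report. */
    MPI_Wait (&reqs[0], MPI_STATUS_IGNORE);

    /* Waiting on the mix is legal MPI; once the isend completes and its
     * request is released, we must not inspect the buffer any more. */
    MPI_Waitall (2, reqs, MPI_STATUSES_IGNORE);

    MPI_Finalize ();
    return 0;
  }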

3) Absolutely correct. I'll fix that.
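
(For the Bcast case: on non-root ranks the buffer is pure output, so it
may legitimately be uninitialized before the call. A made-up fragment,
not code from Scotch:)

  /* bcast_recv.c -- non-root buffers are written, not read, by
   * MPI_Bcast, so they need not be defined beforehand. */
  #include <mpi.h>
  #include <stdlib.h>

  int main (int argc, char **argv)
  {
    int rank, i;
    int *data;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);

    data = malloc (16 * sizeof (int)); /* undefined on every rank */
    if (rank == 0)                     /* only the root must define it */
      for (i = 0; i < 16; i ++)
        data[i] = i;

    MPI_Bcast (data, 16, MPI_INT, 0, MPI_COMM_WORLD);

    free (data);
    MPI_Finalize ();
    return 0;
  }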

4) Well, this sounds reasonable, but according to the MPI-1 standard
(see page 40 for non-blocking send/recv; a more detailed explanation is
on page 30):

"A nonblocking send call indicates that the system may start copying
data out of the send buffer. The sender should */not access*/ any part
of the send buffer after a nonblocking send operation is called, until
the send completes."

So before calling MPI_Wait to complete an isend operation, any access to
the send buffer is illegal. It might be a little strict, but we have to
do what the standard says.
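
In other words, even a read of the buffer is flagged, as in this
hypothetical fragment (made-up names, not code from Scotch):

  /* isend_access.c -- illustrates the rule quoted above. */
  #include <mpi.h>
  #include <stdio.h>

  void example (int peer)
  {
    int buf[4] = { 0, 1, 2, 3 };
    MPI_Request req;

    MPI_Isend (buf, 4, MPI_INT, peer, 0, MPI_COMM_WORLD, &req);

    /* Illegal under MPI-1: the send buffer may not be accessed -- not
     * even read -- until the send completes, so memchecker marks the
     * whole region inaccessible, not merely read-only. */
    printf ("%d\n", buf[0]);

    MPI_Wait (&req, MPI_STATUS_IGNORE);
  }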

5) Your feedback is always welcome and, most importantly, helps us
make improvements. ;-) Thanks again.

Best Regards,
Shiqing

François PELLEGRINI wrote:
> Hello all,
>
>
> I am the main developer of the Scotch parallel graph partitioning
> package, which uses both MPI and POSIX threads. I have been doing
> a great deal of testing of my program on various platforms and
> libraries, searching for potential bugs (there may still be some ;-) ).
>
> The new memchecker tool proposed in v1.3 is indeed interesting
> to me, so I tried it on my Linux platform. I used the following
> configuration options:
>
> % ./configure --enable-debug --enable-mem-debug --enable-memchecker
> --with-valgrind=/usr/bin --enable-mpi-threads
> --prefix=/home/pastix/pelegrin/openmpi
>
> % ompi_info
> Package: Open MPI pelegrin_at_laurel Distribution
> Open MPI: 1.3b2
> Open MPI SVN revision: r19927
> Open MPI release date: Nov 04, 2008
> Open RTE: 1.3b2
> Open RTE SVN revision: r19927
> Open RTE release date: Nov 04, 2008
> OPAL: 1.3b2
> OPAL SVN revision: r19927
> OPAL release date: Nov 04, 2008
> Ident string: 1.3b2
> Prefix: /home/pastix/pelegrin/openmpi
> Configured architecture: x86_64-unknown-linux-gnu
> Configure host: laurel
> Configured by: pelegrin
> Configured on: Wed Nov 19 00:50:50 CET 2008
> Configure host: laurel
> Built by: pelegrin
> Built on: mercredi 19 novembre 2008, 00:55:59 (UTC+0100)
> Built host: laurel
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (all)
> Fortran90 bindings: yes
> Fortran90 bindings size: small
> C compiler: gcc
> C compiler absolute: /usr/bin/gcc
> C++ compiler: g++
> C++ compiler absolute: /usr/bin/g++
> Fortran77 compiler: gfortran
> Fortran77 compiler abs: /usr/bin/gfortran
> Fortran90 compiler: gfortran
> Fortran90 compiler abs: /usr/bin/gfortran
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: yes
> C++ exceptions: no
> Thread support: posix (mpi: yes, progress: no)
> Sparse Groups: no
> Internal debug support: yes
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: yes
> libltdl support: yes
> Heterogeneous support: no
> mpirun default --prefix: no
> MPI I/O support: yes
> MPI_WTIME support: gettimeofday
> Symbol visibility support: yes
> FT Checkpoint support: no (checkpoint thread: no)
> MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.3)
> MCA memchecker: valgrind (MCA v2.0, API v2.0, Component v1.3)
> MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.3)
> MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.3)
> MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.3)
> MCA carto: file (MCA v2.0, API v2.0, Component v1.3)
> MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.3)
> MCA maffinity: libnuma (MCA v2.0, API v2.0, Component v1.3)
> MCA timer: linux (MCA v2.0, API v2.0, Component v1.3)
> MCA installdirs: env (MCA v2.0, API v2.0, Component v1.3)
> MCA installdirs: config (MCA v2.0, API v2.0, Component v1.3)
> MCA dpm: orte (MCA v2.0, API v2.0, Component v1.3)
> MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.3)
> MCA allocator: basic (MCA v2.0, API v2.0, Component v1.3)
> MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.3)
> MCA coll: basic (MCA v2.0, API v2.0, Component v1.3)
> MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.3)
> MCA coll: inter (MCA v2.0, API v2.0, Component v1.3)
> MCA coll: self (MCA v2.0, API v2.0, Component v1.3)
> MCA coll: sm (MCA v2.0, API v2.0, Component v1.3)
> MCA coll: tuned (MCA v2.0, API v2.0, Component v1.3)
> MCA io: romio (MCA v2.0, API v2.0, Component v1.3)
> MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.3)
> MCA mpool: sm (MCA v2.0, API v2.0, Component v1.3)
> MCA pml: cm (MCA v2.0, API v2.0, Component v1.3)
> MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.3)
> MCA pml: v (MCA v2.0, API v2.0, Component v1.3)
> MCA bml: r2 (MCA v2.0, API v2.0, Component v1.3)
> MCA rcache: vma (MCA v2.0, API v2.0, Component v1.3)
> MCA btl: self (MCA v2.0, API v2.0, Component v1.3)
> MCA btl: sm (MCA v2.0, API v2.0, Component v1.3)
> MCA btl: tcp (MCA v2.0, API v2.0, Component v1.3)
> MCA topo: unity (MCA v2.0, API v2.0, Component v1.3)
> MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.3)
> MCA osc: rdma (MCA v2.0, API v2.0, Component v1.3)
> MCA iof: hnp (MCA v2.0, API v2.0, Component v1.3)
> MCA iof: orted (MCA v2.0, API v2.0, Component v1.3)
> MCA iof: tool (MCA v2.0, API v2.0, Component v1.3)
> MCA oob: tcp (MCA v2.0, API v2.0, Component v1.3)
> MCA odls: default (MCA v2.0, API v2.0, Component v1.3)
> MCA ras: slurm (MCA v2.0, API v2.0, Component v1.3)
> MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.3)
> MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.3)
> MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.3)
> MCA rml: oob (MCA v2.0, API v2.0, Component v1.3)
> MCA routed: binomial (MCA v2.0, API v2.0, Component v1.3)
> MCA routed: direct (MCA v2.0, API v2.0, Component v1.3)
> MCA routed: linear (MCA v2.0, API v2.0, Component v1.3)
> MCA plm: rsh (MCA v2.0, API v2.0, Component v1.3)
> MCA plm: slurm (MCA v2.0, API v2.0, Component v1.3)
> MCA filem: rsh (MCA v2.0, API v2.0, Component v1.3)
> MCA errmgr: default (MCA v2.0, API v2.0, Component v1.3)
> MCA ess: env (MCA v2.0, API v2.0, Component v1.3)
> MCA ess: hnp (MCA v2.0, API v2.0, Component v1.3)
> MCA ess: singleton (MCA v2.0, API v2.0, Component v1.3)
> MCA ess: slurm (MCA v2.0, API v2.0, Component v1.3)
> MCA ess: tool (MCA v2.0, API v2.0, Component v1.3)
> MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.3)
> MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.3)
>
> % gcc --version
> gcc (Debian 4.3.2-1) 4.3.2
> Copyright (C) 2008 Free Software Foundation, Inc.
>
> I launched my program under valgrind on two procs and got the following report:
>
> % mpirun -np 2 valgrind ../bin/dgord ~/paral/graph/altr4.grf.gz /dev/null -vt
> ==10978== Memcheck, a memory error detector.
> ==10978== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
> ==10978== Using LibVEX rev 1854, a library for dynamic binary translation.
> ==10978== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
> ==10978== Using valgrind-3.3.1-Debian, a dynamic binary instrumentation framework.
> ==10978== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
> ==10978== For more details, rerun with: -v
> ==10978==
> ==10979== Memcheck, a memory error detector.
> ==10979== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
> ==10979== Using LibVEX rev 1854, a library for dynamic binary translation.
> ==10979== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
> ==10979== Using valgrind-3.3.1-Debian, a dynamic binary instrumentation framework.
> ==10979== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
> ==10979== For more details, rerun with: -v
> ==10979==
> ==10979== Syscall param sched_setaffinity(mask) points to unaddressable byte(s)
> ==10978== Syscall param sched_setaffinity(mask) points to unaddressable byte(s)
> ==10978== at 0x65FB269: syscall (in /lib/libc-2.7.so)
>
> ==10978== by 0x6C8365A: opal_paffinity_linux_plpa_api_probe_init
> (plpa_api_probe.c:43)
> ==10978== by 0x6C83BB8: opal_paffinity_linux_plpa_init (plpa_runtime.c:36)
> ==10978== by 0x6C84984: opal_paffinity_linux_plpa_have_topology_information
> (plpa_map.c:501)
> ==10978== by 0x6C83129: linux_module_init (paffinity_linux_module.c:119)
> ==10978== by 0x5AB35EA: opal_paffinity_base_select (paffinity_base_select.c:64)
> ==10978== by 0x5A7DE99: opal_init (opal_init.c:292)
> ==10978== by 0x580087A: orte_init (orte_init.c:76)
> ==10978== by 0x551675F: ompi_mpi_init (ompi_mpi_init.c:343)
> ==10978== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
> ==10978== by 0x4067CF: main (dgord.c:123)
> ==10978== Address 0x0 is not stack'd, malloc'd or (recently) free'd
> ==10979== at 0x65FB269: syscall (in /lib/libc-2.7.so)
> ==10979== by 0x6C8365A: opal_paffinity_linux_plpa_api_probe_init
> (plpa_api_probe.c:43)
> ==10979== by 0x6C83BB8: opal_paffinity_linux_plpa_init (plpa_runtime.c:36)
> ==10979== by 0x6C84984: opal_paffinity_linux_plpa_have_topology_information
> (plpa_map.c:501)
> ==10979== by 0x6C83129: linux_module_init (paffinity_linux_module.c:119)
> ==10979== by 0x5AB35EA: opal_paffinity_base_select (paffinity_base_select.c:64)
> ==10979== by 0x5A7DE99: opal_init (opal_init.c:292)
> ==10979== by 0x580087A: orte_init (orte_init.c:76)
> ==10979== by 0x551675F: ompi_mpi_init (ompi_mpi_init.c:343)
> ==10979== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
> ==10979== by 0x4067CF: main (dgord.c:123)
> ==10979== Address 0x0 is not stack'd, malloc'd or (recently) free'd
> ==10978== Warning: set address range perms: large range 134217728 (defined)
> ==10979== Warning: set address range perms: large range 134217728 (defined)
> ==10978==
> ==10978== Conditional jump or move depends on uninitialised value(s)
> ==10978== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
> ==10978== by 0x972EBD4: mpool_calloc (btl_sm.c:109)
> ==10978== by 0x972F6A8: sm_btl_first_time_init (btl_sm.c:314)
> ==10978== by 0x972FCE3: mca_btl_sm_add_procs (btl_sm.c:488)
> ==10978== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
> ==10978== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
> ==10978== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
> ==10978== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
> ==10978== by 0x4067CF: main (dgord.c:123)
> ==10978==
> ==10978== Conditional jump or move depends on uninitialised value(s)
> ==10978== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
> ==10978== by 0x972EBD4: mpool_calloc (btl_sm.c:109)
> ==10978== by 0x972EC85: init_fifos (btl_sm.c:125)
> ==10978== by 0x972F6CB: sm_btl_first_time_init (btl_sm.c:317)
> ==10978== by 0x972FCE3: mca_btl_sm_add_procs (btl_sm.c:488)
> ==10978== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
> ==10978== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
> ==10978== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
> ==10978== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
> ==10978== by 0x4067CF: main (dgord.c:123)
> ==10978==
> ==10978== Conditional jump or move depends on uninitialised value(s)
> ==10978== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
> ==10978== by 0x54E660E: ompi_free_list_grow (ompi_free_list.c:198)
> ==10978== by 0x54E6435: ompi_free_list_init_ex_new (ompi_free_list.c:163)
> ==10978== by 0x972F9D3: ompi_free_list_init_new (ompi_free_list.h:169)
> ==10978== by 0x972F864: sm_btl_first_time_init (btl_sm.c:343)
> ==10978== by 0x972FCE3: mca_btl_sm_add_procs (btl_sm.c:488)
> ==10978== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
> ==10978== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
> ==10978== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
> ==10978== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
> ==10978== by 0x4067CF: main (dgord.c:123)
> ==10979==
> ==10979== Conditional jump or move depends on uninitialised value(s)
> ==10979== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
> ==10979== by 0x972EBD4: mpool_calloc (btl_sm.c:109)
> ==10979== by 0x972F6A8: sm_btl_first_time_init (btl_sm.c:314)
> ==10979== by 0x972FCE3: mca_btl_sm_add_procs (btl_sm.c:488)
> ==10979== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
> ==10979== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
> ==10979== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
> ==10979== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
> ==10979== by 0x4067CF: main (dgord.c:123)
> ==10979==
> ==10979== Conditional jump or move depends on uninitialised value(s)
> ==10979== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
> ==10979== by 0x972EBD4: mpool_calloc (btl_sm.c:109)
> ==10979== by 0x972EC85: init_fifos (btl_sm.c:125)
> ==10979== by 0x972F6CB: sm_btl_first_time_init (btl_sm.c:317)
> ==10979== by 0x972FCE3: mca_btl_sm_add_procs (btl_sm.c:488)
> ==10979== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
> ==10979== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
> ==10979== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
> ==10979== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
> ==10979== by 0x4067CF: main (dgord.c:123)
> ==10979==
> ==10979== Conditional jump or move depends on uninitialised value(s)
> ==10979== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
> ==10979== by 0x54E660E: ompi_free_list_grow (ompi_free_list.c:198)
> ==10979== by 0x54E6435: ompi_free_list_init_ex_new (ompi_free_list.c:163)
> ==10979== by 0x972F9D3: ompi_free_list_init_new (ompi_free_list.h:169)
> ==10979== by 0x972F864: sm_btl_first_time_init (btl_sm.c:343)
> ==10979== by 0x972FCE3: mca_btl_sm_add_procs (btl_sm.c:488)
> ==10979== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
> ==10979== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
> ==10979== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
> ==10979== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
> ==10979== by 0x4067CF: main (dgord.c:123)
> ==10979==
> ==10979== Conditional jump or move depends on uninitialised value(s)
> ==10979== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
> ==10979== by 0x9730165: ompi_fifo_init (ompi_fifo.h:280)
> ==10979== by 0x9730044: mca_btl_sm_add_procs (btl_sm.c:538)
> ==10979== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
> ==10979== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
> ==10979== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
> ==10979== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
> ==10979== by 0x4067CF: main (dgord.c:123)
> ==10979==
> ==10979== Conditional jump or move depends on uninitialised value(s)
> ==10979== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
> ==10979== by 0x97302C4: ompi_cb_fifo_init (ompi_circular_buffer_fifo.h:158)
> ==10979== by 0x97301BA: ompi_fifo_init (ompi_fifo.h:288)
> ==10979== by 0x9730044: mca_btl_sm_add_procs (btl_sm.c:538)
> ==10979== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
> ==10979== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
> ==10979== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
> ==10979== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
> ==10979== by 0x4067CF: main (dgord.c:123)
> ==10979==
> ==10979== Conditional jump or move depends on uninitialised value(s)
> ==10979== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
> ==10979== by 0x97303B3: ompi_cb_fifo_init (ompi_circular_buffer_fifo.h:180)
> ==10979== by 0x97301BA: ompi_fifo_init (ompi_fifo.h:288)
> ==10979== by 0x9730044: mca_btl_sm_add_procs (btl_sm.c:538)
> ==10979== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
> ==10979== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
> ==10979== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
> ==10979== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
> ==10979== by 0x4067CF: main (dgord.c:123)
> ==10978==
> ==10978== Conditional jump or move depends on uninitialised value(s)
> ==10978== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
> ==10978== by 0x9730165: ompi_fifo_init (ompi_fifo.h:280)
> ==10978== by 0x9730044: mca_btl_sm_add_procs (btl_sm.c:538)
> ==10978== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
> ==10978== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
> ==10978== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
> ==10978== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
> ==10978== by 0x4067CF: main (dgord.c:123)
> ==10978==
> ==10978== Conditional jump or move depends on uninitialised value(s)
> ==10978== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
> ==10978== by 0x97302C4: ompi_cb_fifo_init (ompi_circular_buffer_fifo.h:158)
> ==10978== by 0x97301BA: ompi_fifo_init (ompi_fifo.h:288)
> ==10978== by 0x9730044: mca_btl_sm_add_procs (btl_sm.c:538)
> ==10978== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
> ==10978== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
> ==10978== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
> ==10978== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
> ==10978== by 0x4067CF: main (dgord.c:123)
> ==10978==
> ==10978== Conditional jump or move depends on uninitialised value(s)
> ==10978== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
> ==10978== by 0x97303B3: ompi_cb_fifo_init (ompi_circular_buffer_fifo.h:180)
> ==10978== by 0x97301BA: ompi_fifo_init (ompi_fifo.h:288)
> ==10978== by 0x9730044: mca_btl_sm_add_procs (btl_sm.c:538)
> ==10978== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
> ==10978== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
> ==10978== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
> ==10978== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
> ==10978== by 0x4067CF: main (dgord.c:123)
> ==10979==
> ==10979== Uninitialised byte(s) found during client check request
> ==10979== at 0x5AB2C87: valgrind_module_isdefined
> (memchecker_valgrind_module.c:112)
> ==10979== by 0x5AB26CB: opal_memchecker_base_isdefined
> (memchecker_base_wrappers.c:34)
> ==10979== by 0x553F067: memchecker_call (memchecker.h:96)
> ==10979== by 0x553ECDF: PMPI_Bcast (pbcast.c:41)
> ==10979== by 0x40BB82: _SCOTCHdgraphLoad (dgraph_io_load.c:226)
> ==10979== by 0x406B32: main (dgord.c:265)
> ==10979== Address 0x7feffff74 is on thread 1's stack
> ==10978==
> ==10978== Invalid read of size 8
> ==10978== at 0x8F0D85A: memchecker_call (memchecker.h:94)
> ==10978== by 0x8F0D812: mca_pml_ob1_send_request_free (pml_ob1_sendreq.c:107)
> ==10978== by 0x55154DA: ompi_request_free (request.h:354)
> ==10978== by 0x5515B05: ompi_request_default_wait_all (req_wait.c:344)
> ==10978== by 0x556EA00: PMPI_Waitall (pwaitall.c:68)
> ==10978== by 0x41FC2D: _SCOTCHdgraphCoarsen (dgraph_coarsen.c:711)
> ==10978== by 0x415B34: vdgraphSeparateMl2 (vdgraph_separate_ml.c:99)
> ==10979==
> ==10979== Invalid read of size 8
> ==10979== at 0x8F0D85A: memchecker_call (memchecker.h:94)
> ==10979== by 0x8F0D812: mca_pml_ob1_send_request_free (pml_ob1_sendreq.c:107)
> ==10979== by 0x55154DA: ompi_request_free (request.h:354)
> ==10979== by 0x5515B05: ompi_request_default_wait_all (req_wait.c:344)
> ==10979== by 0x556EA00: PMPI_Waitall (pwaitall.c:68)
> ==10979== by 0x41FC2D: _SCOTCHdgraphCoarsen (dgraph_coarsen.c:711)
> ==10979== by 0x415B34: vdgraphSeparateMl2 (vdgraph_separate_ml.c:99)
> ==10979== by 0x415D96: _SCOTCHvdgraphSeparateMl (vdgraph_separate_ml.c:660)
> ==10979== by 0x40EC49: _SCOTCHvdgraphSeparateSt (vdgraph_separate_st.c:327)
> ==10979== by 0x412F52: _SCOTCHhdgraphOrderNd (hdgraph_order_nd.c:294)
> ==10979== by 0x40EB26: _SCOTCHhdgraphOrderSt (hdgraph_order_st.c:216)
> ==10979== by 0x40734A: SCOTCH_dgraphOrderComputeList
> (library_dgraph_order.c:220)
> ==10979== Address 0x28 is not stack'd, malloc'd or (recently) free'd
>
> ==10978== by 0x415D96: _SCOTCHvdgraphSeparateMl (vdgraph_separate_ml.c:660)
> ==10978== by 0x40EC49: _SCOTCHvdgraphSeparateSt (vdgraph_separate_st.c:327)
> ==10978== by 0x412F52: _SCOTCHhdgraphOrderNd (hdgraph_order_nd.c:294)
> ==10978== by 0x40EB26: _SCOTCHhdgraphOrderSt (hdgraph_order_st.c:216)
> ==10978== by 0x40734A: SCOTCH_dgraphOrderComputeList
> (library_dgraph_order.c:220)
> ==10978== Address 0x28 is not stack'd, malloc'd or (recently) free'd
> [laurel:10979] *** Process received signal ***
> [laurel:10978] *** Process received signal ***
> [laurel:10979] Signal: Segmentation fault (11)
> [laurel:10979] Signal code: Address not mapped (1)
> [laurel:10979] Failing at address: 0x28
> [laurel:10978] Signal: Segmentation fault (11)
> [laurel:10978] Signal code: Address not mapped (1)
> [laurel:10978] Failing at address: 0x28
> [laurel:10979] [ 0] /lib/libpthread.so.0 [0x6321a80]
> [laurel:10979] [ 1] /home/pastix/pelegrin/openmpi/lib/openmpi/mca_pml_ob1.so
> [0x8f0d85a]
> [laurel:10979] [ 2] /home/pastix/pelegrin/openmpi/lib/openmpi/mca_pml_ob1.so
> [0x8f0d813]
> [laurel:10979] [ 3] /home/pastix/pelegrin/openmpi/lib/libmpi.so.0 [0x55154db]
> [laurel:10979] [ 4] /home/pastix/pelegrin/openmpi/lib/libmpi.so.0 [0x5515b06]
> [laurel:10979] [ 5]
> /home/pastix/pelegrin/openmpi/lib/libmpi.so.0(PMPI_Waitall+0x15d) [0x556ea01]
> [laurel:10979] [ 6] ../bin/dgord(_SCOTCHdgraphCoarsen+0x13ce) [0x41fc2e]
> [laurel:10979] [ 7] ../bin/dgord [0x415b35]
> [laurel:10979] [ 8] ../bin/dgord(_SCOTCHvdgraphSeparateMl+0x27) [0x415d97]
> [laurel:10979] [ 9] ../bin/dgord(_SCOTCHvdgraphSeparateSt+0x5a) [0x40ec4a]
> [laurel:10978] [ 0] /lib/libpthread.so.0 [0x6321a80]
> [laurel:10979] [10] ../bin/dgord(_SCOTCHhdgraphOrderNd+0xe3) [0x412f53]
> [laurel:10979] [11] ../bin/dgord(_SCOTCHhdgraphOrderSt+0x67) [0x40eb27]
> [laurel:10978] [ 1] /home/pastix/pelegrin/openmpi/lib/openmpi/mca_pml_ob1.so
> [0x8f0d85a]
> [laurel:10979] [12] ../bin/dgord(SCOTCH_dgraphOrderComputeList+0xeb) [0x40734b]
> [laurel:10979] [13] ../bin/dgord(main+0x3ec) [0x406b7c]
> [laurel:10979] [14] /lib/libc.so.6(__libc_start_main+0xe6) [0x654d1a6]
> [laurel:10979] [15] ../bin/dgord [0x406669]
> [laurel:10978] [ 2] /home/pastix/pelegrin/openmpi/lib/openmpi/mca_pml_ob1.so
> [0x8f0d813]
> [laurel:10978] [ 3] /home/pastix/pelegrin/openmpi/lib/libmpi.so.0 [0x55154db]
> [laurel:10978] [ 4] /home/pastix/pelegrin/openmpi/lib/libmpi.so.0 [0x5515b06]
> [laurel:10978] [ 5]
> /home/pastix/pelegrin/openmpi/lib/libmpi.so.0(PMPI_Waitall+0x15d) [0x556ea01]
> [laurel:10978] [ 6] ../bin/dgord(_SCOTCHdgraphCoarsen+0x13ce) [0x41fc2e]
> [laurel:10978] [ 7] ../bin/dgord [0x415b35]
> [laurel:10978] [ 8] ../bin/dgord(_SCOTCHvdgraphSeparateMl+0x27) [0x415d97]
> [laurel:10978] [ 9] ../bin/dgord(_SCOTCHvdgraphSeparateSt+0x5a) [0x40ec4a]
> [laurel:10979] *** End of error message ***
> [laurel:10978] [10] ../bin/dgord(_SCOTCHhdgraphOrderNd+0xe3) [0x412f53]
> [laurel:10978] [11] ../bin/dgord(_SCOTCHhdgraphOrderSt+0x67) [0x40eb27]
> [laurel:10978] [12] ../bin/dgord(SCOTCH_dgraphOrderComputeList+0xeb) [0x40734b]
> [laurel:10978] [13] ../bin/dgord(main+0x3ec) [0x406b7c]
> [laurel:10978] [14] /lib/libc.so.6(__libc_start_main+0xe6) [0x654d1a6]
> [laurel:10978] [15] ../bin/dgord [0x406669]
> ==10979==
> [laurel:10978] *** End of error message ***
>
> ==10979== Process terminating with default action of signal 11 (SIGSEGV)
> ==10979== Access not within mapped region at address 0x29
> ==10979== at 0x8F0D85A: memchecker_call (memchecker.h:94)
> ==10978==
> ==10978== Process terminating with default action of signal 11 (SIGSEGV)
> ==10978== Access not within mapped region at address 0x29
> ==10978== at 0x8F0D85A: memchecker_call (memchecker.h:94)
> ==10978== by 0x8F0D812: mca_pml_ob1_send_request_free (pml_ob1_sendreq.c:107)
> ==10978== by 0x55154DA: ompi_request_free (request.h:354)
> ==10978== by 0x5515B05: ompi_request_default_wait_all (req_wait.c:344)
> ==10978== by 0x556EA00: PMPI_Waitall (pwaitall.c:68)
> ==10978== by 0x41FC2D: _SCOTCHdgraphCoarsen (dgraph_coarsen.c:711)
> ==10978== by 0x415B34: vdgraphSeparateMl2 (vdgraph_separate_ml.c:99)
> ==10978== by 0x415D96: _SCOTCHvdgraphSeparateMl (vdgraph_separate_ml.c:660)
> ==10978== by 0x40EC49: _SCOTCHvdgraphSeparateSt (vdgraph_separate_st.c:327)
> ==10978== by 0x412F52: _SCOTCHhdgraphOrderNd (hdgraph_order_nd.c:294)
> ==10978== by 0x40EB26: _SCOTCHhdgraphOrderSt (hdgraph_order_st.c:216)
> ==10978== by 0x40734A: SCOTCH_dgraphOrderComputeList
> (library_dgraph_order.c:220)
> ==10979== by 0x8F0D812: mca_pml_ob1_send_request_free (pml_ob1_sendreq.c:107)
> ==10979== by 0x55154DA: ompi_request_free (request.h:354)
> ==10979== by 0x5515B05: ompi_request_default_wait_all (req_wait.c:344)
> ==10979== by 0x556EA00: PMPI_Waitall (pwaitall.c:68)
> ==10979== by 0x41FC2D: _SCOTCHdgraphCoarsen (dgraph_coarsen.c:711)
> ==10979== by 0x415B34: vdgraphSeparateMl2 (vdgraph_separate_ml.c:99)
> ==10979== by 0x415D96: _SCOTCHvdgraphSeparateMl (vdgraph_separate_ml.c:660)
> ==10979== by 0x40EC49: _SCOTCHvdgraphSeparateSt (vdgraph_separate_st.c:327)
> ==10979== by 0x412F52: _SCOTCHhdgraphOrderNd (hdgraph_order_nd.c:294)
> ==10979== by 0x40EB26: _SCOTCHhdgraphOrderSt (hdgraph_order_st.c:216)
> ==10979== by 0x40734A: SCOTCH_dgraphOrderComputeList
> (library_dgraph_order.c:220)
> ==10979==
> ==10979== ERROR SUMMARY: 14 errors from 9 contexts (suppressed: 264 from 3)
> ==10979== malloc/free: in use at exit: 4,626,295 bytes in 2,614 blocks.
> ==10979== malloc/free: 9,296 allocs, 6,682 frees, 11,121,335 bytes allocated.
> ==10979== For counts of detected errors, rerun with: -v
> ==10979== searching for pointers to 2,614 not-freed blocks.
> ==10979== checked 138,047,136 bytes.
> ==10978==
> ==10978== ERROR SUMMARY: 13 errors from 8 contexts (suppressed: 264 from 3)
> ==10978== malloc/free: in use at exit: 4,671,068 bytes in 2,627 blocks.
> ==10978== malloc/free: 9,315 allocs, 6,688 frees, 13,108,494 bytes allocated.
> ==10978== For counts of detected errors, rerun with: -v
> ==10978== searching for pointers to 2,627 not-freed blocks.
> ==10978== checked 138,090,848 bytes.
> ==10979==
> ==10979== LEAK SUMMARY:
> ==10979== definitely lost: 2,049 bytes in 25 blocks.
> ==10979== possibly lost: 2,405,098 bytes in 60 blocks.
> ==10979== still reachable: 2,219,148 bytes in 2,529 blocks.
> ==10979== suppressed: 0 bytes in 0 blocks.
> ==10979== Rerun with --leak-check=full to see details of leaked memory.
>
> ==10978== LEAK SUMMARY:
> ==10978== definitely lost: 2,125 bytes in 27 blocks.
> ==10978== possibly lost: 2,445,353 bytes in 63 blocks.
> ==10978== still reachable: 2,223,590 bytes in 2,537 blocks.
> ==10978== suppressed: 0 bytes in 0 blocks.
> ==10978== Rerun with --leak-check=full to see details of leaked memory.
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 10979 on node laurel exited on
> signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
>
>
> I want to report the following issues:
>
> 1)- The "Conditional jump or move depends on uninitialised value(s)"
> messages are quite puzzling. Do they correspond to real problems
> in Open MPI or should they just be ignored?
>
> 2)- The MPI_Waitall call which causes the problem spans a set
> of former receive requests already set to MPI_REQUEST_NULL and
> a set of matching (and hence matched) Isend requests.
>
> 3)- Memchecker also complains (I think wrongly) in the case of a
> Bcast where the receivers have not pre-initialized all of their
> receive array. I guess that in the memcheck process the sender side
> and the receiver sides should get different treatment, since only one
> data array is passed, which is either read or written depending on
> the root process number.
>
> 4)- It also complains when two Isends correspond to overlapping regions
> of the same memory area. It seems that the first Isend flags the
> region as "non-readable", while it should just be "non-writable",
> shouldn't it?
>
> 5)- Keep up the good work! Congrats. ;-)
>
>
> Sincerely yours,
>
>
> f.p.
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>