Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Valgrind writev() errors with 1.3.2.
From: George Bosilca (bosilca_at_[hidden])
Date: 2009-06-08 17:04:15


The Valgrind manual has a whole section on this topic (suppressing
errors). Please read
http://valgrind.org/docs/manual/manual-core.html#manual-core.suppress
for more information.

   george.
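
For the specific question of suppressing "from orte_rml_oob_send_buffer down", a suppression does not have to list the whole stack. What follows is a minimal sketch under stated assumptions, not an official Open MPI suppression file: the entry name ompi_oob_tcp_writev and the file name ompi.supp are made up for illustration, the frames are copied from the trace quoted below, and the "..." frame wildcard assumes a Valgrind version that supports it.

```
# Hypothetical entry name; frames taken from the quoted trace.
{
   ompi_oob_tcp_writev
   Memcheck:Param
   writev(vector[...])
   fun:writev
   fun:mca_oob_tcp_msg_send_handler
   ...
   fun:orte_rml_oob_send_buffer
}
```

Saved as ompi.supp, it would be passed as "valgrind --suppressions=ompi.supp ./a.out". The entry only matches the writev() parameter check when the stack begins with these frames, so reports that reach writev() by other call paths are not matched by it. Running with --gen-suppressions=all prints ready-made entries like the one embedded in the log below, which can then be shortened in the same way.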

On Jun 8, 2009, at 15:24 , Ralph Castain wrote:

> We deliberately choose to not initialize our msg buffers as this
> takes considerable time. Instead, we fill in only the portion
> required by a given message, and then send only that much of the
> buffer. Thus, the uninitialized portion is ignored.
>
> I don't know of a way to tell valgrind to ignore it, I'm afraid -
> perhaps a valgrind guru can be of help. :-/
>
> Ralph
>
>
> On Mon, Jun 8, 2009 at 1:09 PM, tom fogal <tfogal_at_[hidden]>
> wrote:
> Hi all,
>
> I've configured a source build of OpenMPI 1.3.2 with valgrind enabled
> [1], and I'm seeing a lot of errors with writev() when I run this under
> valgrind. For example, with the following `hello, world' program:
>
> #include <stdio.h>
> #include <mpi.h>
>
> int main(int argc, char *argv[]) {
>     MPI_Init(&argc, &argv);
>
>     puts("Hello, world!");
>     MPI_Finalize();
>     return 0;
> }
>
> I see errors like the following:
>
> ==12342== Syscall param writev(vector[...]) points to uninitialised byte(s)
> ==12342== at 0x61DF733: writev (in /lib/libc-2.7.so)
> ==12342== by 0x7889AB9: mca_oob_tcp_msg_send_handler (oob_tcp_msg.c:265)
> ==12342== by 0x788B1A0: mca_oob_tcp_peer_send (oob_tcp_peer.c:197)
> ==12342== by 0x788FF2A: mca_oob_tcp_send_nb (oob_tcp_send.c:167)
> ==12342== by 0x767C7EC: orte_rml_oob_send (rml_oob_send.c:137)
> ==12342== by 0x767D19A: orte_rml_oob_send_buffer (rml_oob_send.c:269)
> ==12342== by 0x7C9F3DF: allgather (grpcomm_bad_module.c:369)
> ==12342== by 0x7C9FD9E: modex (grpcomm_bad_module.c:497)
> ==12342== by 0x4E6DCAF: ompi_mpi_init (ompi_mpi_init.c:626)
>
> The full vg log is appended [2]. Of course, I could just suppress
> this error, but I get it for many (every?) MPI calls that communicate,
> it seems (broadcasts, sends, recvs, allgathers, etc.).
> I'm worried a suppression would suppress too much, or hide an error
> I've caused.
>
> Have others seen this? Can I suppress perhaps from the
> orte_rml_oob_send_buffer down (safely)?
>
> -tom
>
> [1] configured via: gnu_pkg \
> --enable-debug \
> --enable-memchecker \
> --disable-mpi-f77 \
> --enable-pretty-print-stacktrace \
> --enable-cxx-exceptions \
> --enable-mpi-threads \
> --with-valgrind=${PREFIX} \
> --without-gm \
> --without-mx \
> --without-openib \
> --without-psm \
> --with-pic \
> --with-gnu-ld
> where gnu_pkg is basically a function which calls configure with
> --prefix=${PREFIX}.
>
> [2]
> ==12342== Memcheck, a memory error detector.
> ==12342== Copyright (C) 2002-2008, and GNU GPL'd, by Julian Seward et al.
> ==12342== Using LibVEX rev 1884, a library for dynamic binary translation.
> ==12342== Copyright (C) 2004-2008, and GNU GPL'd, by OpenWorks LLP.
> ==12342== Using valgrind-3.4.1, a dynamic binary instrumentation framework.
> ==12342== Copyright (C) 2000-2008, and GNU GPL'd, by Julian Seward et al.
> ==12342== For more details, rerun with: -v
> ==12342==
> ==12342== My PID = 12342, parent PID = 12341. Prog and args are:
> ==12342== ./a.out
> ==12342==
> ==12342== Warning: client syscall munmap tried to modify addresses 0xffffffffffffffff-0xffe
> ==12342== Syscall param writev(vector[...]) points to uninitialised byte(s)
> ==12342== at 0x61DF733: writev (in /lib/libc-2.7.so)
> ==12342== by 0x7889AB9: mca_oob_tcp_msg_send_handler (oob_tcp_msg.c:265)
> ==12342== by 0x788B1A0: mca_oob_tcp_peer_send (oob_tcp_peer.c:197)
> ==12342== by 0x788FF2A: mca_oob_tcp_send_nb (oob_tcp_send.c:167)
> ==12342== by 0x767C7EC: orte_rml_oob_send (rml_oob_send.c:137)
> ==12342== by 0x767D19A: orte_rml_oob_send_buffer (rml_oob_send.c:269)
> ==12342== by 0x7C9F3DF: allgather (grpcomm_bad_module.c:369)
> ==12342== by 0x7C9FD9E: modex (grpcomm_bad_module.c:497)
> ==12342== by 0x4E6DCAF: ompi_mpi_init (ompi_mpi_init.c:626)
> ==12342== by 0x4EAAC88: PMPI_Init (pinit.c:80)
> ==12342== by 0x400857: main (hello.c:5)
> ==12342== Address 0x677697b is 107 bytes inside a block of size 256 alloc'd
> ==12342== at 0x4C22A51: realloc (vg_replace_malloc.c:429)
> ==12342== by 0x53DCBE0: opal_dss_buffer_extend (dss_internal_functions.c:63)
> ==12342== by 0x53DE4BA: opal_dss_copy_payload (dss_load_unload.c:164)
> ==12342== by 0x7C9F314: allgather (grpcomm_bad_module.c:363)
> ==12342== by 0x7C9FD9E: modex (grpcomm_bad_module.c:497)
> ==12342== by 0x4E6DCAF: ompi_mpi_init (ompi_mpi_init.c:626)
> ==12342== by 0x4EAAC88: PMPI_Init (pinit.c:80)
> ==12342== by 0x400857: main (hello.c:5)
> ==12342== Uninitialised value was created by a stack allocation
> ==12342== at 0x53FFA60: opal_ifinit (if.c:147)
> {
> <insert a suppression name here>
> Memcheck:Param
> writev(vector[...])
> fun:writev
> fun:mca_oob_tcp_msg_send_handler
> fun:mca_oob_tcp_peer_send
> fun:mca_oob_tcp_send_nb
> fun:orte_rml_oob_send
> fun:orte_rml_oob_send_buffer
> fun:allgather
> fun:modex
> fun:ompi_mpi_init
> fun:PMPI_Init
> fun:main
> }
> ==12342==
> ==12342== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 307 from 3)
> ==12342== malloc/free: in use at exit: 204,012 bytes in 2,022 blocks.
> ==12342== malloc/free: 10,382 allocs, 8,360 frees, 14,603,162 bytes allocated.
> ==12342== For a detailed leak analysis, rerun with: --leak-check=yes
> ==12342== For counts of detected errors, rerun with: -v
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users