Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Valgrind writev() errors with 1.3.2.
From: George Bosilca (bosilca_at_[hidden])
Date: 2009-06-09 15:22:47


It is not as simple as it sounds. The problem does not come from the
OOB; it just surfaces there. The header we add on the wire is well
aligned and completely initialized. The problem comes from the
buffer that the OOB TCP is asked to send, which is only
partially initialized. The OOB does not control the contents of this
buffer, so the proposed approach will not work. Unfortunately,
completely removing these false positives would require scanning every
layer that uses the OOB to make sure it avoids sending
uninitialized data. That is far too much work for so little benefit.
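
To illustrate how a partially initialized buffer can reach writev(),
here is a minimal C sketch. It is an illustration under stated
assumptions, not Open MPI's actual OOB code: struct demo_msg and the
demo_msg_* functions are hypothetical names. On common ABIs the compiler
pads the struct between its two fields; memcheck treats those padding
bytes as undefined when the object comes from malloc(), and reports them
once the whole object is handed to writev().

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical message layout (not Open MPI's). On common ABIs the
 * compiler inserts padding after `tag` so that `seq` is aligned. */
struct demo_msg {
    char tag;
    long seq;
};

/* Sets the two fields only. In practice the padding bytes keep whatever
 * malloc() returned, so memcheck still considers them undefined. */
struct demo_msg *demo_msg_new_partial(char tag, long seq)
{
    struct demo_msg *m = malloc(sizeof *m);
    if (m != NULL) {
        m->tag = tag;
        m->seq = seq;
    }
    return m;
}

/* Zeroes the whole object before setting the fields, so that the bytes
 * later passed to writev() are all defined. */
struct demo_msg *demo_msg_new_zeroed(char tag, long seq)
{
    struct demo_msg *m = calloc(1, sizeof *m);
    if (m != NULL) {
        m->tag = tag;
        m->seq = seq;
    }
    return m;
}

/* Sends the raw bytes of the struct, padding included, the way a
 * transport layer that does not know the payload layout would. */
ssize_t demo_msg_send(int fd, const struct demo_msg *m)
{
    struct iovec iov;

    assert(m != NULL);
    iov.iov_base = (void *)m;   /* writev() only reads from the buffer */
    iov.iov_len = sizeof *m;
    return writev(fd, &iov, 1);
}
```

Under valgrind, sending a message built by demo_msg_new_partial()
through a valid descriptor would be expected to produce a
Memcheck:Param writev(vector[...]) report of the kind discussed here,
while one built by demo_msg_new_zeroed() should not. The transport
function is identical in both cases, which is why the report cannot be
fixed at that layer.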

As user-level code is not supposed to use the OOB to send data, all
calls going through orte_rml_oob_send can be safely ignored by valgrind.
I would advocate using the following suppression rule with
valgrind. This will save the user a lot of output, and save us
(ompi developers) a lot of time!

{
   ORTE OOB suppression rule
   Memcheck:Param
   writev(vector[...])
   fun:writev
   ...
   fun:orte_rml_oob_send
   ...
   fun:main
}

   george.

On Jun 9, 2009, at 11:01 , Ralph Castain wrote:

> I can't speak to all of the OMPI code, but I can certainly create a
> new configure option --valgrind-friendly that would initialize the
> OOB comm buffers and other RTE-related memory to eliminate such
> warnings.
>
> I would prefer to configure it out rather than adding a bunch of
> "if-then" checks for envars, to avoid having the performance hit when
> not needed.
>
> Would that help?
>
> On Tue, Jun 9, 2009 at 11:40 AM, tom fogal <tfogal_at_[hidden]>
> wrote:
> jody <jody.xha_at_[hidden]> writes:
> > I made a suppression file for the irrelevant memory leaks of ompi: I
> > make no claim that it catches all possible ones, but it catches all
> > that appear in my code.
> [snip]
>
> Thanks, Jody.
>
> What are the chances something like this could be added / maintained in
> the OpenMPI tree? It would be great to have something 1) maintained by
> someone more knowledgeable about these errors than me, and 2) installed
> by default when I set up my toolchain for parallel debugging.
>
> > On Tue, Jun 9, 2009 at 3:28 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> > > This is worth adding to the FAQ.
> > >
> > > On Jun 9, 2009, at 2:31 AM, Ashley Pittman wrote:
> > >
> > >> On Mon, 2009-06-08 at 23:41 -0600, tom fogal wrote:
> > >> > George Bosilca <bosilca_at_[hidden]> writes:
> > >> > > There is a whole page on the valgrind web site about this topic.
> > >> > > Please read
> > >> > > http://valgrind.org/docs/manual/manual-core.html#manual-core.suppress
> > >> > > for more information.
> > >> >
> > >> > Even better, Ralph (et al.), would be if we could just make
> > >> > valgrind think this is defined memory. One can do this with
> > >> > client requests:
> > >> >
> > >> > http://valgrind.org/docs/manual/mc-manual.html#mc-manual.clientreqs
> > >>
> > >> Using the Valgrind client requests unnecessarily is a very bad idea;
> > >> they are intended for cases where applications use their own memory
> > >> allocator (i.e. replace malloc/free) or are using custom kernel
> > >> modules or hardware which Valgrind doesn't know about.
>
> Okay, sure, I realize it was a bit of an abuse of the intended use of
> the tool.
>
> > >> The correct solution is either to not send un-initialised memory
> > >> or to suppress the error using a suppression file as George
> > >> said. As the error is from MPI_Init() you can safely ignore it
> > >> from an end-user perspective.
>
> As I mentioned in my initial message, MPI_Init is only one such
> error; I get them in a lot of MPI calls, seemingly anything that does
> communication. Though I've heard differently on this list, this led me
> to believe I was doing something wrong in my code.
>
> It seems like the only way I could verify that I'm not causing these
> errors myself is to grok the call stacks I'm given for each vg error
> and figure out where the uninitialized memory comes from, and then make
> a judgement call for myself whether this makes sense to suppress. Or
> I could mail the list about every error I see and ask for confirmation
> that it's benign/suppressable. Most likely, I'll take the simple
> approach and just use the suppression file I was given, but that's
> prone to be fragile and break with a future OpenMPI release.
>
> What about an environment variable which enables slower,
> valgrind-friendly behavior? There's precedent in other libraries, e.g.
> glib [1].
>
> -tom
>
> [1] http://library.gnome.org/devel/glib/stable/glib-running.html
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users