
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] memchecker overhead?
From: Rainer Keller (keller_at_[hidden])
Date: 2009-10-26 16:36:32

Hi Brock,
On Monday 26 October 2009 03:23:42 pm Brock Palen wrote:
> Is there a large overhead for --enable-debug --enable-memchecker?
> reading:
> It sounds like there is and there isn't, what should I expect if we
> build all of our mpi libraries with those options, when we run normally:
> mpirun ./myexe
> vs using a library that was not built with those options?
This may be too verbose an answer ;-)

Now, --enable-debug adds quite a bit of overhead due to various internal
runtime checks introduced into the code path (e.g., for every OPAL object:
checking a magic ID to verify it really is a proper object, checking the
reference counter, and keeping the source file and line number where the
object was constructed). How "bad" --enable-debug really is depends on your
communication pattern and setup; e.g., shared-memory communication latency
suffers most.
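As a sketch of the two builds (the install prefixes and the Valgrind path are
assumptions here; adjust them to your site):

```shell
# Build 1: plain production library (no debug, no memchecker)
./configure --prefix=/opt/openmpi-plain
make -j8 all install

# Build 2: debug + memchecker instrumentation; --with-valgrind points
# at an installation that provides valgrind/memcheck.h
./configure --prefix=/opt/openmpi-memchecker \
    --enable-debug --enable-memchecker \
    --with-valgrind=/usr
make -j8 all install
```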

To make use of memchecker and get the best out of Valgrind, you don't actually
need --enable-debug, depending on your setup:
 - For debugging user applications (checking whether buffers passed to MPI are
initialized, whether data returned by MPI may be accessed yet, etc.),
the user application should of course be compiled with debugging enabled ("-g").

 - To get Valgrind output for OMPI-internal data structures, including the
source location of undefined memory, you'd want to compile OMPI with --enable-debug
(or at least with -g and without optimization) and furthermore define
OMPI_WANT_MEMCHECKER_MPI_OBJECTS in ompi/include/ompi/memchecker to check the
initialization of OMPI's MPI_Comm, datatypes, and others. This, however, is
mostly for OMPI developers.
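For the user-application case above, a minimal sketch (the application name
and process count are placeholders; the suppression-file path assumes a
standard Open MPI install, which ships openmpi-valgrind.supp):

```shell
# Compile the user application with debug info so Valgrind can report
# source locations; no OMPI rebuild is needed for this use case
mpicc -g -O0 -o myexe myexe.c

# Run each MPI process under Valgrind's memcheck tool, suppressing
# known-benign warnings from Open MPI itself
mpirun -np 2 valgrind \
    --suppressions=/opt/openmpi-memchecker/share/openmpi/openmpi-valgrind.supp \
    --num-callers=20 ./myexe
```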

Regarding overhead:
- The latency of an application running with a libmpi compiled with memchecker
increases only slightly when _not_ running under Valgrind (3-6% over IB-DDR
using IMB), while bandwidth is hardly affected.
- When doing the OMPI-internal MPI-object checking, it _does_ become very
costly due to the many client requests issued through Valgrind's API (but as
noted, this is for OMPI developers anyway).
Please see for more information.

With the NPB benchmarks, we did not find any measurable performance impact
from the added instrumentation when not running under Valgrind.

Now, when running the application under Valgrind, the expected slow-down of
Valgrind's memcheck tool comes into effect...

So, the most flexible way is to provide two versions and let users decide via
modulefiles, with a verbose proc ModulesHelp...
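Such a modulefile could look roughly like this (a config sketch only; the
prefix, module description, and wording are assumptions for illustration):

```tcl
#%Module1.0
proc ModulesHelp { } {
    puts stderr "Open MPI built with --enable-debug --enable-memchecker."
    puts stderr "Intended for debugging under Valgrind; expect a few percent"
    puts stderr "extra latency even outside Valgrind. For production runs,"
    puts stderr "load the plain openmpi module instead."
}
module-whatis "Open MPI (memchecker build) - for Valgrind debugging"
set prefix /opt/openmpi-memchecker
prepend-path PATH            $prefix/bin
prepend-path LD_LIBRARY_PATH $prefix/lib
```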

With best regards,

Rainer Keller, PhD                  Tel: +1 (865) 241-6293
Oak Ridge National Lab          Fax: +1 (865) 241-4811
PO Box 2008 MS 6164           Email: keller_at_[hidden]
Oak Ridge, TN 37831-2008    AIM/Skype: rusraink