
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Heap profiling with OpenMPI
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-08-07 08:10:22

On Aug 7, 2008, at 3:20 AM, Jan Ploski wrote:


> Thanks for this explanation. According to what you wrote,
> --without-memory-manager can make my and other applications run
> significantly slower.

*If* they continually re-use the same buffers for large messages and
you're using an OpenFabrics network (or older Myrinet/GM), then yes,
your peak bandwidth will be a bit lower. We use a pipelined protocol
to offset the total impact, so it's not a total disaster -- but there
is definitely some loss of the total bandwidth you'll see for large
messages. Small-message performance is not impacted because those
messages use pre-registered buffers.

> While I can find out just how much for my app, I
> hardly can do it for other (unknown) users. So it would be nice if
> my heap
> profiling problem could be resolved in another way in the future. Is
> the
> planned mpi_leave_pinned change in v1.3 going to correct it?

Here's what we're doing for v1.3 (it's an open ticket to document this
before v1.3 is released):

1. leave_pinned will remain off by default on networks that don't care
about it (tcp, sm, ...etc.)

2. for openfabrics networks (i.e., the openib btl), leave_pinned will
automatically enable itself if:
     a) one of the following two is true
        - you added -lopenmpi-malloc to the link step when creating
your MPI app
        - or, if libopenmpi-malloc is not present, a mallopt() hint
[that we try by default] telling the allocator to never return memory
to the OS was successfully applied
     b) and the user did not manually specify leave_pinned=0

libopenmpi-malloc is the same ptmalloc2 library that we used in the
v1.2 series. We can detect if it is there at run-time; if it is not,
we try the mallopt hint.

So by default, you shouldn't need to specify leave_pinned=1 anymore
for the openib btl -- you should get the same end result as if you had
specified it in the v1.2 series (better bandwidth for large
messages). But if you want exactly the same behavior as v1.2, you'll
need to add -lopenmpi-malloc to your link line. We anticipate that
most users won't notice the difference.
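Concretely, the two ways to control this look like the following command lines (the library name is from the mail and mpi_leave_pinned is a standard Open MPI MCA parameter; the compiler wrapper invocation and application name are just illustrative):

```shell
# Link against the separated ptmalloc2 library to get v1.2-style
# behavior (leave_pinned turns itself on for the openib btl):
mpicc my_app.c -o my_app -lopenmpi-malloc

# Or force leave_pinned on (or off with 0) explicitly at run time:
mpirun --mca mpi_leave_pinned 1 -np 4 ./my_app
```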

The reason we separated out the ptmalloc2 library is that it
definitely causes problems with some applications (e.g., valgrind
profiling), and it penalizes networks that don't need a memory
allocator (e.g., tcp, sm, ...etc.).

Does that make sense?

If you'd like to give it a whirl, the trunk nightly tarballs are
fairly stable at this point -- except for the sm btl. There's a bug
in how the sm btl interacts with our message passing engine (ob1) that
George swears he will have fixed by tomorrow. :-) But if you avoid
using the sm btl, you should be fine.

Trunk and v1.3 snapshot tarballs are pretty close at this point; we're
mainly applying bug fixes to the trunk and moving them over to the
v1.3 branch after they've "soaked" on the trunk for a few days. Soon
enough,
however, the trunk will likely start diverging from the v1.3 branch as
we move on towards v1.4.

Jeff Squyres
Cisco Systems