
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Debugging memory use of Open MPI
From: Eugene Loh (Eugene.Loh_at_[hidden])
Date: 2009-04-16 14:43:43

Eugene Loh wrote:

> Shaun Jackman wrote:
>> What's the purpose of the 400 MB that MPI_Init has allocated?
> It's for... um, I don't know. Let's see...
> About a third of it appears to be
> vt_open() -> VTThrd_open() -> VTGen_open
> which I'm guessing is due to the VampirTrace instrumentation (maybe
> allocating the buffers into which the MPI tracing data is collected).
> It seems to go away if one doesn't collect message-tracing data.
> Somehow, I can't see further into the library. Hmm. It does seem
> like a bunch. The shared-memory area (which MPI_Init allocates for
> on-node message passing) is much smaller. The remaining roughly 130
> Mbyte/process seems to be independent of the number of processes.
> An interesting exercise for the reader.

Arrgh. What a pathetic response! Lemme see if I can do better than that.

As I said, about a "third" (whatever that means) is for vt_open(), and
I'm pretty sure that's for the VampirTrace message tracing. If we don't
collect message traces, that memory isn't allocated.

What's the rest? I said the shared-memory area was much smaller, but I
was confused about which OMPI release I was using. In fact, the
shared-memory area was 128 Mbyte, and since it was mapped in once by
each process, it was counted twice.

Plus, even a "hello world" program seems to use a surprisingly large
amount of memory (10-20 Mbytes?).

So, putting it all together:
- about 10-20 Mbytes just to start the simplest program up
- other miscellaneous MPI stuff
- 128 Mbyte for the shared-memory area, counted twice
- about 150 Mbyte for VT buffers

Now, another question you might have is why the shared-memory area is so
big. The idea is that processes communicate via shared memory by having
one process write to the shared area and the other read from it. It can
be advantageous to provide ample room (e.g., to minimize synchronization
among processes... otherwise, processes end up having to wait for
congested resources to clear or to do extra work to avoid the
congestion). "Ample" room means ample for lots of data and/or for lots
of (short) messages. How much is enough? No idea. YMMV. The more the
better. Etc. Someone picked some numbers and that's what you live with
by default. So, why so big? Answer: just because we picked it to be
that way.