Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Debugging memory use of Open MPI
From: Eugene Loh (Eugene.Loh_at_[hidden])
Date: 2009-04-14 19:10:19


Shaun Jackman wrote:

> Wow. Thanks, Eugene. I definitely have to look into the Sun HPC
> ClusterTools. It looks as though it could be very informative.

Great. And, I didn't mean to slight TotalView. I'm just not familiar
with it.

> What's the purpose of the 400 MB that MPI_Init has allocated?

It's for... um, I don't know. Let's see...

About a third of it appears to come from
vt_open() -> VTThrd_open() -> VTGen_open(),
which I'm guessing is due to the VampirTrace instrumentation (perhaps
allocating the buffers into which the MPI message-tracing data is
collected). It seems to go away if one doesn't collect
message-tracing data.

Somehow, I can't see further into the library. Hmm. It does seem like
a bunch. The shared-memory area (which MPI_Init allocates for on-node
message passing) is much smaller. The remaining roughly 130
Mbyte/process seems to be independent of the number of processes.

An interesting exercise for the reader.
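
If you want a rough cross-check outside the collector, something like
the little program below (my own sketch, not part of ClusterTools)
reports how much the per-process peak RSS grows across MPI_Init. It
won't tell you *what* did the allocating, but it does confirm the
overall size. Note that getrusage()'s ru_maxrss units are
OS-dependent (kilobytes on Linux), so treat the number as rough.

    #include <stdio.h>
    #include <mpi.h>
    #include <sys/resource.h>

    /* Peak resident set size so far; units are OS-dependent
       (kilobytes on Linux). */
    static long peak_rss(void)
    {
        struct rusage ru;
        getrusage(RUSAGE_SELF, &ru);
        return ru.ru_maxrss;
    }

    int main(int argc, char **argv)
    {
        long before = peak_rss();
        MPI_Init(&argc, &argv);
        long after  = peak_rss();

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("rank %d: peak RSS grew by %ld across MPI_Init\n",
               rank, after - before);

        MPI_Finalize();
        return 0;
    }

Build it with mpicc and launch one copy per process of interest.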

> The figure of in-flight messages vs time when the receiver sleeps is
> particularly interesting. The sender appears to stop sending and block
> once there are 30'000 in-flight messages. Has Open MPI detected the
> situation of congestion and begun waiting for the receiver to catch
> up? Or is it something simpler, such as the underlying write(2) call
> to the TCP socket blocking? If it's the first case, perhaps I could
> tune this threshold to behave better for my application.

This particular case is for two on-node processes, so no TCP is
involved. There appear to be about 55K allocations, which looks like
the 85K peak minus the 30K at which the sender stalls. So maybe some
resource was exhausted at that point. Dunno.
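
By the way, if you want to reproduce the stall outside your
application, a toy along these lines (just a sketch of the pattern
you described, not anything from the tools) usually shows it: rank
0's progress reports pause once the transport's buffering is used up
and resume when the sleeping receiver starts draining. Where it
pauses depends on the transport and its settings, so the 30K figure
is specific to your setup.

    #include <stdio.h>
    #include <unistd.h>
    #include <mpi.h>

    #define NMSG 100000

    int main(int argc, char **argv)
    {
        int rank, i, payload = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            /* Fire off many small blocking sends and timestamp
               progress; the timestamps show where the sender stalls. */
            double t0 = MPI_Wtime();
            for (i = 0; i < NMSG; i++) {
                MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
                if (i % 10000 == 0) {
                    printf("sent %6d messages at t=%.2f s\n",
                           i, MPI_Wtime() - t0);
                    fflush(stdout);
                }
            }
        } else if (rank == 1) {
            sleep(10);   /* receiver lags behind */
            for (i = 0; i < NMSG; i++)
                MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }

Run it with "mpirun -np 2" on a single node so the shared-memory
path is the one being exercised.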

Anyhow, this may be starting to get into more detail than you (or I)
need to understand to address your problem. It *is* interesting stuff,
though.