Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Debugging memory use of Open MPI
From: Eugene Loh (Eugene.Loh_at_[hidden])
Date: 2009-04-14 16:54:11


Shaun Jackman wrote:

> Eugene Loh wrote:
>
>>>> On the other hand, I assume the memory imbalance we're talking
>>>> about is rather severe. Much more than 2500 bytes to be
>>>> noticeable, I would think. Is that really the situation you're
>>>> imagining?
>>>
>>> The memory imbalance is drastic. I'm expecting 2 GB of memory use
>>> per process. The behaving processes (13/16) use the expected amount
>>> of memory; the remaining (3/16) misbehaving processes use more than
>>> twice as much memory. The specifics vary from run to run of course.
>>> So, yes, there are gigs of unexpected memory use to track down.
>>
>> Umm, how big of a message imbalance do you think you might have?
>> (The inflection in my voice doesn't come out well in e-mail.)
>> Anyhow, that sounds like, um, "lots" of 2500-byte messages.
>
> The message imbalance could be very large. Each process is running
> pretty close to its memory capacity. If a backlog of messages causes a
> buffer to grow to the point where the process starts swapping, it will
> very quickly fall very far behind. There are some billion 25-byte
> operations being sent in total, or tens of millions of MPI_Send
> messages (at 100 operations per MPI_Send).

Okay. Attached is a "little" note I wrote up illustrating memory
profiling with Sun tools. (It's "big" because I ended up including a
few screenshots.) The program has a bunch of one-way message traffic
and some user-code memory allocation. I then rerun with the receiver
sleeping before jumping into action. The messages back up and OMPI ends
up allocating a bunch of memory. The tools show you who (user or OMPI)
is allocating how much memory, how big a message backlog develops,
and how the sender starts stalling out (which is a good thing!).
Anyhow, a useful exercise for me and hopefully helpful for you.
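
As a rough check on the figures quoted above, here is a small Python sketch of the arithmetic. It is only a back-of-envelope estimate: the operation count is the approximate "some billion" from the thread, the 2 GB excess is an assumed lower bound taken from "more than twice as much memory", and the function names are made up for illustration. It counts payload bytes only and ignores whatever per-message overhead the MPI library adds.

```python
# Back-of-envelope estimate from the figures quoted in this thread.
# All constants are approximate; the helper names are illustrative only.

TOTAL_OPERATIONS = 1_000_000_000  # "some billion" operations (approximate)
BYTES_PER_OPERATION = 25          # size of one operation, from the thread
OPERATIONS_PER_SEND = 100         # operations packed into one MPI_Send
EXCESS_BYTES = 2 * 10**9          # ASSUMPTION: ~2 GB of unexpected memory


def message_size_bytes(ops_per_send, bytes_per_op):
    """Payload size of one MPI_Send."""
    return ops_per_send * bytes_per_op


def total_sends(total_ops, ops_per_send):
    """Number of MPI_Send calls needed to ship every operation."""
    return total_ops // ops_per_send


def backlog_messages(excess_bytes, msg_bytes):
    """Messages that would have to queue up to account for the excess
    memory, counting payload only (no per-message overhead)."""
    return excess_bytes // msg_bytes


if __name__ == "__main__":
    msg = message_size_bytes(OPERATIONS_PER_SEND, BYTES_PER_OPERATION)
    sends = total_sends(TOTAL_OPERATIONS, OPERATIONS_PER_SEND)
    backlog = backlog_messages(EXCESS_BYTES, msg)
    print(f"bytes per message:      {msg}")
    print(f"total MPI_Send calls:   {sends}")
    print(f"messages in 2 GB:       {backlog}")
    print(f"fraction of all sends:  {backlog / sends:.0%}")
```

Under these assumptions, a 2 GB excess on one process corresponds to a backlog on the order of hundreds of thousands of 2500-byte messages, which is a small fraction of the total traffic.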