Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Debugging memory use of Open MPI
From: Shaun Jackman (sjackman_at_[hidden])
Date: 2009-04-14 14:02:21

Hi Eugene,

Eugene Loh wrote:
> At 2500 bytes, all messages will presumably be sent "eagerly" -- without
> waiting for the receiver to indicate that it's ready to receive that
> particular message. This would suggest congestion, if any, is on the
> receiver side. Some kind of congestion could, I suppose, still occur
> and back up on the sender side.

Can anyone chime in as to what the message size limit is for an
`eager' transmission?

> On the other hand, I assume the memory imbalance we're talking about is
> rather severe. Much more than 2500 bytes to be noticeable, I would
> think. Is that really the situation you're imagining?

The memory imbalance is drastic. I'm expecting 2 GB of memory use per
process. The behaving processes (13/16) use the expected amount of
memory; the remaining misbehaving processes (3/16) use more than twice
as much. The specifics vary from run to run, of course. So, yes,
there are gigabytes of unexpected memory use to track down.

> There are tracing tools to look at this sort of thing. The only one I
> have much familiarity with is Sun Studio / Sun HPC ClusterTools. Free
> download, available on Solaris or Linux, SPARC or x64, plays with OMPI.
> You can see a timeline with message lines on it to give you an idea if
> messages are being received/completed long after they were sent.
> Another interesting view is constructing a plot vs time of how many
> messages are in-flight at any moment (including as a function of
> receiver). Lots of similar tools out there... VampirTrace (tracing side
> only, need to analyze the data), Jumpshot, etc. Again, though, there's
> a question in my mind if you're really backing up 1000s or more of
> messages. (I'm assuming the memory imbalances are at least Mbytes.)

I'll check out Sun HPC ClusterTools. Thanks for the tip.

Assuming the problem is congestion and that messages are backing up,
is there an accepted method of dealing with this situation? It seems
to me the general approach would be

if (number of outstanding messages > high water mark)
     wait until (number of outstanding messages < low water mark)

where I suppose the `number of outstanding messages' is defined as the
number of messages that have been sent and not yet received by the
other side. Is there a way to get this number from MPI without having
to code it at the application level?
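
For concreteness, here is a minimal sketch in C of the high/low water
mark bookkeeping described above, done at the application level. All
names (throttle, throttle_sent, drain_fn, drain_one) are hypothetical
and not part of MPI. The MPI side is hidden behind a callback: in a
real program it could, as an assumption, wrap MPI_Waitsome over
requests returned by MPI_Issend, since a synchronous-mode send only
completes once the matching receive has started. The drain_one
function below is only a stand-in that pretends one message is
received per call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flow-control state; not an MPI data structure. */
typedef struct {
    size_t outstanding;  /* sends posted but not yet known to be received */
    size_t high_water;   /* start draining once outstanding exceeds this */
    size_t low_water;    /* stop draining once outstanding drops below this */
} throttle;

/* Completes some outstanding sends and returns how many finished.
   Assumption: a real version would block, e.g. in MPI_Waitsome. */
typedef size_t (*drain_fn)(void *ctx);

static void throttle_init(throttle *t, size_t high_water, size_t low_water)
{
    /* low_water must be positive, or "outstanding < low_water" can never hold */
    assert(low_water > 0 && low_water < high_water);
    t->outstanding = 0;
    t->high_water = high_water;
    t->low_water = low_water;
}

/* Call once after posting each send. */
static void throttle_sent(throttle *t, drain_fn drain, void *ctx)
{
    t->outstanding++;
    if (t->outstanding > t->high_water) {
        /* wait until (number of outstanding messages < low water mark) */
        while (t->outstanding >= t->low_water) {
            size_t done = drain(ctx);
            assert(done <= t->outstanding);
            t->outstanding -= done;
        }
    }
}

/* Stand-in for the receiver: completes one message per call and
   counts how often it was asked to drain. */
static size_t drain_one(void *ctx)
{
    size_t *calls = ctx;
    (*calls)++;
    return 1;
}
```

The sender would call throttle_sent() after every send, passing its
own drain callback. Nothing happens until the count passes the high
water mark; at that point the sender stalls until the count has fallen
below the low water mark.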