Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] curious behavior during wait for broadcast: 100% cpu
From: Eugene Loh (eugene.loh_at_[hidden])
Date: 2010-12-14 11:50:07


David Mathog wrote:

> Is there a tool in openmpi that will reveal how much "spin time" the
> processes are using?

I don't know what sort of answer is helpful for you, but I'll describe
one option.

With Oracle Message Passing Toolkit (formerly Sun ClusterTools; in any
case, an OMPI distribution from Oracle/Sun) and Oracle Solaris Studio
Performance Analyzer (formerly Sun Studio Performance Analyzer), you can
see how much time is spent in MPI work, MPI wait, and so on.
Specifically, by process, you could see (I'm making up an example) that
process 2 spent:
* 35% of its time in application-level computation
* 5% of its time in MPI moving data
* 60% of its time in MPI waiting
but process 7 spent:
* 90% of its time in application-level computation
* 5% of its time in MPI moving data
* only 5% of its time in MPI waiting
That is, beyond the usual profiling support you might find in other
tools, with Performance Analyzer you can distinguish time spent in MPI
moving data from time spent in MPI waiting.

On the other hand, you perhaps don't need that much detail. For your
purposes, it may suffice just to know how much time each process is
spending in MPI. There are various profiling tools that will give you
that. See http://www.open-mpi.org/faq/?category=perftools for a list. Load
balancing is a common problem people investigate with such tools.

Finally, if you want to stick to tools like top, maybe another
alternative is to get your application to go into sleep waits. I can't
say this is the best choice, but it could be fun/interesting. Let's say
your application only calls a handful of different MPI functions. Write
PMPI wrappers for them that replace the blocking calls
(MPI_Send/MPI_Recv) with their non-blocking counterparts, polling for
completion with short sleep calls in between. Not pretty, but it might
just be doable for your case. I don't know. Anyhow, that should let a
waiting process sleep instead of spin, which might make MPI wait time
detectable with tools like top.