>I wanted to profile my application using gprof, and proceeded like
>when profiling a normal application:
>- compile everything with option -pg
>- run application
>- call gprof
>This returns a normal-looking output, but I don't know
>whether this is the data for node 0 only or accumulated for all nodes.
>Does anybody have experience in profiling parallel applications?
>Is there a way to have profile data for each node separately?
>If not, is there another profiling tool which can?
Gosh, I'm trying not to sound like a repeating commercial, but this is a
rather direct answer to your question.
If you use Sun Studio compilers and tools, there is a Performance
Analyzer. The basic mode of operation is that it samples the callstack
periodically. So, you don't get the huge data volumes that tracing
tools generate, but you do get statistically fair data that shows where
time is spent. If you preface your "mpirun" command with "collect",
then you get data for all the MPI processes in your job. You can look
at data aggregated over all processes or for some subset. You can get
gprof-style information about where time is spent. You can also trace
MPI calls, the memory heap, hardware events (like cache misses), etc.
The tool is available from http://developers.sun.com/sunstudio/ as a free
download for Linux and Solaris on x86 and SPARC. You don't need to
compile your program specially (I mean, no -pg). Fine print applies to
every statement I'm making in this paragraph, but I'm trying to keep it short.
Again, sorry if it sounds like a commercial, but it's intended to be a
direct answer to your question.
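
As a rough sketch of the "preface mpirun with collect" step described above:
the binary name ./my_mpi_app and the process count of 4 are placeholders, not
taken from the original post, and the exact collect options differ between
Sun Studio releases, so treat this as an illustration and check the collect
man page for your version.

```shell
# Placeholders (assumptions for illustration only): binary name and process count.
APP=./my_mpi_app
NP=4

# Per the description above: put "collect" in front of the usual mpirun line.
# Some Sun Studio releases may need extra collect options for MPI jobs.
PROFILE_CMD="collect mpirun -np $NP $APP"

if command -v collect >/dev/null 2>&1; then
    # collect is installed: run the profiled MPI job.
    $PROFILE_CMD
else
    # collect is not on PATH: just show the command that would be run.
    echo "collect not on PATH; would run: $PROFILE_CMD"
fi
```

The recorded experiment can then be opened in the Performance Analyzer GUI
(analyzer) or inspected from the command line with er_print, either
aggregated over all processes or for a subset.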
P.S. If you go to
"halfway down" is a set of presentations on "How to Perform Analysis".
This can give you more information on Performance Analyzer. I don't
know how much, if any, is specific to MPI, but it should be helpful.