I wanted to profile my application using gprof, and proceeded like
when profiling a normal application:
- compile everything with option -pg
- run application
- call gprof
This returns a normal-looking output, but i don't know
whether this is the data for node 0 only or accumulated for all nodes.
Does anybody have experience in profiling parallel applications?
Is there a way to have profile data for each node separately?
If not, is there another profiling tool which can?