Process 0 is the manager (it only gathers the results);
processes 1 and 2 are the workers (they do the computation).
In the first case, processes 1 and 2 are on different nodes (the run
takes 162 s).
@--- MPI Time (seconds)
---------------------------------------------------
Task    AppTime    MPITime     MPI%
   0        162        162    99.99
   1        162       30.2    18.66
   2        162       14.7     9.04
   *        486        207    42.56
In the second case, processes 1 and 2 are on the same node (the run
takes 260 s).
@--- MPI Time (seconds)
---------------------------------------------------
Task    AppTime    MPITime     MPI%
   0        260        260    99.99
   1        260       39.7    15.29
   2        260       26.4    10.17
   *        779        326    41.82
I think the slowdown comes from contention on the memory bus when
both workers run on the same node.
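
One rough way to check this from the numbers above, as a sketch only: subtract MPITime from AppTime for each worker to get the time spent outside MPI, which I am assuming is mostly computation. The helper names below are made up for illustration; the values are copied from the two reports.

```python
# AppTime and MPITime in seconds for the two workers, copied from the
# mpiP reports above. Keys are MPI task (rank) numbers.
DIFFERENT_NODES = {1: (162.0, 30.2), 2: (162.0, 14.7)}
SAME_NODE = {1: (260.0, 39.7), 2: (260.0, 26.4)}


def non_mpi_time(app_time, mpi_time):
    """Time spent outside MPI calls (assumed to be mostly computation)."""
    return app_time - mpi_time


def slowdown(task):
    """Ratio of non-MPI time: same-node run over different-node run."""
    return non_mpi_time(*SAME_NODE[task]) / non_mpi_time(*DIFFERENT_NODES[task])


if __name__ == "__main__":
    for task in (1, 2):
        print(f"task {task}: "
              f"{non_mpi_time(*DIFFERENT_NODES[task]):.1f} s on different nodes, "
              f"{non_mpi_time(*SAME_NODE[task]):.1f} s on the same node, "
              f"slowdown x{slowdown(task):.2f}")
```

With these numbers the non-MPI time of each worker grows by roughly 1.6 to 1.7 times when both share a node. That is consistent with the workers competing for a shared resource, though by itself it does not show that the memory bus is the one.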