Thanks a lot for posting my question. I am using the VampirTrace (VT) library for performance diagnosis and tuning. An excerpt of the VT profiling output is shown in the table below; the dominant entry, in the first row, is named 'sync'. I do not call any function named 'sync' in my MPI code, and I am not aware of one. Can someone tell me how 'sync' enters the context of an MPI code, and what can be done to mitigate the related bottleneck? Thanks so much for your help.
*excl. time  incl. time       calls  excl. time/call  incl. time/call  name
    23.710s     23.710s           2          11.855s          11.855s  sync
     3.403s      3.403s           2           1.701s           1.701s  MPI_Barrier
     3.240s      3.269s           1           3.240s           3.269s  MPI_Init
     1.966s      1.966s          42         46.812ms         46.812ms  MPI_Waitall
     0.680s      0.680s     10.3125         65.932ms         65.932ms  MPI_Wait
     0.639s      0.639s           2           0.320s           0.320s  MPI_Allreduce
     0.403s      0.403s  806017.625          0.499us          0.499us  _ZN14CellTetLaminar6ddendzEi
     0.359s      0.359s  1134916.62          0.315us          0.315us  _2__STRING.244