I am studying optimization strategies related to the number of communication
functions in a code.
My MPI course gives two pieces of optimization advice that contradict each other:
1*) Use a temporary copy of the message to allow non-blocking sends and to
decouple the send from the receive.
2*) Avoid temporary copies of the message, because the copy adds to the
execution time.
We are then advised to:
- replace MPI_SEND with MPI_SSEND (synchronous blocking send): the course
says execution time is divided by a factor of 2
- use MPI_ISSEND and MPI_IRECV with MPI_WAIT to synchronize (synchronous
non-blocking send): the course says execution time is divided by a factor
of 3
So which optimization is best? Should we use a temporary message copy or
not, and if so, in which cases?