Dear E. Loh,



Another is whether you can overlap communications and computation.  This does not require persistent channels, but only nonblocking communications (MPI_Isend/MPI_Irecv).  Again, there are no MPI guarantees here, so you may have to break your computation up and insert MPI_Test calls.

You may want to get the basic functionality working first and then run performance experiments to decide whether these really are areas that warrant such optimizations.

         CALL MPI_STARTALL( )
         -------perform work that could be done with local data ---------------- (A)
         CALL MPI_WAITALL( )
         -------perform work using the received data  --------------- (B)


In the above I have broken up the computation. In (A) I perform the work that can be done with local data. When the received data is required for the remaining computations, I call WAITALL to ensure that the data from the neighbouring processes has been received. I am fine with MPI_IRECV and ISEND, i.e.,

         CALL MPI_IRECV()
         CALL MPI_ISEND()
         -------perform work that could be done with local data ---------------- (A)
         CALL MPI_WAITALL( )
         -------perform work using the received data  --------------- (B)



But I am doubtful whether I am actually getting computation-communication overlap that saves time, or whether I am getting the same performance as could be obtained by,

         CALL MPI_IRECV()
         CALL MPI_ISEND()
         CALL MPI_WAITALL( )
         -------perform work that could be done with local data ---------------- (A)
         -------perform work using the received data  --------------- (B)


In this case (equivalent to blocking communication), I observed that it takes only around 5% more time.
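
For concreteness, here is a minimal sketch of how the overlapped variant could be written with MPI_TESTALL calls between chunks of work (A), as suggested in the quoted advice, and timed with MPI_WTIME. It is only an illustration under assumptions: the ring exchange with left/right neighbours, the buffer size n, the chunk count nchunks, and the dummy loop standing in for (A) are all hypothetical placeholders, not the actual application code.

```fortran
program overlap_sketch
  use mpi
  implicit none
  ! Hypothetical sizes, chosen only for illustration
  integer, parameter :: n = 100000, nchunks = 10
  integer :: ierr, rank, nprocs, left, right, i, k, lo, hi
  integer :: reqs(2)
  logical :: done
  double precision :: sendbuf(n), recvbuf(n), local(n)
  double precision :: t0, t1, s

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  ! Assumed topology: a periodic ring of processes
  left  = mod(rank - 1 + nprocs, nprocs)
  right = mod(rank + 1, nprocs)
  sendbuf = dble(rank)
  local   = 1.0d0

  t0 = MPI_WTIME()
  call MPI_IRECV(recvbuf, n, MPI_DOUBLE_PRECISION, left,  0, &
                 MPI_COMM_WORLD, reqs(1), ierr)
  call MPI_ISEND(sendbuf, n, MPI_DOUBLE_PRECISION, right, 0, &
                 MPI_COMM_WORLD, reqs(2), ierr)

  ! (A) local work, split into chunks. The MPI_TESTALL call after each
  ! chunk gives the MPI library an opportunity to progress the transfer.
  do k = 1, nchunks
     lo = (k - 1) * (n / nchunks) + 1
     hi = k * (n / nchunks)
     do i = lo, hi
        local(i) = sqrt(local(i) + dble(i))
     end do
     call MPI_TESTALL(2, reqs, done, MPI_STATUSES_IGNORE, ierr)
  end do

  ! Make sure the neighbour data has arrived before using it
  call MPI_WAITALL(2, reqs, MPI_STATUSES_IGNORE, ierr)

  ! (B) work that needs the received data
  s = sum(recvbuf) + sum(local)
  t1 = MPI_WTIME()

  if (rank == 0) print *, 'elapsed (s):', t1 - t0, ' checksum:', s
  call MPI_FINALIZE(ierr)
end program overlap_sketch
```

Moving the MPI_WAITALL call to directly after MPI_ISEND and removing the MPI_TESTALL calls gives the non-overlapped variant for comparison. Whether the two timings differ depends on the MPI implementation and the interconnect, so the measurement has to be repeated on the target machine.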

My SECOND wish is to use persistent communication for even better speedup.
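
A minimal sketch of the persistent-request pattern follows, assuming the same communication (same buffers, neighbours, and message size) is repeated every time step. The ring neighbours, the buffer size n, and the step count nsteps are hypothetical placeholders. The requests are created once with MPI_RECV_INIT and MPI_SEND_INIT, restarted each step with MPI_STARTALL, and freed at the end.

```fortran
program persistent_sketch
  use mpi
  implicit none
  ! Hypothetical sizes, chosen only for illustration
  integer, parameter :: n = 1000, nsteps = 5
  integer :: ierr, rank, nprocs, left, right, step
  integer :: reqs(2)
  double precision :: sendbuf(n), recvbuf(n), s

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  ! Assumed topology: a periodic ring of processes
  left  = mod(rank - 1 + nprocs, nprocs)
  right = mod(rank + 1, nprocs)

  ! Set up the persistent requests once, outside the time loop
  call MPI_RECV_INIT(recvbuf, n, MPI_DOUBLE_PRECISION, left,  0, &
                     MPI_COMM_WORLD, reqs(1), ierr)
  call MPI_SEND_INIT(sendbuf, n, MPI_DOUBLE_PRECISION, right, 0, &
                     MPI_COMM_WORLD, reqs(2), ierr)

  s = 0.0d0
  do step = 1, nsteps
     ! Fill the send buffer while the requests are inactive
     sendbuf = dble(rank + step)

     call MPI_STARTALL(2, reqs, ierr)
     ! (A) work that needs only local data would go here
     call MPI_WAITALL(2, reqs, MPI_STATUSES_IGNORE, ierr)
     ! (B) work that needs the received data
     s = s + sum(recvbuf)
  end do

  ! Persistent requests remain allocated after MPI_WAITALL, so free them
  call MPI_REQUEST_FREE(reqs(1), ierr)
  call MPI_REQUEST_FREE(reqs(2), ierr)

  if (rank == 0) print *, 'checksum:', s
  call MPI_FINALIZE(ierr)
end program persistent_sketch
```

Persistent requests mainly save the per-call setup cost of MPI_IRECV/MPI_ISEND. As with overlap, the MPI standard does not guarantee a speedup, so any gain has to be measured.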


best regards,
AA.