Dear E. Loh,
Another is whether you can overlap communication and computation. This does not require persistent channels, only nonblocking communication (MPI_Isend/MPI_Irecv). Again, MPI makes no guarantees here: the standard does not promise that a transfer progresses in the background while you compute, so you may have to break your computation into pieces and insert MPI_Test calls between them.
You may want to get the basic functionality working first and then run performance experiments to decide whether these really are areas that warrant such optimizations.