You would break each MPI_Irecv and MPI_Isend call into two parts: MPI_Recv_init or MPI_Send_init in the first part, and MPI_Start (or MPI_Startall) in the second. The first part needs to be moved out of the subroutine: at least outside the loop in sub1(), and maybe even outside the 10000-iteration loop in the main program. (The corresponding MPI_Request_free calls would similarly have to be moved out; the waits stay where they are.)

If the per-message setup overheads are small compared to the other work you're doing per message, the savings will be small, and I'm guessing that is the case for you. The code refactoring might not be simple, either. So persistent communications *might* not be a fruitful optimization strategy for you. Just a warning.