In my code I implement mpi_send/mpi_receive for an three dimensional real
array, and process is as follows:
all other processors send the array to rank 0 and then rank 0 receives the
array and put these arrays into a complete array. Then mpi_bcast is called
to send the complete array from rank 0 to all others.
This is very basic usage of mpi_send and mpi_receive. In my fortran code I
found that if I added call mpi_barrier(...) before the mpi_send and
mpi_reive statements the wall time (60s) for this sending and receiving
will be much lower than that if mpi_barrier is not called (2s). I used
mpi_wtime to count the time.
I think mpi_send and mpi_recv are blocking subroutines and thus no
additional mpi_barrier is needed. Can anybody tell me what is the reason
for this phenomena? Thank you very much.