I am parallelizing a 2D CFD code in which the grid (all triangles) is
partitioned among 8 processes. Each process has a different number of
neighboring processes. Suppose a process has n elements/faces whose
data it needs to send to neighboring processes, and m elements/faces
on which it needs to receive data from the corresponding neighboring
processes. The values of n and m differ for each process. Another aim
is to hide the communication behind computation. For this I do the
following for each process:
But this is not working in the current code, and the previous code was
not giving correct results with a large number of processes.
I don't know how literally to read the code you sent. Maybe your
actual code "does the right thing", but just to confirm I think the
correct code should look like this:
DO J = 1, N
   CALL MPI_ISEND(...)
END DO
DO K = 1, M
   CALL MPI_RECV(...)
END DO
CALL MPI_WAITALL(...)
That is, you start all non-blocking sends. Then you perform receives.
Then you complete the sends. More commonly, one would post all
receives first using non-blocking calls (MPI_IRECV), then perform all
sends (MPI_SEND), then complete the receives with MPI_WAITALL.
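As a sketch of that receive-first pattern (the buffer, count, and
neighbor arrays here are illustrative placeholders, not names from
your actual code):

DO K = 1, M
   ! Post all receives first so matching sends can complete promptly.
   CALL MPI_IRECV(recvbuf(1,K), count, MPI_DOUBLE_PRECISION, &
                  nbr_recv(K), tag, MPI_COMM_WORLD, recv_req(K), ierr)
END DO
DO J = 1, N
   ! Blocking sends; each matches an already-posted receive.
   CALL MPI_SEND(sendbuf(1,J), count, MPI_DOUBLE_PRECISION, &
                 nbr_send(J), tag, MPI_COMM_WORLD, ierr)
END DO
! Complete all receives before using the received data.
CALL MPI_WAITALL(M, recv_req, MPI_STATUSES_IGNORE, ierr)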
Yet another option is to post the non-blocking receives, then the
non-blocking sends, and then complete everything with a single
MPI_WAITALL call on all M+N requests.
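A sketch of that combined variant (again with placeholder names); note
this is also the variant that lets you hide communication behind
computation, since independent interior work can go between posting the
requests and the MPI_WAITALL:

DO K = 1, M
   CALL MPI_IRECV(recvbuf(1,K), count, MPI_DOUBLE_PRECISION, &
                  nbr_recv(K), tag, MPI_COMM_WORLD, req(K), ierr)
END DO
DO J = 1, N
   CALL MPI_ISEND(sendbuf(1,J), count, MPI_DOUBLE_PRECISION, &
                  nbr_send(J), tag, MPI_COMM_WORLD, req(M+J), ierr)
END DO
! ... independent computation on interior elements can overlap here ...
CALL MPI_WAITALL(M+N, req, MPI_STATUSES_IGNORE, ierr)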
Sorry if you already knew all this and I'm just overreacting to the
simplified code you sent out.