I am parallelizing a 2D CFD code in FORTRAN+OPENMPI. Suppose that the
grid (all triangles) is partitioned among 8 processes using METIS.
Each process has a different number of neighboring processes. Suppose
each process has n elements/faces whose data it needs to send to the
corresponding neighboring processes, and m elements/faces for which it
needs to receive data from the corresponding neighboring processes.
The values of n and m are different for each process. Another aim is
to hide the communication behind computation. For this I do the
following for each process:

This solves my problem, but it leaks memory: RAM fills up after a few
thousand iterations. What is the remedy? How should I tackle this?

In another CFD code I got rid of this memory growth by doing the
following (in that code n = m):

But this does not work in the current code, and the previous code did
not give correct results with a large number of processes.

I don't know how literally to read the code you sent. Maybe your
actual code "does the right thing", but just to confirm, I think the
correct code should look like this:

    DO J = 1, N
       CALL MPI_ISEND(...)
    END DO
    DO K = 1, M
       CALL MPI_RECV(...)
    END DO
    CALL MPI_WAITALL(...)

That is, you start all non-blocking sends. Then you perform receives.
Then you complete the sends. More commonly, one would post all
receives first using non-blocking calls (MPI_IRECV), then perform all
sends (MPI_SEND), then complete the receives with MPI_WAITALL.
Yet another option is to post non-blocking receives, then non-blocking
sends, then complete all sends and receives with a WAITALL call that
has M+N requests.
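
As a rough illustration of that last option, here is a minimal sketch
of one way it might look. It is not the original poster's code: the
ring-shaped neighbour list is only a stand-in for the METIS partition,
and the names nbuf, niter, nbr, req, sendbuf and recvbuf are
hypothetical. Here every rank sends and receives the same number of
messages, so the request array has 2*nnbr entries; with N sends and M
receives it would have M+N. Note that every request returned by
MPI_ISEND or MPI_IRECV has to be completed (MPI_WAIT, MPI_WAITALL, a
successful MPI_TEST) or freed; requests that are never completed are
one plausible cause of memory growing with the iteration count.

```fortran
program halo_exchange_sketch
  use mpi
  implicit none

  integer, parameter :: nbuf  = 4     ! hypothetical: values per message
  integer, parameter :: niter = 1000  ! hypothetical: number of iterations
  integer :: ierr, rank, nprocs, iter, i, k
  integer :: nnbr                     ! number of neighbouring processes
  integer, allocatable :: nbr(:)      ! ranks of the neighbours
  integer, allocatable :: req(:)      ! nnbr receive + nnbr send requests
  double precision, allocatable :: sendbuf(:,:), recvbuf(:,:)
  double precision :: interior, halo_sum

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  ! Assumption: a ring stands in for the METIS partition,
  ! so each rank has exactly two neighbours.
  nnbr = 2
  allocate(nbr(nnbr), req(2*nnbr))
  allocate(sendbuf(nbuf, nnbr), recvbuf(nbuf, nnbr))
  nbr(1) = mod(rank - 1 + nprocs, nprocs)
  nbr(2) = mod(rank + 1, nprocs)

  interior = 0.0d0
  do iter = 1, niter
     sendbuf = dble(rank + iter)

     ! 1. Post every receive first.
     do i = 1, nnbr
        call MPI_IRECV(recvbuf(1,i), nbuf, MPI_DOUBLE_PRECISION, nbr(i), &
                       0, MPI_COMM_WORLD, req(i), ierr)
     end do

     ! 2. Start every send.
     do i = 1, nnbr
        call MPI_ISEND(sendbuf(1,i), nbuf, MPI_DOUBLE_PRECISION, nbr(i), &
                       0, MPI_COMM_WORLD, req(nnbr+i), ierr)
     end do

     ! 3. Work that touches neither buffer overlaps with the communication.
     do k = 1, 1000
        interior = interior + 1.0d-3 * dble(k)
     end do

     ! 4. Complete all 2*nnbr requests before the buffers are reused.
     call MPI_WAITALL(2*nnbr, req, MPI_STATUSES_IGNORE, ierr)

     halo_sum = sum(recvbuf)   ! received data is safe to read only now
  end do

  if (rank == 0) print *, 'last halo sum on rank 0:', halo_sum

  call MPI_FINALIZE(ierr)
end program halo_exchange_sketch
```

With Open MPI this should build with the mpifort wrapper and run under
mpirun with any number of processes. The point of the layout is that
the same request array is fully completed by MPI_WAITALL in every
iteration, so no requests are left outstanding from one iteration to
the next.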
Sorry if you already knew all this and I'm just overreacting to the
simplified code you sent out.