amjad ali wrote:
You would break the MPI_Irecv
and MPI_Isend calls up into two parts:
MPI_Send_init and MPI_Recv_init in the first part and MPI_Start[all] in
the second part. The first part needs to be moved out of the
subroutine... at least outside of the loop in sub1() and maybe even
outside the 10000-iteration loop in the main program. (There would
also be MPI_Request_free calls that would similarly have to be moved
out.) If the overheads are small compared to the other work you're
doing per message, the savings would be small. (And, I'm guessing this
is the case for you.) Further, the code refactoring might not be
simple. So, persistent communications *might* not be a fruitful
optimization strategy for you. Just a warning.
If I follow this strategy, then the picture should be as follows.
Correct?
Yes, I think that's right.
Obviously sub1 and sub2 exist separately, outside the main program.
The following outline is just for understanding.
Main program starts ------@@@@@@@@@@@@@@@@@@@@@@@
  CALL MPI_RECV_INIT for each neighboring process
  CALL MPI_SEND_INIT for each neighboring process
  Loop calling subroutine1 starts ------ (10000 times in the main program)
    Call subroutine1
      Subroutine1 starts ===================================
        Loop A starts here >>>>>>>>>>>>>>>>>>>> (three passes)
          Call subroutine2
            Subroutine2 starts ----------------------------
              Pick local data from array U in separate arrays for each neighboring processor
              CALL MPI_STARTALL
              ------- perform work that could be done with local data
              CALL MPI_WAITALL( )
              ------- perform work using the received data
            Subroutine2 ends ----------------------------
          ------- perform work to update array U
        Loop A ends here >>>>>>>>>>>>>>>>>>>>
      Subroutine1 ends ====================================
  Loop calling subroutine1 ends ------ (10000 times in the main program)
  CALL MPI_Request_free( )
Main program ends ------@@@@@@@@@@@@@@@@@@@@@@@
But I think in the above case the sending and receiving buffers would
need to be created in a global module, or be passed through the
subroutine argument lists.
Right. The buffer information is needed both outside of all the loops
(in MAIN, where the persistent channels are created) and in the
innermost loop (in subroutine 2, where the buffers are loaded and used).
There is one point of confusion in the above. The send buffer appears
in the argument list of MPI_SEND_INIT(), but it only gets the values to
be sent later, in sub2. Is that possible/correct?
Yes. The buffer needs to be used by the user program to set the send
message up and to use the data that has been received. The buffer also
needs to be specified to the MPI implementation so that MPI knows which
buffers to send/receive. With a persistent communication, you specify
the buffer in the "init" call and thereafter refer to it opaquely with
the "request" handle. Incidentally, this can cause problems for
optimizing compilers, which may not recognize there is a relationship
between a buffer and the opaque request handle. Consider the "extreme
possibility" described in
http://www.mcs.anl.gov/research/projects/mpi/mpi-standard/mpi-report-2.0/node236.htm#Node241
The question is whether the above will actually be communication
efficient and overlap communication with computation.
There are two issues, I think.
One is whether persistent communications will help you reduce
overheads. It depends, but if for each message you do a bunch of work
(packing buffers, computing on data, or even just having lots of data
per message), then the amount of overhead you're saving may be
relatively small.
Another is whether you can overlap communications and computation.
This does not require persistent channels, but only nonblocking
communications (MPI_Isend/MPI_Irecv). Again, MPI does not guarantee
that a nonblocking transfer progresses while you compute, so you may
have to break your computation up and insert MPI_Test calls.
You may want to get the basic functionality working first and then run
performance experiments to decide whether these really are areas that
warrant such optimizations.