>
> You would break the MPI_Irecv and MPI_Isend calls up into two parts:
> MPI_Send_init and MPI_Recv_init in the first part and MPI_Start[all] in the
> second part. The first part needs to be moved out of the subroutine... at
> least outside of the loop in sub1() and maybe even outside the
> 10000-iteration loop in the main program. (There would also be
> MPI_Request_free calls that would similarly have to be moved out.) If the
> overheads are small compared to the other work you're doing per message, the
> savings would be small. (And, I'm guessing this is the case for you.)
> Further, the code refactoring might not be simple. So, persistent
> communications *might* not be a fruitful optimization strategy for you.
> Just a warning.
>
If I follow this strategy, the structure should look as follows. Is that
correct? Of course, sub1 and sub2 are defined separately as subroutines of
their own; they are shown inline below only for illustration.

Main program starts ------@@@@@@@@@@@@@@@@@@@@@@@
  CALL MPI_RECV_INIT for each neighboring process
  CALL MPI_SEND_INIT for each neighboring process
  Loop calling subroutine1 ------ (10000 times in the main program)
    Call subroutine1
    Subroutine1 starts ===================================
      Loop A starts here >>>>>>>>>>>>>>>>>>>> (three passes)
        Call subroutine2
        Subroutine2 starts ----------------------------
          Pick local data from array U in separate arrays for each
            neighboring processor
          CALL MPI_STARTALL
          ------- perform work that could be done with local data
          CALL MPI_WAITALL( )
          ------- perform work using the received data
        Subroutine2 ends ----------------------------
        ------- perform work to update array U
      Loop A ends here >>>>>>>>>>>>>>>>>>>>
    Subroutine1 ends ====================================
  Loop calling subroutine1 ends ------ (10000 times in the main program)
  CALL MPI_Request_free( )
Main program ends ------@@@@@@@@@@@@@@@@@@@@@@@

But I think that in the above case the send and receive buffers would need
to be created in a global module, or passed through the subroutine argument
lists. There is one point I am confused about: the send buffer appears in
the argument list of MPI_SEND_INIT(), but it only gets the values to be sent
later, in sub2. Is that possible/correct?
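
To make the question concrete, here is a minimal sketch of what I have in
mind (not my real code). The ring neighbours left/right, the buffer names
sendl/sendr/recvl/recvr, the length n and the tags are all made up for
illustration, and the buffers live in the main program only to keep the
example short. My understanding is that the persistent request is bound to
the buffer address at init time, so the values are filled in before each
MPI_STARTALL and the buffers are left alone until MPI_WAITALL returns.

```fortran
program persistent_sketch
  use mpi
  implicit none
  integer, parameter :: n = 4            ! hypothetical message length
  integer :: ierr, rank, nprocs, left, right, iter, i
  integer :: reqs(4)
  double precision :: sendl(n), sendr(n), recvl(n), recvr(n)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  ! Assumed topology: a periodic 1-D ring with two neighbours.
  left  = mod(rank - 1 + nprocs, nprocs)
  right = mod(rank + 1, nprocs)

  ! Part 1, done once: create the persistent requests.
  ! Tag 0 = data travelling to the right, tag 1 = data travelling to the left.
  call MPI_RECV_INIT(recvl, n, MPI_DOUBLE_PRECISION, left,  0, &
                     MPI_COMM_WORLD, reqs(1), ierr)
  call MPI_RECV_INIT(recvr, n, MPI_DOUBLE_PRECISION, right, 1, &
                     MPI_COMM_WORLD, reqs(2), ierr)
  call MPI_SEND_INIT(sendl, n, MPI_DOUBLE_PRECISION, left,  1, &
                     MPI_COMM_WORLD, reqs(3), ierr)
  call MPI_SEND_INIT(sendr, n, MPI_DOUBLE_PRECISION, right, 0, &
                     MPI_COMM_WORLD, reqs(4), ierr)

  ! Part 2, done every pass (this is what sub2 would do).
  do iter = 1, 3
     ! Fill the send buffers with this pass's values (stand-in for
     ! picking boundary data out of array U).
     sendl = dble(rank) + iter
     sendr = dble(rank) - iter

     call MPI_STARTALL(4, reqs, ierr)

     ! ... work that needs only local data; must not touch the
     !     four communication buffers ...

     call MPI_WAITALL(4, reqs, MPI_STATUSES_IGNORE, ierr)

     ! ... work that uses recvl and recvr ...
  end do

  ! Clean-up, done once: the requests are inactive after MPI_WAITALL.
  do i = 1, 4
     call MPI_REQUEST_FREE(reqs(i), ierr)
  end do

  call MPI_FINALIZE(ierr)
end program persistent_sketch
```

In my real code the four buffers would sit in a module (or be passed down
through sub1 and sub2), and they would have to be the very same arrays that
were given to the init calls, not temporaries or array sections.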
My main question is: will the above actually be communication-efficient, and
will it overlap communication with computation?
best regards,
AA