I have the following setup:
The rank 0 process on node 1 wants to send an array of a particular size to
the rank 1 process on the same node.
1. What optimisations can be enabled when invoking mpirun so that this
memory-to-memory transfer is performed efficiently?
2. Is there any performance gain if the two processes that are exchanging data
arrays are kept on the same node rather than on different nodes connected by
Awaiting a reply,