On 11/7/06, Chevchenkovic Chevchenkovic <chevchenkovic_at_[hidden]> wrote:
> I had the following setup:
> Rank 0 process on node 1 wants to send an array of particular size to Rank
> 1 process on same node.
> 1. What are the optimisations that can be done/invoked while running mpirun
> to perform this memory to memory transfer efficiently?
> 2. Is there any performance gain if 2 processes that are exchanging data
> arrays are kept on the same node rather than on different nodes connected by
If your application runs on a single node, sharing data is better than
You can do this with the Unix shared memory API or with the POSIX threads API.
If the applications share the same address space and a copy is necessary,
memcpy() is probably the fastest way (and ensuring that data is aligned
However, this by definition does not work on multi-computer
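As a rough illustration of the shared memory route, here is a minimal sketch in C. It is not taken from any MPI implementation: the function name shared_array_sum is hypothetical, and for brevity it uses an anonymous mmap() mapping inherited across fork() instead of a named segment. Two unrelated processes (such as two ranks started by mpirun) would need a named segment, e.g. via shm_open() or shmget(). The point is only that the child writes the array in place and the parent reads it with no copy at all.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child process fills a shared array with
 * 0..n-1, the parent sums it. Returns the sum, or -1 on error. */
long shared_array_sum(size_t n)
{
    size_t bytes = n * sizeof(int);

    /* One mapping, visible to both processes after fork(). */
    int *buf = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(buf, bytes);
        return -1;
    }
    if (pid == 0) {
        /* Child: write directly into the shared pages. */
        for (size_t i = 0; i < n; i++)
            buf[i] = (int)i;
        _exit(0);
    }

    /* Parent: waiting for the child is the only synchronisation here. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        munmap(buf, bytes);
        return -1;
    }

    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];

    munmap(buf, bytes);
    return sum;
}
```

Calling shared_array_sum(1000) from main() should give 499500. In a real program the two sides would run concurrently and need a semaphore or similar instead of waitpid().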
If you can have one application per node with several threads per node,
consider using MPI only between applications, and set up your MPI
framework to launch one application per node.
Program your application to use #threads per rank (node), with the POSIX
threading model for parallel execution within each node (for instance,
#threads == NCPUS), and use MPI for communicating between nodes.
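The per-node half of that layout could look roughly like the sketch below. It covers only the threading part, in C with POSIX threads; the MPI calls that would exchange the per-node results between ranks are left out. The names threaded_sum, sum_slice and struct slice are made up for this example, and the use of sysconf(_SC_NPROCESSORS_ONLN) to get NCPUS is an assumption (it is a common extension on Linux and the BSDs, not something MPI provides).

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Work description for one thread (hypothetical layout). */
struct slice {
    const double *data;   /* shared array, never copied */
    size_t begin, end;    /* half-open range [begin, end) */
    double partial;       /* result written by the thread */
    pthread_t tid;
    int started;          /* nonzero if pthread_create succeeded */
};

static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    double acc = 0.0;
    for (size_t i = s->begin; i < s->end; i++)
        acc += s->data[i];
    s->partial = acc;
    return NULL;
}

/* Sum data[0..n) using nthreads threads; nthreads <= 0 means
 * "one thread per online CPU". */
double threaded_sum(const double *data, size_t n, long nthreads)
{
    if (nthreads <= 0) {
        nthreads = sysconf(_SC_NPROCESSORS_ONLN);
        if (nthreads < 1)
            nthreads = 1;
    }

    struct slice *slices = calloc((size_t)nthreads, sizeof *slices);
    if (slices == NULL) {
        /* Out of memory: fall back to a plain sequential sum. */
        struct slice whole = { .data = data, .begin = 0, .end = n };
        sum_slice(&whole);
        return whole.partial;
    }

    for (long t = 0; t < nthreads; t++) {
        struct slice *s = &slices[t];
        s->data  = data;
        s->begin = n * (size_t)t / (size_t)nthreads;
        s->end   = n * (size_t)(t + 1) / (size_t)nthreads;
        s->started = pthread_create(&s->tid, NULL, sum_slice, s) == 0;
        if (!s->started)
            sum_slice(s);   /* could not spawn: do this slice inline */
    }

    double total = 0.0;
    for (long t = 0; t < nthreads; t++) {
        if (slices[t].started)
            pthread_join(slices[t].tid, NULL);
        total += slices[t].partial;
    }

    free(slices);
    return total;
}
```

Build with -pthread. In the hybrid setup each MPI rank would call something like threaded_sum() on its local data and then combine the per-node totals over MPI.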
The MPI model assumes you don't have a "shared memory" system.
It is therefore "message passing" oriented, and not designed to
perform optimally on shared memory systems (such as SMPs or ccNUMA machines).
Miguel Sousa Filipe