The MPI model assumes you don't have a "shared memory" system, so it is
"message passing" oriented and not designed to perform optimally on
shared memory systems (like SMPs or ccNUMA machines).
For many programs with both MPI and shared memory implementations, the
MPI version runs faster on SMPs and ccNUMA machines. Why? See the
previous paragraph...
Of course it does.. it's faster to copy data in main memory than to do
it through any kind of network interface. You can optimize your
message passing implementation down to a couple of memory-to-memory
copies when ranks are on the same node. In the worst case, even if
peers/ranks on the same node communicate through local IP addresses,
the operating system never touches the network interface: it just
copies data from the TCP sender's buffer to the TCP receiver's buffer.
In the end, that's always faster than going through a physical
network link.
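To make the "couple of memory-to-memory copies" point concrete, here is a minimal POSIX sketch. It is not how any particular MPI library implements its intra-node path. It assumes that two processes created with fork() stand in for two ranks on the same node, and that waitpid() serves as a crude "message delivered" signal where a real transport would use flags or queues in the shared segment. The names shm_transfer, struct shm_slot and SLOT_SIZE are hypothetical, chosen only for illustration.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SLOT_SIZE 4096 /* hypothetical size of the shared transfer slot */

/* Hypothetical layout of the region shared by the two "ranks". */
struct shm_slot {
    size_t len;
    char data[SLOT_SIZE];
};

/*
 * Send `len` bytes of `msg` from a child process ("sender rank") to the
 * calling process ("receiver rank") through an anonymous shared mapping,
 * landing in `out`. Returns the number of bytes received, or -1 on error.
 *
 * Copy 1: sender's buffer -> shared slot (done by the child).
 * Copy 2: shared slot -> receiver's buffer (done by the parent).
 * No socket and no network interface are involved.
 */
ssize_t shm_transfer(const char *msg, size_t len, char *out, size_t cap)
{
    if (len > SLOT_SIZE || len > cap)
        return -1;

    struct shm_slot *slot = mmap(NULL, sizeof *slot, PROT_READ | PROT_WRITE,
                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (slot == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(slot, sizeof *slot);
        return -1;
    }
    if (pid == 0) {
        /* Sender: copy 1, into the shared slot, then exit. */
        memcpy(slot->data, msg, len);
        slot->len = len;
        _exit(0);
    }

    /* Receiver: waiting for the sender to exit is the completion signal. */
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0) {
        munmap(slot, sizeof *slot);
        return -1;
    }

    assert(slot->len <= cap); /* guaranteed by the length check above */
    memcpy(out, slot->data, slot->len); /* copy 2, out of the shared slot */
    ssize_t n = (ssize_t)slot->len;
    munmap(slot, sizeof *slot);
    return n;
}
```

Usage: `shm_transfer("hello", 6, buf, sizeof buf)` returns 6 and leaves "hello" in `buf`. With loopback TCP, the shared slot above is effectively replaced by the kernel's socket buffers, so there are still only in-memory copies, plus system-call overhead.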
There are a lot of papers about the relative merits of a mixed shared
memory and