On Dec 10, 2009, at 5:01 PM, Gus Correa wrote:
> A couple of questions to the OpenMPI pros:
> If shared memory ("sm") is turned off on a standalone computer,
> which mechanism is used for MPI communication?
> TCP via loopback port? Other?
Open MPI will use whatever device supports node-local loopback. TCP is one; some OpenFabrics devices support it, too.
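For reference, here is a rough sketch of how shared memory is usually turned off on the mpirun command line so that traffic goes over a loopback-capable transport instead. It assumes the BTL component names in use at the time of this thread (sm, tcp, self); ./my_mpi_app and the process count are placeholders, not anything from this thread.

```shell
# Option 1: list the BTLs to use explicitly. "self" handles a process
# sending to itself; "tcp" then carries traffic between local processes.
# ./my_mpi_app is a hypothetical executable name.
mpirun -np 4 --mca btl tcp,self ./my_mpi_app

# Option 2: exclude only the shared-memory BTL and let Open MPI choose
# among the remaining components. The caret is quoted so the shell
# does not interpret it.
mpirun -np 4 --mca btl '^sm' ./my_mpi_app
```

Either form should leave the sm component out of the run; which transport actually gets used then depends on what is built into your installation.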
> Why wouldn't shared memory work right on Nehalem?
> (That is probably distressing for Mark, Matthew, and other Nehalem owners.)
To be clear, we don't know that this is a Nehalem-specific problem. We actually thought it was an AMD-specific problem, but these results are interesting. We've had a notoriously difficult time reproducing the problem reliably, which is why it hasn't been fixed yet. :-(
The best luck so far in reproducing the problem has been with GCC 4.4.x (at Sun). I've been trying for a few days to install GCC 4.4 on my machines, so far without much luck. Still working on it...