On Fri, 5 Feb 2010 14:28:40 -0600, Barry Smith <bsmith_at_[hidden]> wrote:
> To cheer you up, when I run with openMPI it runs forever sucking down
> 100% CPU trying to send the messages :-)
On my test box (x86 with 8GB memory), Open MPI (1.4.1) does complete
after several seconds, but still prints the wrong count.
MPICH2 does not actually send the message, as you can see by running the
attached code:
# Open MPI 1.4.1, correct cols
count -103432106, cols 0
# MPICH2 1.2.1, incorrect cols
count -103432106, cols 1
How much memory does crush have? You need about 7GB to do this without
swapping. In particular, most of the time Open MPI took to send the
message (with your source) was actually spent faulting the send/recv
buffers. The attached code faults the buffers first, and the subsequent
send/recv takes less than 2 seconds.
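
To see the first-touch cost in isolation, here is a minimal sketch with no MPI involved. It is not the attached program: the helper name time_first_and_second_touch and the 4096-byte page size are assumptions made for illustration. It allocates a buffer, writes one byte per page twice, and reports how long each pass took.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Wall-clock seconds from a monotonic clock (POSIX clock_gettime). */
static double now_seconds(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Write one byte per page so every page of buf must be backed by memory. */
static void touch_pages(volatile unsigned char *buf, size_t n, size_t page)
{
  size_t i;
  for (i = 0; i < n; i += page) buf[i] = 1;
}

/*
 * Hypothetical helper (not from the attachment): allocate n bytes, touch
 * them twice, and report the duration of each pass in seconds.
 * Returns 0 on success, -1 if the allocation fails.
 */
int time_first_and_second_touch(size_t n, double *first, double *second)
{
  const size_t page = 4096; /* assumed page size */
  unsigned char *buf;
  double t0, t1, t2;

  assert(n > 0 && first && second);
  buf = malloc(n);
  if (!buf) return -1;

  t0 = now_seconds();
  touch_pages(buf, n, page); /* first touch: pages may have to be faulted in */
  t1 = now_seconds();
  touch_pages(buf, n, page); /* second touch: pages were written a moment ago */
  t2 = now_seconds();

  *first = t1 - t0;
  *second = t2 - t1;
  free(buf);
  return 0;
}
```

With a buffer of a few hundred megabytes I would expect the first pass to take noticeably longer than the second on a typical Linux box, but the actual numbers depend on the allocator, the kernel, and how much free memory there is.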
Actually, it's clear that MPICH2 never touches either buffer because it
returns immediately regardless of whether they have been faulted first.
- text/x-csrc attachment: stored