> [...]
>
> MPICH2 manages to get about 5GB/s in shared memory performance on the
> Xeon 5420 system.
Does the sm btl use a memcpy with non-temporal stores like MPICH2?
This can be a big win for bandwidth benchmarks that don't actually
touch their receive buffers at all...
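
For illustration only, here is a minimal sketch of what a memcpy with
non-temporal stores can look like. It assumes x86 with SSE2 and falls
back to plain memcpy elsewhere. The name nt_memcpy is hypothetical; this
is not the actual code of MPICH2 or of the sm btl.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not from MPICH2 or Open MPI): copy n bytes,
 * writing the bulk of the destination with non-temporal stores. */
static void *nt_memcpy(void *dst, const void *src, size_t n)
{
#if defined(__SSE2__)
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    /* _mm_stream_si128 needs a 16-byte aligned destination, so copy
     * the unaligned head with ordinary stores first. */
    size_t head = (size_t)(-(uintptr_t)d & 15u);
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    /* Bulk copy: unaligned load, then a store that bypasses the cache. */
    for (; n >= 16; d += 16, s += 16, n -= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, v);
    }

    /* Order the streaming stores before any later store, e.g. a flag
     * telling the receiver that the data is ready. */
    _mm_sfence();

    /* Copy the remaining tail (fewer than 16 bytes). */
    memcpy(d, s, n);
    return dst;
#else
    /* No SSE2 available: fall back to the ordinary memcpy. */
    return memcpy(dst, src, n);
#endif
}
```

Because the streaming stores do not pull the destination lines into the
cache, a copy like this tends to help most when the receiver never reads
the buffer, which is exactly the benchmark case described above. If the
receiver does read the data right away, it may be slower than a regular
memcpy.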
-Ron