How fast/well are the MPI collectives implemented in Open MPI (ompi)?
I'm running the Intel MPI 1.1 benchmarks and finding that I need to set
wall clock limits of more than 12 hours for runs on 200 and 300 nodes in
the 1 ppn and 2 ppn cases. The collective tests that usually pass in the 2 ppn cases:
Barrier, Reduce_scatter, Allreduce, Bcast
The ones that take a long time or never run:
Allgather, Alltoall, Allgatherv
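
For a rough sense of scale, here is a small back-of-the-envelope sketch (not a timing model, and not taken from the benchmark itself) of how much payload each rank has to receive per call at these run sizes. The helper names and the 4 MiB message size are my own assumptions for illustration; the only facts used are that Allgather, Allgatherv and Alltoall deliver one block from every other rank, while Bcast delivers a single message to each non-root rank.

```python
# Rough payload estimate per collective call; NOT a timing model.
# Assumptions (hypothetical, for illustration only):
#   - every rank contributes a block of msg_bytes
#   - 4 MiB is used as an example message size

def ranks(nodes, ppn):
    """Total number of MPI processes for a given node count and ppn."""
    return nodes * ppn

def recv_bytes_per_rank(collective, nprocs, msg_bytes):
    """Minimum payload bytes one rank must receive in a single call."""
    if collective in ("allgather", "allgatherv", "alltoall"):
        # Each rank ends up holding one block from every other rank.
        return (nprocs - 1) * msg_bytes
    if collective == "bcast":
        # Each non-root rank receives the message once.
        return msg_bytes
    raise ValueError("collective not covered by this sketch: " + collective)

if __name__ == "__main__":
    msg = 4 * 1024 * 1024  # assumed example message size (4 MiB)
    for nodes in (200, 300):
        for ppn in (1, 2):
            p = ranks(nodes, ppn)
            for name in ("bcast", "allgather", "alltoall"):
                mib = recv_bytes_per_rank(name, p, msg) / (1024 * 1024)
                print(f"{nodes} nodes x {ppn} ppn, {name}: {mib:.0f} MiB per rank")
```

This only counts bytes that must arrive at each rank regardless of the algorithm used, so it says nothing about how well any particular implementation performs. It does show that the per-rank volume of the slow tests grows with the process count, while that of Bcast does not.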