See pp. 38-40: MVAPICH2 outperforms Open MPI in every test. Is that
because they are doing something to optimize for CUDA and GPUs that is
not in Open MPI, or did they specifically tune MVAPICH2 to make it
shine?
The benchmark package: http://mvapich.cse.ohio-state.edu/benchmarks/
Open Grid Scheduler / Grid Engine
Scalable Grid Engine Support Program