OpenMPI version: 1.4.3
Platform: IBM P5, 32 processors, 256 GB memory, Symmetric Multi-Threading (SMT) enabled
Application: starts 48 processes and communicates via MPI_Barrier, MPI_Get, and MPI_Put (many transfers, large amounts of data)
Issue: When built against Open MPI, the application runs 3-5 times slower than when built against IBM's MPI ('poe' from the HPC Toolkit).
I suspect that IBM's MPI exploits some platform-specific knowledge about these data transfers that Open MPI does not.