OpenMPI version: 1.4.3

Platform: IBM P5, 32 processors, 256 GB memory, Simultaneous Multi-Threading (SMT) enabled

Application: starts 48 processes and communicates via one-sided MPI (MPI_Barrier, MPI_Get, MPI_Put), doing many transfers of large amounts of data.
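
For reference, here is a minimal sketch of that one-sided pattern; the window size, transfer size, and neighbor-exchange scheme are illustrative guesses, not details from the actual application:

#include <mpi.h>
#include <stdlib.h>

#define COUNT (1 << 20)   /* ~8 MB of doubles per transfer -- illustrative */

int main(int argc, char **argv)
{
    int rank, nprocs, peer, i;
    double *base, *buf;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* MPI_Alloc_mem lets the implementation hand back memory it can
       transfer from efficiently (e.g., registered/pinned memory). */
    MPI_Alloc_mem(COUNT * sizeof(double), MPI_INFO_NULL, &base);
    buf = malloc(COUNT * sizeof(double));
    for (i = 0; i < COUNT; i++)
        buf[i] = (double)rank;

    /* Expose a window of local memory for one-sided access. */
    MPI_Win_create(base, COUNT * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    peer = (rank + 1) % nprocs;   /* illustrative neighbor exchange */

    /* Fence-delimited access epochs around each bulk transfer. */
    MPI_Win_fence(0, win);
    MPI_Put(buf, COUNT, MPI_DOUBLE, peer, 0, COUNT, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);
    MPI_Get(buf, COUNT, MPI_DOUBLE, peer, 0, COUNT, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);

    MPI_Barrier(MPI_COMM_WORLD);  /* app-level synchronization point */

    MPI_Win_free(&win);
    MPI_Free_mem(base);
    free(buf);
    MPI_Finalize();
    return 0;
}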

Issue: When built with Open MPI instead of IBM's MPI ('poe', from the HPC Toolkit), the application runs 3-5 times slower.

I suspect that IBM's MPI implementation exploits some knowledge it has about these data transfers that Open MPI does not.

Any suggestions?

Thanks,

Brian Price