Additional testing suggests that the problem is related to barriers, specifically how often a process polls to determine whether it can leave the barrier. Is there an MCA parameter or environment variable that lets me control the polling frequency inside barriers?
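For reference, this is how I would expect to set such a parameter once I know its name. The only idle-behavior parameter I am aware of is mpi_yield_when_idle. As far as I understand, it makes a waiting process yield the processor rather than changing a polling interval, so it may not be the knob I am after. The binary name ./my_app is a placeholder.

```sh
# List the MPI-layer MCA parameters this installation knows about
ompi_info --param mpi all

# Set a parameter for a single run on the mpirun command line
# (mpi_yield_when_idle is my guess at a relevant parameter, not a confirmed fix)
mpirun --mca mpi_yield_when_idle 1 -np 48 ./my_app

# Or set it through the environment using the OMPI_MCA_<name> convention
export OMPI_MCA_mpi_yield_when_idle=1
mpirun -np 48 ./my_app
```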
Open MPI version: 1.4.3
Platform: IBM P5, 32 processors, 256 GB memory, Symmetric Multi-Threading (SMT) enabled
Application: starts 48 processes and communicates using MPI_Barrier, MPI_Get, and MPI_Put (many transfers, large amounts of data)
Issue: When run under Open MPI rather than IBM’s MPI (‘poe’ from HPC Toolkit), the application runs 3-5 times slower.
I suspect that IBM’s MPI implementation exploits some knowledge about the data transfers that Open MPI does not.