I'm in the process of moving our application from LAM/MPI to
OpenMPI, mainly because OpenMPI makes it easier for a user to run
multiple jobs (MPI universes) simultaneously. This is useful if a user
wants to run smaller experiments without disturbing a large experiment
running in the background. I've been evaluating the performance using
a simple test, running on a heterogeneous cluster of 2 x dual-core
Opteron machines, a couple of dual-core P4 Xeon machines and an 8-core
Core2 machine. The main structure of the application is a master rank
distributing job packages to the rest of the ranks and collecting the
results. We don't use any fancy MPI features but rather see it as an
efficient low-level tool for broadcasting and transferring data.

When a single user runs a job (fully subscribed nodes, but not
oversubscribed, i.e. one process per CPU core) on an otherwise unloaded
cluster, both LAM/MPI and OpenMPI give average runtimes of about 1m33s
(OpenMPI has a slightly lower average).

When I start the same job simultaneously as two different users (thus
oversubscribing the nodes 2x) under LAM/MPI, the two jobs finish in an
average time of about 3m, thus scaling very well (we use the -ssi rpi
sysv option to mpirun under LAM/MPI to avoid busy waiting).

When running the same two-user experiment under OpenMPI, the average
runtime jumps up to about 3m30s, with runs occasionally taking more
than 4 minutes to complete. I do use the "--mca mpi_yield_when_idle 1"
option to mpirun, but it doesn't seem to make any difference. I've
also tried setting the environment variable
OMPI_MCA_mpi_yield_when_idle=1, but still no change. ompi_info says:

    ompi_info --param all all | grep yield
    MCA mpi: parameter "mpi_yield_when_idle" (current value: "1")
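
To put the timings above in perspective, here is a small back-of-the-envelope sketch. It uses only the approximate averages quoted in this post and assumes that, with 2x oversubscription and perfect time-sharing, each job would ideally take twice the single-job runtime. The helper name `slowdown` is just for illustration.

```python
# Timings quoted above, converted to seconds (all approximate averages).
single_job = 1 * 60 + 33       # one job on an otherwise idle cluster: 1m33s
lam_two_jobs = 3 * 60          # two simultaneous jobs under LAM/MPI: about 3m
ompi_two_jobs = 3 * 60 + 30    # two simultaneous jobs under OpenMPI: about 3m30s

# Assumption: with 2x oversubscription and perfect time-sharing,
# each job takes twice the single-job runtime.
ideal_two_jobs = 2 * single_job


def slowdown(measured, reference):
    """Return how much slower `measured` is than `reference`, in percent."""
    return 100.0 * (measured - reference) / reference


print(f"ideal 2x runtime:   {ideal_two_jobs}s")
print(f"LAM/MPI vs ideal:   {slowdown(lam_two_jobs, ideal_two_jobs):+.1f}%")
print(f"OpenMPI vs ideal:   {slowdown(ompi_two_jobs, ideal_two_jobs):+.1f}%")
print(f"OpenMPI vs LAM/MPI: {slowdown(ompi_two_jobs, lam_two_jobs):+.1f}%")
```

With these rounded figures, LAM/MPI lands at or slightly below the ideal 2x runtime, while OpenMPI comes out roughly 13% above it and about 17% slower than LAM/MPI. Since the inputs are only "about" values, the percentages should be read as rough indications.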

The cluster is used for various tasks, running MPI applications as
well as non-MPI applications, so we would like to avoid spending too
many cycles on busy-waiting. Any ideas on how to tweak OpenMPI to get
better performance and more cooperative behavior in this case would be