
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Oversubscription performance problem
From: Lars Andersson (larsand_at_[hidden])
Date: 2008-04-09 03:10:58

On Fri, Apr 4, 2008 at 4:30 PM, Lars Andersson <larsand_at_[hidden]> wrote:
> Hi,
> I'm just in the process of moving our application from LAM/MPI to
> OpenMPI, mainly because OpenMPI makes it easier for a user to run
> multiple jobs (MPI universes) simultaneously. This is useful if a user
> wants to run smaller experiments without disturbing a large experiment
> running in the background. I've been evaluating the performance using
> a simple test, running on a heterogeneous cluster of 2 x dual-core
> Opteron machines, a couple of dual-core P4 Xeon machines and an 8-core
> Core2 machine. The main structure of the application is a master rank
> distributing job packages to the rest of the ranks and collecting the
> results. We don't use any fancy MPI features but rather see it as an
> efficient low-level tool for broadcasting and transferring data.
> When a single user runs a job (fully subscribed nodes, but not
> oversubscribed, i.e. one process per CPU core) on an otherwise unloaded
> cluster, both LAM/MPI and OpenMPI achieve average runtimes of about
> 1m33s (OpenMPI has a slightly lower average).
> When I start the same job simultaneously as two different users (thus
> oversubscribing the nodes 2x) under LAM/MPI, the two jobs finish in an
> average time of about 3m, thus scaling very well (we use the -ssi rpi
> sysv option to mpirun under LAM/MPI to avoid busy waiting).
> When running the same second experiment under OpenMPI, the average
> runtime jumps up to about 3m30s, with runs occasionally taking more
> than 4 minutes to complete. I do use the "--mca mpi_yield_when_idle 1"
> option to mpirun, but it doesn't seem to make any difference. I've
> also tried setting the environment variable
> OMPI_MCA_mpi_yield_when_idle=1, but still no change. ompi_info says:
> ompi_info --param all all | grep yield
> MCA mpi: parameter "mpi_yield_when_idle" (current value: "1")
> The cluster is used for various tasks, running MPI applications as
> well as non-MPI applications, so we would like to avoid spending too
> many cycles on busy waiting. Any ideas on how to tweak OpenMPI to get
> better performance and more cooperative behavior in this case would be
> greatly appreciated.
> Cheers,
> Lars

No ideas? Such a large performance regression compared to LAM/MPI
seems quite serious to me. Or do you consider the case of
over-subscription not worth optimizing for? Or did I get something
totally wrong?
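For context, a minimal sketch of the master/worker pattern described in the quoted message might look like the following. This is illustrative only, not the poster's actual application; the task payload (squaring an integer) and task count are invented stand-ins.

```c
/* Minimal master/worker sketch: rank 0 hands out work items and collects
 * results; all other ranks process them. Illustrative only.
 * Build with: mpicc master_worker.c -o master_worker
 * Run with:   mpirun -np 4 ./master_worker */
#include <mpi.h>
#include <stdio.h>

#define NTASKS   16
#define TAG_WORK 1
#define TAG_STOP 2

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        int next = 0, done = 0, result;
        MPI_Status st;
        /* Seed each worker with one task. */
        for (int w = 1; w < size && next < NTASKS; ++w, ++next)
            MPI_Send(&next, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
        /* Collect results; hand out remaining tasks as workers free up. */
        while (done < next) {
            MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            ++done;
            if (next < NTASKS) {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                         MPI_COMM_WORLD);
                ++next;
            }
        }
        /* Tell every worker to shut down. */
        for (int w = 1; w < size; ++w)
            MPI_Send(&next, 1, MPI_INT, w, TAG_STOP, MPI_COMM_WORLD);
        printf("master: %d tasks completed\n", done);
    } else {
        int task;
        MPI_Status st;
        for (;;) {
            MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP)
                break;
            int result = task * task;  /* stand-in for real work */
            MPI_Send(&result, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}
```

Whether the workers busy-wait inside MPI_Recv, or yield the CPU when idle, is exactly what mpi_yield_when_idle is supposed to control, which is why oversubscribed runs are so sensitive to it.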