First off, I think Jeff makes some very good points.
If you still think your applications will benefit from yielding
instead of hogging the CPU,
you should probably try the parameter "mpi_show_mca_params".
This will give you a list of the MCA parameters at runtime, so
you can see
what the mpi_yield_when_idle parameter really looks like at runtime. Open MPI
seems to
override the user's setting sometimes. If yield_when_idle is disabled, I
has to be done to the Open MPI code to make it yield.
Guess this didn't help at all, but at least you can check if you are
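For what it's worth, Jeff's suggestion (quoted below) of sprinkling
sched_yield() calls into the code could look roughly like this. It is
only a sketch: wait_with_yield, poll_fn and countdown_done are names I
made up, and countdown_done just stands in for a real completion check,
which in an MPI program would probably be a small wrapper around
MPI_Test() on an outstanding request.

```c
#include <assert.h>
#include <sched.h>   /* sched_yield() (POSIX) */
#include <stddef.h>

/* Hypothetical callback type: returns nonzero once the awaited work is
 * done. In an MPI program this would typically wrap MPI_Test(). */
typedef int (*poll_fn)(void *ctx);

/* Poll until the callback reports completion, giving up the CPU after
 * every unsuccessful poll. Returns how many times we yielded. */
unsigned long wait_with_yield(poll_fn poll, void *ctx)
{
    unsigned long yields = 0;

    assert(poll != NULL);
    while (!poll(ctx)) {
        sched_yield();   /* let another runnable process use this core */
        ++yields;
    }
    return yields;
}

/* Stand-in for a real completion check (assumption, for illustration):
 * reports "done" once the counter has been decremented to zero. */
int countdown_done(void *ctx)
{
    int *remaining = ctx;

    if (*remaining == 0)
        return 1;
    --*remaining;
    return 0;
}
```

Calling wait_with_yield(countdown_done, &n) with n set to 3 yields three
times before returning. Whether this actually helps on an oversubscribed
node depends on the scheduler and on how often the code gets back into
the polling loop, so treat it as something to experiment with, not a
guaranteed fix.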
On Apr 13, 2008, at 1:51 PM, Jeff Squyres wrote:
> Sorry for the delays in replying.
> The central problem is that Open MPI is much more aggressive about its
> message passing progress than LAM is -- it simply wasn't designed to
> share well as a mechanism to get as high performance as possible.
> mpi_yield_when_idle is most helpful only for certain transports that
> actively use our event engine, such as the TCP device. Since you're
> using the LAM sysv RPI, I assume you're using the TCP and shared
> memory devices in OMPI, right? If you're using infiniband, for
> example, the event engine is not called much because IB has its own
> progression engine that is unrelated to OMPI's (and therefore we don't
> invoke OMPI's much).
> mpi_yield_when_idle is also only helpful if you're going into the MPI
> layer often and making message passing progress (i.e., OMPI's event
> engine is actively being invoked). Is this true for your application?
> If mpi_yield_when_idle really doesn't help much, you may consider
> sprinkling calls to sched_yield() in your codes to force the process
> to yield the processor.
> On Apr 4, 2008, at 2:30 AM, Lars Andersson wrote:
>> I'm just in the process of moving our application from LAM/MPI to
>> OpenMPI, mainly because OpenMPI makes it easier for a user to run
>> multiple jobs (MPI universes) simultaneously. This is useful if a user
>> wants to run smaller experiments without disturbing a large one
>> running in the background. I've been evaluating the performance using
>> a simple test, running on a heterogeneous cluster of 2 x dual core
>> Opteron machines, a couple of dual core P4 Xeon machines and an 8-core
>> Core2 machine. The main structure of the application is a master rank
>> distributing job packages to the rest of the ranks and collecting
>> results. We don't use any fancy MPI features but rather see it as an
>> efficient low-level tool for broadcasting and transferring data.
>> When a single user runs a job (fully subscribed nodes, but not
>> oversubscribed, i.e. one process per CPU core) on an otherwise idle
>> cluster, both LAM/MPI and OpenMPI average runtimes of about 1m33s
>> (OpenMPI has a slightly lower average).
>> When I start the same job simultaneously as two different users (thus
>> oversubscribing the nodes 2x) under LAM/MPI, the two jobs finish at an
>> average time of about 3m, thus scaling very well (we use the -ssi rpi
>> sysv option to mpirun under LAM/MPI to avoid busy waiting).
>> When running the same second experiment under OpenMPI, the average
>> runtime jumps up to about 3m30s, with runs occasionally taking more
>> than 4 minutes to complete. I do use the "--mca mpi_yield_when_idle 1"
>> option to mpirun, but it doesn't seem to make any difference. I've
>> also tried setting the environment variable
>> OMPI_MCA_mpi_yield_when_idle=1, but still no change. ompi_info says:
>> ompi_info --param all all | grep yield
>> MCA mpi: parameter "mpi_yield_when_idle" (current
>> value: "1")
>> The cluster is used for various tasks, running MPI applications as
>> well as non-MPI applications, so we would like to avoid spending too
>> many cycles on busy-waiting. Any ideas on how to tweak OpenMPI to get
>> better performance and more cooperative behavior in this case would
>> be greatly appreciated.
>> users mailing list
> Jeff Squyres
> Cisco Systems