Good points... I'll see if anything can be done to speed up the master. If we can shrink the number of MPI processes without hurting overall throughput, maybe I can free up enough cores to fit another run. Thanks for the ideas!
I was also worried about contention on the nodes, since I'm running many MPI processes on each multi-core box. A typical run is 120 MPI processes on 5 nodes with 24 cores each. I may play a little with the "--bynode" parameter to see if it has any (significant) effect.
From: users-bounces_at_[hidden] on behalf of Richard Treumann
Sent: Fri 9/24/2010 9:16 AM
To: Open MPI Users
Subject: Re: [OMPI users] "self scheduled" work & mpi receive???
It sounds like you have more workers than you can keep fed. Workers are
finishing up and requesting their next assignment but sit idle because
there are so many other idle workers too.
Load balance does not really matter if the choke point is the master. The
work is being done as fast as the master can hand it out.
Consider using fewer workers and seeing whether your load balance improves
while your total throughput stays the same. If you want to use all the workers you
have efficiently, you need to find a way to make the master deliver
assignments as fast as workers finish them.
Compute processes do not care about fairness. Having half the processes
busy 100% of the time and the other half idle gives the same throughput
as having all the processes busy 50% of the time, and the hard workers
will not complain.
Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363
From: Mikael Lavoie <mikael.lavoie_at_[hidden]>
Sent: 09/23/2010 05:08 PM
To: Open MPI Users <users_at_[hidden]>
Subject: Re: [OMPI users] "self scheduled" work & mpi receive???
I'm interested in your work. I have an app to convert myself, and I don't
know the MPI structure and syntax well enough to do it...
So if you'd like to share your app, I'd be interested in taking a look at it!
Thanks and have a nice day!
2010/9/23 Lewis, Ambrose J. <AMBROSE.J.LEWIS_at_[hidden]>
I've written an Open MPI program that "self schedules" the work.
The master task runs a loop that chunks up an input stream and hands off
jobs to worker tasks. At first the master gives each successive job to the
next higher rank. Once every rank has its first job, the master waits in
an MPI receive for the next free worker, reads the sender's rank from
that receive, and sends the next job to that rank. The jobs aren't all
identical, so their run times vary slightly depending on the input
data.
When I plot a histogram of the number of jobs each worker performed, the
lower MPI ranks are doing much more work than the higher ranks. For
example, in a 120-process run, rank 1 did 32 jobs while rank 119 did only
2. My guess is that Open MPI returns the lowest rank from MPI_Recv
when I've set MPI_ANY_SOURCE and several sends have arrived since
the last call.
Is there a different Recv call to make that will spread out the data
users mailing list