
Open MPI User's Mailing List Archives


From: Andreas Schäfer (gentryx_at_[hidden])
Date: 2006-12-19 20:35:30


Hiho,

On 12:33 Tue 19 Dec, Harakiri wrote:
> I need some input on the parallelisation of quicksort.
> If anyone knows a more appropriate forum/list than
> this, please tell me =)

uhm, do you have access to a library? Get yourself a textbook on
algorithms. Knuth should be a good basis to start with...

>
> I've written 2 different approaches to the quicksort
> parallelisation.
>
> 1. The "master" server splits the array of numbers
> into N separate parts, where N is the number of
> available clients or slave servers. Each slave then
> does a quicksort on its own subarray - finally all
> slaves send their subarray to the master. The master
> then does a quicksort on the presorted list.
> *This was done on the assumption that presorted lists
> sort faster with quicksort than very unordered/random
> number lists.

Just some random ideas:

 * Are you bound to use Quicksort in the master? Since the arrays are
   presorted, Mergesort would be more advisable.

 * I wouldn't combine the arrays at the master but do this step in
   parallel, too. Given 4 machines 0..3, machine 0 would merge its own
   subarray with no. 1's while machine 2 would merge with
   no. 3's. Finally no. 0 and no. 2 would combine their arrays.

 * All of this is quite obsolete, as there is an optimal parallel
   Quicksort at hand (all praise Wikipedia):
   http://citeseer.ist.psu.edu/cache/papers/cs/16170/ftp:zSzzSzai.ist.flinders.edu.auzSzpubzSzaizSzpaperszSz199109-PaCT.pdf/powers91parallelized.pdf
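The pairwise merging described above is a reduction tree: in each round, half the remaining machines merge their sorted subarray into a neighbor's, so combining N presorted chunks takes ceil(log2 N) communication rounds instead of one big re-sort at the master. A toy sequential sketch of that pattern (plain Python, not MPI code; `tree_merge` is a hypothetical helper name):

```python
import heapq

def tree_merge(chunks):
    """Merge sorted chunks pairwise, as the machines would in parallel:
    round 1: 0<-1, 2<-3; round 2: 0<-2; ceil(log2 N) rounds total."""
    chunks = list(chunks)
    step = 1
    while step < len(chunks):
        # In a real MPI run, each merge in this loop happens on a
        # different machine at the same time.
        for i in range(0, len(chunks) - step, 2 * step):
            chunks[i] = list(heapq.merge(chunks[i], chunks[i + step]))
        step *= 2
    return chunks[0]

print(tree_merge([[1, 5], [2, 6], [0, 7], [3, 4]]))
# [0, 1, 2, 3, 4, 5, 6, 7]
```

Note that each pairwise step is a linear-time merge (`heapq.merge` here), not a quicksort: merging presorted runs is O(n), while re-running Quicksort on nearly sorted data can even hit its worst case with naive pivoting.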

> 2. The "master" server finds out the MIN and MAX value
> from the array to be sorted (iterates through all
> numbers). Then it will assign boundaries to each slave
> starting from MIN for the first slave, to MAX for the
> last slave (i.e. for 2 slaves it would be from MIN ->
> MAX/2 and MAX/2+1 -> MAX).

 That would imply assumptions about the distribution of the
 elements. This may be clever from a practical point of view, but it
 is flawed from a theoretical standpoint. Assume you had 1000
 elements ranging from 1.0 to 1.1 and just one element with a value of
 100.0. No good.
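The skew is easy to demonstrate: split the value range at the midpoint and one slave receives essentially the whole array. A quick sketch of the 1000-elements-plus-one-outlier case:

```python
import random

# 1000 elements in [1.0, 1.1] plus a single outlier at 100.0.
data = [random.uniform(1.0, 1.1) for _ in range(1000)] + [100.0]

lo, hi = min(data), max(data)
mid = (lo + hi) / 2  # boundary between the two slaves, roughly 50.5

slave1 = [x for x in data if x <= mid]  # gets all 1000 small elements
slave2 = [x for x in data if x > mid]   # gets only the outlier

print(len(slave1), len(slave2))  # 1000 1
```

Sample sort sidesteps exactly this problem by choosing splitters from a random sample of the data rather than from the value range, so each slave receives roughly n/N elements regardless of the distribution.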

> The master then sends the whole array to each slave,
> each slave filters through the whole array all numbers
> within their own boundary (slave 1 for example would
> find all numbers between MIN -> MAX/2),

Communication is expensive. Sending n elements is more expensive than
filtering them on the local machine and sending only the remainder. At
least this is what I'd expect, although I haven't benchmarked this.
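To make the cost concrete: broadcasting the whole array to every slave moves N*n elements over the network, whereas scattering n/N elements to each slave and forwarding only the misplaced ones moves each element at most twice. A rough back-of-the-envelope sketch (uniform data assumed; the counts, not the constants, are the point):

```python
import random

N = 4          # slaves
n = 10_000     # total elements
data = [random.random() for _ in range(n)]

# Scheme A (the broadcast approach): every slave receives all n elements.
traffic_broadcast = N * n

# Scheme B: master scatters n/N elements to each slave; each slave keeps
# what falls in its own value range and forwards only the rest.
chunks = [data[i::N] for i in range(N)]
bounds = [(k / N, (k + 1) / N) for k in range(N)]  # slave k owns [k/N, (k+1)/N)
forwarded = sum(
    1
    for k, chunk in enumerate(chunks)
    for x in chunk
    if not (bounds[k][0] <= x < bounds[k][1])
)
traffic_scatter = n + forwarded  # initial scatter plus forwarded elements

print(traffic_broadcast, traffic_scatter)
assert traffic_scatter < traffic_broadcast
```

Since `forwarded` can never exceed n, scheme B is bounded by 2n total traffic, while the broadcast scheme grows linearly with the number of slaves.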

> Does anyone have any different ideas? I've read a few
> papers about it - the basic approach would be to
> parallelize the divide and conquer part - which would
> result in a LOT of network messages...

As already said, please read Powers' paper from above. I could imagine
that even though this results in _many_ messages, the algorithm's
optimal runtime complexity will compensate for it.

But benchmark it yourself ;-)

hth
-Andreas

-- 
============================================
Andreas Schäfer
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
+49 3641 9 46376 - voice
============================================

