Open MPI User's Mailing List Archives

From: Andreas Schäfer (gentryx_at_[hidden])
Date: 2006-12-19 20:35:30


On 12:33 Tue 19 Dec, Harakiri wrote:
> i need some input on the parallelisation of quicksort.
> If anyone knows a more appropriate forum/list than
> this, please tell me =)

Uhm, do you have access to a library? Get yourself a textbook on
algorithms; Knuth would be a good place to start...

> I've written 2 different approaches to the quicksort
> parallelisation.
> 1. The "master" server splits the array of numbers
> into N seperate parts, where N is the number of
> available clients or slave servers. Each slave then
> does a quicksort on its own subarray - finally all
> slaves sent their subarray to the master. The master
> then does a quicksort on the presorted list.
> *This was done for the assumption that presorted list
> sort faster with quicksort than very unordered/random
> numbers lists.

Just some random ideas:

 * Are you bound to use Quicksort in the master? Since the arrays are
   presorted, Mergesort would be more advisable.

 * I wouldn't combine the arrays at the master, but do this step in
   parallel, too. Given 4 machines 0..3, machine 0 would merge its own
   subarray with no. 1's while machine 2 would merge with no. 3's.
   Finally, no. 0 and no. 2 would combine their arrays.

 * All of this is quite obsolete, as there is an optimal parallel
   Quicksort at hand (all praise Wikipedia): Powers' parallelized
   Quicksort with optimal speedup.
> 2. The "master" server finds out the MIN and MAX value
> from the array to be sorted (iterates through all
> numbers). Then it will assign boundaries to each slave
> starting from MIN for the first slave, to MAX for the
> last slave (i.e. for 2 slaves it would be from MIN ->
> MAX/2 and MAX/2+1 -> MAX).

 That would imply assumptions about the distribution of the
 elements. This may be clever from a practical point of view, but it
 is flawed in theory: assume you had 1000 elements ranging from 1.0
 to 1.1 and just one element with a value of 100.0. One slave would
 end up with virtually the entire array. No good.
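To make the skew concrete, here is a quick sketch (data made up to
match the example) of how range-based bucketing degenerates with two
slaves:

```python
# 1000 values crammed into [1.0, 1.1], plus a single outlier at 100.0
values = [1.0 + 0.1 * i / 999 for i in range(1000)] + [100.0]

lo, hi = min(values), max(values)
mid = (lo + hi) / 2                       # boundary between the two slaves

slave1 = [v for v in values if v <= mid]  # gets essentially everything
slave2 = [v for v in values if v > mid]   # gets a single element
print(len(slave1), len(slave2))           # -> 1000 1
```

One slave sorts the whole array while the other idles, so the
parallelisation buys nothing for this input.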

> The master then sends the whole array to each slave,
> each slave filters through the whole array all numbers
> within their own boundary (slave 1 for example would
> find all numbers between MIN -> MAX/2),

Communication is expensive. Sending n elements is more expensive than
filtering them on the local machine and sending only the remainder. At
least this is what I'd expect, although I haven't benchmarked this.
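A back-of-the-envelope count of elements on the wire (my own model,
not a benchmark) shows why:

```python
# n elements, p slaves; hypothetical sizes for illustration only
n, p = 1_000_000, 8

broadcast_cost = p * n   # master ships the full array to every slave
filtered_cost = n        # filter first: each element goes to exactly one slave
print(broadcast_cost // filtered_cost)   # -> 8, i.e. p times the traffic
```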

> Anyone has any different ideas ? I've read a few
> papers about it - the basic approach would be to
> parallel the divide and conquer part - which would
> result in ALOT of network messages...

As already said, please read Powers' paper mentioned above. I could
imagine that even though this results in _many_ messages, the
algorithm's optimal runtime complexity will compensate for it.

But benchmark it yourself ;-)


Andreas Schäfer
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
+49 3641 9 46376 - voice
