Hi Jed, All

Many thanks to your informative replies on this and the PetSc list. I will make a decision based on your advice.


From: users-bounces@open-mpi.org [users-bounces@open-mpi.org] on behalf of Jed Brown [jed@59a2.org]
Sent: 06 March 2012 22:50
To: tprince@computer.org; Open MPI Users
Subject: Re: [OMPI users] parallelising ADI

On Tue, Mar 6, 2012 at 16:23, Tim Prince <n8tm@aol.com> wrote:
 On 03/06/2012 03:59 PM, Kharche, Sanjay wrote:

I am working on a 3D ADI solver for the heat equation. I have implemented it as serial. Would anybody be able to indicate the best and more straightforward way to parallelise it. Apologies if this is going to the wrong forum.

If it's to be implemented in parallelizable fashion (not SSOR style where each line uses updates from the previous line), it should be feasible to divide the outer loop into an appropriate number of blocks, or decompose the physical domain and perform ADI on individual blocks, then update and repeat.

True ADI has inherently high communication cost because a lot of data movement is needed to make the _fundamentally sequential_ tridiagonal solves local enough that latency doesn't kill you trying to keep those solves distributed. This also applies (albeit to a lesser degree) in serial due to way memory works.

If you only do non-overlapping subdomain solves, you must use a Krylov method just to ensure convergence. You can add overlap, but the Krylov method is still necessary for any practical convergence rate. The method will also require an iteration count proportional to the number of subdomains across the global domain times the square root of the number of elements across a subdomain. The constants may not be small and this asymptotic result is independent of what the subdomain solver is. You need a coarse level to improve this scaling.

Sanjay, as Matt and I recommended when you asked the same question on the PETSc list this morning, unless this is a homework assignment, you should solve your problem with multigrid instead of ADI. We pointed you to simple example code that scales well to many thousands of processes.