This web mail archive is frozen.
This page is part of a frozen web archive of this mailing list.
You can still navigate around this archive, but know that no new mails
have been added to it since July of 2016.
Click here to be taken to the new web archives of this list; it includes all the mails that are in this frozen archive plus all new mails that have been sent to the list since it was migrated to the new archives.
On Tue, Mar 6, 2012 at 16:23, Tim Prince <n8tm_at_[hidden]> wrote:
> On 03/06/2012 03:59 PM, Kharche, Sanjay wrote:
>> I am working on a 3D ADI solver for the heat equation. I have implemented
>> it as serial. Would anybody be able to indicate the best and more
>> straightforward way to parallelise it. Apologies if this is going to the
>> wrong forum.
>> If it's to be implemented in parallelizable fashion (not SSOR style
> where each line uses updates from the previous line), it should be feasible
> to divide the outer loop into an appropriate number of blocks, or decompose
> the physical domain and perform ADI on individual blocks, then update and
True ADI has inherently high communication cost because a lot of data
movement is needed to make the _fundamentally sequential_ tridiagonal
solves local enough that latency doesn't kill you trying to keep those
solves distributed. This also applies (albeit to a lesser degree) in serial
due to way memory works.
If you only do non-overlapping subdomain solves, you must use a Krylov
method just to ensure convergence. You can add overlap, but the Krylov
method is still necessary for any practical convergence rate. The method
will also require an iteration count proportional to the number of
subdomains across the global domain times the square root of the number of
elements across a subdomain. The constants may not be small and this
asymptotic result is independent of what the subdomain solver is. You need
a coarse level to improve this scaling.
Sanjay, as Matt and I recommended when you asked the same question on the
PETSc list this morning, unless this is a homework assignment, you should
solve your problem with multigrid instead of ADI. We pointed you to simple
example code that scales well to many thousands of processes.