Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Best way to reduce 3D array
From: Gus Correa (gus_at_[hidden])
Date: 2010-04-07 12:42:11

Hi Derek

Cole, Derek E wrote:
> Thanks for the ideas.
> I did finally end up getting this working by sending back to
> the master process. It's quite ugly, and added a good bit of
> MPI to the code, but it works for now,
> and I will revisit this later.
Is the MPI code uglier than the OO-stuff you mentioned before? :)

That you parallelized the code is an accomplishment anyway.
Maybe "It works" is the first level of astonishment and
reward one can get from programming, particularly in MPI! :)
Unfortunately, "It is efficient", "It is portable",
"It is easy to change and maintain", etc, seem to come later,
at least in real world conditions.
(OK, throw eggs and tomatoes at me ...)

However, your quick description suggests that you cared about the
other items too, using MPI types to make the code more elegant and
efficient, for instance.

In principle I agree with another posting
(I can't find it now) that advocated careful code design,
from scratch, with a parallel algorithm in mind,
and, whenever possible, taking advantage of quality libraries built
on top of MPI (e.g. PETSc).

However, most of the time we are patching and refurbishing
existing code, particularly when it comes to parallelization
(with MPI, OpenMP or other).
At least this is the reality I see in our area here (Earth Sciences).

I would guess it is the same in other areas of engineering.
Most of the time architects deal with building maintenance,
sometimes with renovation, and only rarely do they work on the
design of a new building, right?

> I am not sure what the file system is,
> I think it is XFS, but I don't know much about why this
> has an effect on the output - just the way files can be
> opened at once or something?

I meant parallel (PVFS, etc) versus serial (ext3, xfs, etc)
file systems.
I guess you have XFS on one machine,
mounted over NFS across the cluster.
If you send too many read and write requests you may
overwhelm NFS, at least this is my experience.
By contrast, MPI scales much better with the number of
processes that exchange messages.
Hence, it is better to funnel the data flow through MPI instead,
and let NFS talk to a single process (or to a single process at a time).
For this type of situation the old scheme:
"master reads and data is scattered;
data is gathered and master writes",
works fine, regardless of whether you
may think your code looks ugly or not.
Ricardo Reis suggested another solution, using a loop and MPI_Barrier
to serialize the writes from all processes,
and avoid file contention on NFS.
Another way would be to use MPI-IO.
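
A minimal sketch of the "data is gathered and master writes" scheme, in Python.
It assumes `comm` is an mpi4py communicator such as `MPI.COMM_WORLD`.
The function name `write_through_root`, the output path, and the
one-value-per-line text format are made up for illustration and are not
taken from Derek's code.

```python
def write_through_root(comm, local_values, path, root=0):
    """Gather every rank's values on `root`; only `root` touches the file.

    `comm` is assumed to behave like an mpi4py communicator: it must provide
    Get_rank() and gather(obj, root=...).
    """
    # gather() returns a list ordered by rank on the root, and None elsewhere.
    pieces = comm.gather(local_values, root=root)
    if comm.Get_rank() == root:
        # A single process opens the file, so NFS sees one writer only.
        with open(path, "w") as f:
            for rank, values in enumerate(pieces):
                for v in values:
                    f.write(f"{rank} {v}\n")
    return pieces

# Usage under MPI (assumes mpi4py is installed), e.g. mpirun -n 4 python script.py:
#   from mpi4py import MPI
#   write_through_root(MPI.COMM_WORLD, my_values, "out.txt")
```

The lowercase `gather` pickles Python objects, which keeps the sketch short.
For large numeric arrays the buffer-based `Gather` or `Gatherv` would likely
be the better choice.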

> I did have to end up using an MPI Data type,
> because this 3D domain was strided nicely in X,
> but not the other dimensions.
> The domain is larger in Z,
> so I wanted to order my loops such that Z is the innermost.
> This helped cut down some of the MPI overhead.
> It would have been nice to avoid this,
> but I could not think of the way to do it,
> and still have all of the computes working on the largest
> section of data possible.
> Derek

I agree. The underlying algorithm to some extent dictates how MPI
should be used, and how the data is laid out and distributed.
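
As a rough illustration of the strided case Derek describes, the sketch below
shows which elements of a flat 3D array a (count, blocklength, stride) triple
selects. These are the three layout arguments that MPI_Type_vector takes.
The array sizes, the assumption that Z is stored contiguously, and the helper
name `vector_indices` are hypothetical. Derek's actual layout is not given in
the thread. The code is plain Python with no MPI calls, so it runs standalone.

```python
def vector_indices(count, blocklength, stride, offset=0):
    """Flat element indices selected by a (count, blocklength, stride) layout.

    `offset` stands for the element where the send buffer starts; it is not
    part of the vector type itself.
    """
    return [offset + b * stride + i
            for b in range(count)
            for i in range(blocklength)]

# Hypothetical sizes; Z is the fastest-varying (innermost) index.
NX, NY, NZ = 2, 3, 4
flat = list(range(NX * NY * NZ))  # element (x, y, z) sits at (x*NY + y)*NZ + z

# Plane y = 1: NX blocks of NZ contiguous values, one block every NY*NZ elements.
y = 1
plane_y = [flat[i] for i in vector_indices(NX, NZ, NY * NZ, offset=y * NZ)]
print(plane_y)

# Plane z = 2: NX*NY single-element blocks, NZ elements apart.
z = 2
plane_z = [flat[i] for i in vector_indices(NX * NY, 1, NZ, offset=z)]
print(plane_z)
```

With Z innermost, the y-plane is described by NX blocks of length NZ, while
the z-plane needs NX*NY blocks of length 1. That difference is one plausible
reason why the loop and index ordering affects the communication overhead.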

In the best of all worlds you could devise and develop
an algorithm that is both computationally and MPI (i.e.
communication-wise) efficient, and simple, and clean, etc.
More often than not one doesn't have the time or support to
do this, right? The end user seldom cares about it either.
At least this has been my experience here.

Gus Correa
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA

> -----Original Message-----
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Ricardo Reis
> Sent: Monday, April 05, 2010 3:20 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] Best way to reduce 3D array
> On Mon, 5 Apr 2010, Rob Latham wrote:
>> On Tue, Mar 30, 2010 at 11:51:39PM +0100, Ricardo Reis wrote:
>>> If using the master/slave IO model, would it be better to cycle
>>> through all the processes and have each one write its part of the
>>> array into the file? This file would be open in "stream" mode...
>>> like
>>> do p=0,nprocs-1
>>>   if(my_rank.eq.p)then
>>>     openfile (append mode)
>>>     write_to_file
>>>     closefile
>>>   endif
>>>   call MPI_Barrier(world,ierr)
>>> enddo
>> Note that there's no guarantee of the order here, though. Nothing
>> prevents rank 30 from hitting that loop before rank 2 does. To ensure
> Don't they all have to hit the same Barrier? I think that will ensure order in this business... or am I being blind to something?
> I will agree, though, that this is not the best way to do it. I use this kind of arrangement when I'm desperate to do some printf kind of debugging and want it ordered by process. Never had a problem with it.
> I mean, I assume there is some sort of sync before the do cycle starts.
> cheers!
> Ricardo Reis
> 'Non Serviam'
> PhD candidate @ Lasef
> Computational Fluid Dynamics, High Performance Computing, Turbulence
> Cultural Instigator @ Rádio Zero
> Keep them Flying! Ajude a/help Aero Fénix!
> < sent with alpine 2.00 >