Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Best way to reduce 3D array
From: Gus Correa (gus_at_[hidden])
Date: 2010-03-30 18:39:24


Hi Derek

Great to read that you parallelized the code.
Sorry to hear about the OO problems,
although I enjoyed reading your characterization of them. :)
We also have plenty of that,
mostly with some Fortran90 codes that go OOverboard.

I think I suggested "YZ-books", i.e., decomposing the domain across X,
which I guess would take advantage of the C array "row major order"
and obviate the need for creating MPI vector types.
However, I guess your choice really depends on how your data
is laid out in memory.
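A minimal sketch of why YZ-books are convenient, assuming the data is one contiguous block of doubles indexed as a[x][y][z] in C row-major order (X slowest, Z fastest). The extents NX, NY, NZ and the helpers idx3, slab_offset and slab_count are hypothetical names for illustration: each X-plane is NY*NZ consecutive elements, so any range of whole X-planes is a single contiguous chunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global extents. In a C array double a[NX][NY][NZ]
 * (row-major), X varies slowest and Z fastest. */
enum { NX = 8, NY = 4, NZ = 5 };

/* Linear offset of element (x, y, z) in the contiguous block. */
static size_t idx3(size_t x, size_t y, size_t z)
{
    assert(x < NX && y < NY && z < NZ);
    return (x * NY + y) * NZ + z;
}

/* Offset of the first element of the X-planes starting at x0. */
static size_t slab_offset(size_t x0)
{
    assert(x0 <= NX);
    return x0 * NY * NZ;
}

/* Number of consecutive elements in nx whole X-planes (one "YZ-book"). */
static size_t slab_count(size_t nx)
{
    return nx * NY * NZ;
}
```

So a rank that owns X-planes x0 through x0+nx-1 can send or receive its whole piece with one call on the buffer starting at global + slab_offset(x0), with count slab_count(nx); no strided datatype is needed.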

I am not sure I understood the I/O (output) problem you described.
However, here is a suggestion.
I think I sent it in a previous email.
It assumes the global array fits in the memory of the rank 0/master process:

A) To input data (at the beginning),
rank 0 can read all the data from a file into a big buffer/global
array; then all processes call MPI_Scatter[v],
which distributes the subarrays
to all ranks/slave processes;
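With MPI_Scatterv, rank 0 needs one element count and one displacement per rank. A minimal sketch, assuming the global array is split into whole X-planes of ny*nz elements each and the planes are spread as evenly as possible (the helper name slab_counts_displs is hypothetical):

```c
#include <assert.h>

/* Hypothetical helper: split nx X-planes of ny*nz elements each over
 * nprocs ranks as evenly as possible, filling the per-rank element
 * counts and offsets (in elements) that the root passes to
 * MPI_Scatterv (and later to MPI_Gatherv). */
static void slab_counts_displs(int nx, int ny, int nz, int nprocs,
                               int counts[], int displs[])
{
    assert(nprocs > 0 && nx >= 0);
    int plane = ny * nz;      /* elements in one X-plane */
    int base = nx / nprocs;   /* planes every rank gets */
    int extra = nx % nprocs;  /* the first 'extra' ranks get one more */
    int offset = 0;
    for (int r = 0; r < nprocs; ++r) {
        int planes = base + (r < extra ? 1 : 0);
        counts[r] = planes * plane;
        displs[r] = offset;
        offset += counts[r];
    }
}
```

On rank 0 these arrays would go into MPI_Scatterv(global, counts, displs, MPI_DOUBLE, local, counts[rank], MPI_DOUBLE, 0, MPI_COMM_WORLD), and the same arrays serve MPI_Gatherv at the end. Counts and displacements are int in these calls, so a very large global array may overflow them.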

B) To output data (at the end),
all processes call MPI_Gather[v],
which allows rank 0/master to collect the final results on a big
buffer/global array,
and then rank 0 does the output to a file (and in your case,
also converts to "Tecplot", I suppose).

If your domain decomposition takes advantage of the array layout
in memory, each process can make a single call to MPI_Scatter[v]
and/or MPI_Gather[v] to do the job. All you need to know is
the pointer to the first element of the (sub)array and its size
(and the same for the global array on rank 0/master).

If the domain decomposition cuts across the array memory layout,
you may need to define an MPI vector type, with strides, etc.,
and use it in the MPI functions above, which again can be called
only once.
Using an MPI vector type takes a bit more work and bookkeeping,
but it is not too hard.
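For the XZ "book" decomposition in the question quoted below (a range of X and a range of Z, all of Y), a rank's piece is not contiguous in an [nx][ny][nz] layout. Besides the MPI vector type route, a common alternative (sketched here instead of the vector type, not as the same technique) is to copy the piece into a contiguous buffer with plain loops before MPI_Gatherv, and have rank 0 copy each received piece back into place. This assumes, as in the question, that each rank holds a full-size array with Z varying fastest; book_t, pack_book and unpack_book are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one rank's "book": X in [x0, x0+sx),
 * all of Y, Z in [z0, z0+sz), inside a contiguous global array laid
 * out as [nx][ny][nz] with Z varying fastest. */
typedef struct {
    size_t nx, ny, nz;   /* global extents */
    size_t x0, sx;       /* first X index and number of X points */
    size_t z0, sz;       /* first Z index and number of Z points */
} book_t;

/* Copy the book out of 'global' into the contiguous buffer 'buf'
 * (sx*ny*sz elements, stored in x, y, z order). Returns the count. */
static size_t pack_book(const book_t *b, const double *global, double *buf)
{
    assert(b->x0 + b->sx <= b->nx && b->z0 + b->sz <= b->nz);
    size_t n = 0;
    for (size_t x = b->x0; x < b->x0 + b->sx; ++x)
        for (size_t y = 0; y < b->ny; ++y)
            for (size_t z = b->z0; z < b->z0 + b->sz; ++z)
                buf[n++] = global[(x * b->ny + y) * b->nz + z];
    return n;
}

/* Inverse of pack_book: copy a packed buffer back into its place in
 * 'global', e.g. on rank 0 after MPI_Gatherv has collected the buffers. */
static size_t unpack_book(const book_t *b, const double *buf, double *global)
{
    assert(b->x0 + b->sx <= b->nx && b->z0 + b->sz <= b->nz);
    size_t n = 0;
    for (size_t x = b->x0; x < b->x0 + b->sx; ++x)
        for (size_t y = 0; y < b->ny; ++y)
            for (size_t z = b->z0; z < b->z0 + b->sz; ++z)
                global[(x * b->ny + y) * b->nz + z] = buf[n++];
    return n;
}
```

In this scheme each rank packs its book and sends sx*ny*sz elements through MPI_Gatherv; rank 0 receives all buffers back to back in a scratch array and calls unpack_book once per rank, with that rank's book description and the matching offset into the scratch array. A derived datatype avoids the extra copies, but explicit loops like these are easy to check by hand.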

This master/slave I/O pattern is quite common,
and admittedly old-fashioned, since it doesn't take advantage of MPI-IO.
However, it is a reliable workhorse,
particularly if you have a plain NFS
mounted file system (as opposed to a parallel file system).

I hope this helps.

Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

Cole, Derek E wrote:
> Hi all,
>
> I posted before about doing a domain decomposition on a 3D array in C,
> and this is sort of a follow-up to that. I was able to get the
> calculations working correctly by performing the calculations on XZ
> sub-domains for all Y dimensions of the space. I think someone referred
> to this as a “book” in the space. Now that I have an X starting
> and ending point, a Z starting and ending point, and a total number of X
> and Z points to visit in each direction during the computation, I am
> stuck again. First, some background.
>
> I am working on modifying a code that was originally written to be run
> serially. That being said, there is a massive amount of object-oriented
> crap that is making this a total nightmare to work on. All of the
> properties that are computed for each point in the 3D mesh are stored in
> structures, and those structures are stored in structures, blah blah; it
> looks very gross. In order to speed this code up, I was able to pull out
> the most computationally sensitive property (potential) and get it set
> up in a 3D array that is allocated nicely, etc. The problem is that, after
> all the iterations, this code writes its output in Tecplot format.
> The code to do this is very, very contrived.
>
> My idea was, for the sake of moving on, to stuff all of these XZ
> subdomains that I have calculated back into a single array on the
> first processor, so it can go on its way and do the file output on
> the WHOLE domain. However, I seem to be having problems extracting
> these SubX * SubZ * Y sized portions of the original that can be sent to
> the first processor. Does anyone have examples of code that
> does something like that? It appears that my 3D mesh is in X-major
> format in memory, so I tried to create some loops to extract Y, SubZ
> sized columns of X to send back to the zeroth processor, but I haven't
> had much luck yet.
>
> Any tips are appreciated…thanks!