When you use MPI message passing in your application, the MPI library
decides how to deliver the message. The "magic" is simply that when the
sender process and receiver process are on the same node (shared memory
domain), the library uses shared memory to deliver the message from
process to process. When the sender process and receiver process are on
different nodes, some interconnect method is used.
The MPI API does not have any explicit recognition of shared memory. If
you are thinking of MPI 1-sided communication when you mention "MPI-2
shared memory", we should be clear that MPI 1-sided communication is only
vaguely similar to shared memory: it provides access only through MPI
calls (MPI_Put, MPI_Get and MPI_Accumulate) and does not magically create
shared memory that you can load/store.
Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363
Andrei Fokau <firstname.lastname@example.org>
Open MPI Users <email@example.com>
10/06/2010 10:12 AM
Re: [OMPI users] Shared memory
Currently we run a code on a cluster with distributed memory, and this
code needs a lot of memory. Part of the data stored in memory is the same
for each process, but it is organized as one array - we can split it if
necessary. So far no magic has occurred for us. What do we need to do to
make the magic work?
On Wed, Oct 6, 2010 at 12:43, Jeff Squyres (jsquyres)
Open MPI will use shared memory to communicate between
peers on the same node - but that's hidden beneath the covers; it's not
exposed via the MPI API. You just MPI-send and magic occurs and the receiver
gets the message.
On Oct 4, 2010, at 11:13 AM, "Andrei Fokau"
Does OMPI have shared memory capabilities
(as it is mentioned in MPI-2)?
How can I use them?
On Sat, Sep 25, 2010 at 23:19, Andrei Fokau <firstname.lastname@example.org>
Here are some more details about our problem.
We use a dozen 4-processor nodes with 8 GB of memory on each node. The
code we run needs about 3 GB per processor, so we can load only 2
processors out of 4. The vast majority of those 3 GB is the same for each
processor and is accessed continuously during the calculation. In my
original question I wasn't very clear when asking about the possibility
of using shared memory with Open MPI - in our case we do not need remote
access to the data, and it would be sufficient to share memory within
each node only.
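To make the numbers above concrete, here is a small sketch of the per-node
memory arithmetic. The 0.99 shared fraction is an assumption borrowed from
the "99% of the data" estimate in the first message of this thread, and the
function names are hypothetical, used for illustration only.

```c
/* Memory (in GB) one node needs when every process keeps a full
 * private copy of its data. */
static double replicated_gb(int procs, double per_proc_gb)
{
    return procs * per_proc_gb;
}

/* Memory (in GB) one node needs when the common part of the data is
 * stored once per node and only the remainder is private to each
 * process. shared_fraction is the assumed common share (0.0 to 1.0). */
static double shared_gb(int procs, double per_proc_gb, double shared_fraction)
{
    double common = per_proc_gb * shared_fraction;
    double private_part = per_proc_gb - common;
    return common + procs * private_part;
}
```

With 4 processes at 3 GB each, replicated_gb(4, 3.0) gives 12 GB, which
does not fit in 8 GB. Under the assumed 99% shared fraction,
shared_gb(4, 3.0, 0.99) comes to roughly 3.09 GB, so all 4 processors
could be used.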
Of course, the possibility of accessing the data remotely (via mmap) is
attractive because it would allow us to store much larger arrays (up to
10 GB) in one remote place, meaning higher accuracy for our calculations.
However, I believe that the access time would be too long for data that
is read so frequently, and therefore performance would suffer.
I still hope that some of the subscribers to this mailing list have
experience using Global Arrays. This library seems to be fine for our
case; however, I feel that there should be a simpler solution. Open MPI
conforms to the MPI-2 standard, and the latter has a description of
shared memory application. Do you see any other way for us to use shared
memory (within a node) apart from using Global Arrays?
On Fri, Sep 24, 2010 at 19:03, Durga Choudhury <email@example.com>
I think the 'middle ground' approach can be simplified even further if
the data file is on a shared device (e.g. an NFS/Samba mount) that can be
mounted at the same location in the file system tree on all nodes. I
have never tried it, though, and mmap()'ing a file on a non-POSIX-compliant
file system such as Samba might have issues I am unaware of.
However, I do not see why you should not be able to do this even if
the file is being written to as long as you call msync() before using
the mapped pages.
On Fri, Sep 24, 2010 at 12:31 PM, Eugene Loh <firstname.lastname@example.org>
> It seems to me there are two extremes.
> One is that you replicate the data for each process. This has the
> disadvantage of consuming lots of memory "unnecessarily."
> Another extreme is that shared data is distributed over all processes.
> This has the disadvantage of making at least some of the data less
> accessible, whether in programming complexity and/or run-time
> performance.
> I'm not familiar with Global Arrays. I was somewhat familiar with HPF.
> I think the natural thing to do with those programming models is to
> distribute data over all processes, which may relieve the excessive
> memory consumption you're trying to address but which may also just put
> you at a different "extreme" of this spectrum.
> The middle ground I think might make most sense would be to share data
> within a node, but to replicate the data for each node. There are
> multiple ways of doing this -- possibly even GA, I don't know. One way
> might be to use one MPI process per node, with OMP multithreading within
> each process|node. Or (and I thought this was the solution you were
> looking for), have some idea which processes are collocal. Have one
> process per node create and initialize some shared memory -- mmap,
> perhaps, or shared memory. Then, have its peers map the same shared
> memory into their address spaces.
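The create-then-attach step described above might look roughly like the
following sketch. It is not taken from anyone's actual code: the helper
name node_shared_map, the file-backed approach, and the choice of path are
all assumptions made for illustration. Only standard POSIX calls (open,
ftruncate, mmap, close) are used.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `bytes` of a file-backed shared region.
 * The one process per node that creates the region passes create != 0;
 * its peers on the same node pass create == 0 and the same path.
 * Returns NULL on failure. */
static void *node_shared_map(const char *path, size_t bytes, int create)
{
    int flags = create ? (O_RDWR | O_CREAT) : O_RDWR;
    int fd = open(path, flags, 0600);
    if (fd < 0)
        return NULL;

    /* Only the creator sizes the backing file. */
    if (create && ftruncate(fd, (off_t)bytes) != 0) {
        close(fd);
        return NULL;
    }

    /* MAP_SHARED makes stores by one process visible to the others
     * that map the same file. */
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}
```

The peers must not attach or read before the creator has sized and filled
the region; in an MPI program an MPI_Barrier between the two steps is one
way to arrange that. The path should presumably live on a node-local file
system so that each node gets its own copy.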
> You asked what source code changes would be required. It depends. If
> you're going to mmap shared memory in on each node, you need to know
> which processes are collocal. If you're willing to constrain how
> processes are mapped to nodes, this could be easy. (E.g., "every 4
> processes are collocal".) If you want to discover dynamically at run
> time which are collocal, it would be harder. The mmap stuff could be in
> a stand-alone function of about a dozen lines. If the shared area is
> allocated as one piece, substituting the single malloc() call with a
> call to your mmap function should be simple. If you have many malloc()s
> you're trying to replace, it's harder.
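The "every 4 processes are collocal" constraint can be sketched as plain
rank arithmetic. This assumes the launcher places ranks on nodes in
consecutive blocks of equal size, which depends on how the job is mapped
and is not guaranteed; the function names are hypothetical.

```c
/* Index of the node a rank runs on, assuming ranks are placed on nodes
 * in consecutive blocks of procs_per_node (an assumption about the
 * launcher's mapping, not something MPI guarantees). */
static int node_of_rank(int rank, int procs_per_node)
{
    return rank / procs_per_node;
}

/* Position of a rank among the processes on its own node. */
static int local_rank(int rank, int procs_per_node)
{
    return rank % procs_per_node;
}

/* The process with local rank 0 is picked to create and initialize the
 * shared area; the others on the node only attach to it. */
static int is_node_leader(int rank, int procs_per_node)
{
    return local_rank(rank, procs_per_node) == 0;
}
```

With 4 processes per node, ranks 0 to 3 share node 0 and ranks 4 to 7
share node 1; ranks 0 and 4 would be the ones that create the shared area.
The rank itself would come from MPI_Comm_rank in a real program.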
> Andrei Fokau wrote:
> The data are read from a file and processed before calculations begin,
> so I think that mapping will not work in our case.
> Global Arrays look promising indeed. As I said, we need to put just part
> of the data in the shared section. John, do you (or maybe other users)
> have experience of working with GA?
> When GA runs with MPI:
> MPI_Init(..) ! start MPI
> GA_Initialize() ! start global arrays
> MA_Init(..) ! start memory allocator
> .... do work
> GA_Terminate() ! tidy up global arrays
> MPI_Finalize() ! tidy up MPI
> On Fri, Sep 24, 2010 at 13:44, Reuti <email@example.com>
>> On 24.09.2010 at 13:26, John Hearns wrote:
>> > On 24 September 2010 08:46, Andrei Fokau <firstname.lastname@example.org>
>> > wrote:
>> >> We use a C-program which consumes a lot of memory per process (up
>> >> to a few GB), 99% of the data being the same for each process. So
>> >> for us it would be quite reasonable to put that part of data in a
>> >> shared
>> > http://www.emsl.pnl.gov/docs/global/
>> > Is this any help? Apologies if I'm talking through my hat.
>> I was also thinking of this when I read "data in a shared
>> approaches like http://www.kerrighed.org/wiki/index.php/Main_Page).
>> this also one idea behind "High Performance Fortran" - running in parallel
>> across nodes even without knowing that it's across nodes at all
>> programming and access all data like it's being local.