Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-08-14 17:45:29

The primary person you need to talk to is turning in her dissertation
within the next few days. So I think she's kinda busy at the
moment... :-)

Sorry for the delay -- I'll take a shot at answers below...

On Aug 14, 2007, at 4:39 PM, smairal_at_[hidden] wrote:

> Can anyone help on this?
> -Thanks,
> Sarang.
> Quoting smairal_at_[hidden]:
>> Hi,
>> I am doing a research on parallel techniques for shared-memory
>> systems(NUMA). I understand that OpenMPI is intelligent to utilize
>> shared-memory system and it uses processor-affinity.

Open MPI has coarse-grained processor-affinity control, see:

Expect to see more functionality / flexibility here in the future...

>> Is the OpenMPI design of MPI_AllReduce "same" for shared-memory
>> (NUMA) as well as distributed system? Can someone please tell me
>> MPI_AllReduce design, in brief, in terms of processes and their
>> interaction on shared-memory?

Open MPI is fundamentally based on plugins. We have plugins for
various flavors of collective algorithms (see the code base:
ompi/mca/coll/), one of which is "sm" (shared memory). The shared
memory collectives are currently quite limited (e.g., IIRC, allreduce
uses the shared memory reduce followed by a shared memory bcast), but
they are being expanded and improved by Indiana University.

The "tuned" collective plugin has its own implementation(s) of
Allreduce -- Jelena or George will have to comment here. They do not
assume shared memory; they use well-known algorithms for allreduce.
The "tuned" component basically implements a wide variety of
algorithms for each MPI collective and attempts to choose which one
will be best to use at run-time. U. Tennessee has done a lot of work
in this area and I think they have several published papers on it.
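
To make the "choose at run-time" idea concrete, here is a minimal sketch
of a decision function. It is not the tuned component's actual logic:
the function name, the algorithm names it returns, and the 10,000-byte
cutoff are all assumptions made up for illustration. It only shows the
general shape of picking an algorithm from communicator size and
message size.

```python
def choose_allreduce(comm_size, msg_bytes, small_msg_limit=10_000):
    """Return the name of an allreduce algorithm to use.

    Hypothetical rule for illustration only; the cutoff and the
    algorithm names are assumptions, not Open MPI's real decision table.
    """
    if comm_size < 1 or msg_bytes < 0:
        raise ValueError("comm_size must be >= 1 and msg_bytes >= 0")
    if comm_size <= 2:
        # With one or two processes there is little to optimize.
        return "linear"
    if msg_bytes < small_msg_limit:
        # Small messages: latency dominates, so prefer few rounds.
        return "recursive_doubling"
    # Large messages: bandwidth dominates, so prefer a pipelined pattern.
    return "ring"


if __name__ == "__main__":
    for size, nbytes in [(2, 8), (16, 8), (16, 1_000_000)]:
        print(size, nbytes, choose_allreduce(size, nbytes))
```

A real implementation would base such cutoffs on measurements for the
target network rather than on a fixed constant.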

The "basic" plugin is the dirt-simple correct-but-not-optimized
component that does simple linear and logarithmic algorithms for all
the MPI collectives. If we don't have a usable algorithm anywhere
else, we fall back to the basic plugin (e.g., allreduce is a reduce
followed by a bcast).
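
As a rough illustration of "allreduce is a reduce followed by a bcast",
the sketch below simulates it in plain Python. It is not Open MPI code:
"ranks" are just list indices, the function names are made up for this
example, and the logarithmic pattern is a generic binomial tree rooted
at rank 0, assumed here for illustration.

```python
from operator import add


def tree_reduce(values, op=add):
    """Binomial-tree reduce to rank 0; returns rank 0's final value."""
    if not values:
        raise ValueError("need at least one rank")
    buf = list(values)
    n = len(buf)
    step = 1
    while step < n:
        # Rank r (a multiple of 2*step) combines in the value of rank r+step.
        for r in range(0, n, 2 * step):
            partner = r + step
            if partner < n:
                buf[r] = op(buf[r], buf[partner])
        step *= 2
    return buf[0]


def tree_bcast(root_value, n):
    """Binomial-tree broadcast from rank 0; returns every rank's buffer."""
    buf = [None] * n
    buf[0] = root_value
    step = 1
    while step < n:
        step *= 2
    step //= 2  # largest sending distance used in the first round
    while step >= 1:
        # Every rank that already holds the value forwards it to rank r+step.
        for r in range(0, n, 2 * step):
            partner = r + step
            if partner < n:
                buf[partner] = buf[r]
        step //= 2
    return buf


def allreduce(values, op=add):
    """Allreduce built as reduce-to-root followed by broadcast-from-root."""
    return tree_bcast(tree_reduce(values, op), len(values))


if __name__ == "__main__":
    print(allreduce([1, 2, 3, 4, 5]))
    print(allreduce([3, 1, 2], max))
```

Both phases take about log2(n) rounds for n ranks; a linear version
would instead have rank 0 talk to every other rank in turn.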

>> Else please suggest me a good reference for this.

Our basic philosophy / infrastructure for MPI collectives is based on
this paper:

That said, work that happened literally last week is just about to hit
the development trunk (within a week or so -- still doing some
debugging). It brings Goodness by allowing a first level of
mixing-n-matching between collective components that do not each
provide all the MPI algorithms. I can explain more if you care.

Hope this helps...

Jeff Squyres
Cisco Systems