I think this sounds reasonable, if (and only if) MPI_Accumulate is
properly handled. The interface for calling the op functions was broken
in some fairly obvious way for accumulate when I was writing the one-sided
code. I think I had to call some supposedly internal bits of the
interface to make accumulate work. I can't remember what they are now,
but I do remember it being a problem.
Of course, unless it makes mpi_allreduce on one double-sized floating
point number using sum go faster, I'm not entirely sure a change is
On Mon, 5 Jan 2009, Jeff Squyres wrote:
> WHAT: Converting the back-end of MPI_Op's to use components instead of
> hard-coded C functions.
> WHY: To support specialized hardware (such as GPUs).
> WHERE: Changes most of the MPI_Op code, adds a new ompi/mca/op framework.
> WHEN: Work has started in an hg branch
> TIMEOUT: Next Tuesday's teleconference, Jan 13 2009.
> Note: I don't plan to finish the work by Jan 13; I just want to get a yea/nay
> from the community on the concept. Final review of the code before coming
> into the trunk can come later when I have more work to show / review.
> Background: Today, the back-end MPI_Op functionality for (MPI_Op,
> MPI_Datatype) tuples is implemented as function pointers to a series of
> hard-coded C functions in the ompi/op/ directory.
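For anyone who hasn't looked at that code, here is a minimal sketch of what "function pointers per (MPI_Op, MPI_Datatype) tuple" can look like. Everything in it (the sketch_* names, the two enums, the table layout) is hypothetical and chosen for illustration; it is not the actual ompi/op/ code or its symbol names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for MPI_Op and MPI_Datatype handles. */
enum sketch_op   { SKETCH_OP_SUM, SKETCH_OP_MAX, SKETCH_OP_COUNT };
enum sketch_type { SKETCH_TYPE_INT, SKETCH_TYPE_DOUBLE, SKETCH_TYPE_COUNT };

/* Combine `count` elements of `in` into `inout`, element by element. */
typedef void (*sketch_op_fn)(const void *in, void *inout, size_t count);

static void sum_int(const void *in, void *inout, size_t count)
{
    const int *a = in;
    int *b = inout;
    for (size_t i = 0; i < count; ++i) b[i] += a[i];
}

static void sum_double(const void *in, void *inout, size_t count)
{
    const double *a = in;
    double *b = inout;
    for (size_t i = 0; i < count; ++i) b[i] += a[i];
}

static void max_int(const void *in, void *inout, size_t count)
{
    const int *a = in;
    int *b = inout;
    for (size_t i = 0; i < count; ++i) if (a[i] > b[i]) b[i] = a[i];
}

static void max_double(const void *in, void *inout, size_t count)
{
    const double *a = in;
    double *b = inout;
    for (size_t i = 0; i < count; ++i) if (a[i] > b[i]) b[i] = a[i];
}

/* One hard-coded C function per (op, datatype) tuple. */
static const sketch_op_fn sketch_table[SKETCH_OP_COUNT][SKETCH_TYPE_COUNT] = {
    [SKETCH_OP_SUM] = { [SKETCH_TYPE_INT] = sum_int, [SKETCH_TYPE_DOUBLE] = sum_double },
    [SKETCH_OP_MAX] = { [SKETCH_TYPE_INT] = max_int, [SKETCH_TYPE_DOUBLE] = max_double },
};

/* Dispatch is a single indexed load plus an indirect call. */
static void sketch_reduce(enum sketch_op op, enum sketch_type type,
                          const void *in, void *inout, size_t count)
{
    assert(op < SKETCH_OP_COUNT && type < SKETCH_TYPE_COUNT);
    sketch_table[op][type](in, inout, count);
}
```

A caller would do something like sketch_reduce(SKETCH_OP_SUM, SKETCH_TYPE_DOUBLE, in, inout, n). The point is only that the call already goes through a pointer, so swapping what the pointer refers to should not add a new level of indirection.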
> *** NOTE: Since we already implement MPI_Op functionality via function
> pointer, this proposed extension is not expected to cause any performance
> difference in terms of OMPI's infrastructure.
> Proposal: Extend the current implementation by creating a new framework
> ("op") that allows components to provide back-end MPI_Op functions instead
> of/in addition to the hard-coded C functions (we've talked about this idea
> before, but never done it).
> The "op" framework will be similar to the MPI coll framework in that
> individual function pointers from multiple different modules can be
> mixed-n-matched. For example, if you want to write a new coll component that
> implements *only* a new MPI_BCAST algorithm, that coll component can be
> mixed-n-matched with other coll components at run time to get a full set of
> collective implementations on a communicator. A similar concept will be
> applied to the "op" framework. Case in point: some specialized hardware is
> only good at *some* operations on *some* datatypes; we'll need to fall back
> to the hard-coded C versions for all other tuples.
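A rough sketch of the mix-n-match idea, assuming a module is just a table of function pointers in which NULL means "this tuple is not provided". The struct op_module type, the "accel" module, and op_resolve() are all made up for illustration; the real framework interface may look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { OP_SUM, OP_PROD, N_OPS };
enum { TYPE_FLOAT, TYPE_DOUBLE, N_TYPES };

typedef void (*op_fn)(const void *in, void *inout, size_t count);

/* Hypothetical module: a name plus one slot per (op, datatype) tuple. */
struct op_module {
    const char *name;
    op_fn fns[N_OPS][N_TYPES];   /* NULL = tuple not provided */
};

/* Generate an element-wise function: inout[i] = inout[i] OPER in[i]. */
#define DEFINE_OP(fname, type, OPER)                                   \
    static void fname(const void *in, void *inout, size_t count)      \
    {                                                                  \
        const type *a = in;                                            \
        type *b = inout;                                               \
        for (size_t i = 0; i < count; ++i) b[i] = b[i] OPER a[i];      \
    }

DEFINE_OP(base_sum_float,   float,  +)
DEFINE_OP(base_sum_double,  double, +)
DEFINE_OP(base_prod_float,  float,  *)
DEFINE_OP(base_prod_double, double, *)
/* Stands in for an offloaded kernel; here it is plain C as well. */
DEFINE_OP(accel_sum_float,  float,  +)

/* The base covers every tuple, so a fallback always exists. */
static const struct op_module base_module = {
    .name = "base",
    .fns = {
        [OP_SUM]  = { [TYPE_FLOAT] = base_sum_float,  [TYPE_DOUBLE] = base_sum_double },
        [OP_PROD] = { [TYPE_FLOAT] = base_prod_float, [TYPE_DOUBLE] = base_prod_double },
    },
};

/* Specialized module: good at only one tuple, everything else is NULL. */
static const struct op_module accel_module = {
    .name = "accel",
    .fns = { [OP_SUM] = { [TYPE_FLOAT] = accel_sum_float } },
};

/* For each tuple, take the first module (in priority order) that provides
 * it; otherwise keep the base function. `providers` records who won. */
static void op_resolve(const struct op_module *const *modules, size_t n_modules,
                       struct op_module *resolved,
                       const char *providers[N_OPS][N_TYPES])
{
    for (int op = 0; op < N_OPS; ++op) {
        for (int t = 0; t < N_TYPES; ++t) {
            resolved->fns[op][t] = base_module.fns[op][t];
            providers[op][t] = base_module.name;
            for (size_t m = 0; m < n_modules; ++m) {
                if (modules[m]->fns[op][t] != NULL) {
                    resolved->fns[op][t] = modules[m]->fns[op][t];
                    providers[op][t] = modules[m]->name;
                    break;
                }
            }
        }
    }
}
```

With only the "accel" module loaded, the resolved table would use it for (OP_SUM, TYPE_FLOAT) and the base functions for the other three tuples. Resolution happens once, up front, so the per-call cost stays one indirect call.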
> It is likely that the "op" framework base will have all the hard-coded C
> "basic" MPI_Op functions that will always be available for fallback if a
> component is not used at run-time for a specialized implementation.
> Specifically: the intent is that components will be for specialized