Open MPI Development Mailing List Archives

From: Mohamad Chaarawi (mschaara_at_[hidden])
Date: 2007-07-27 14:03:44

I updated the RFC.

> From: Jeff Squyres <jsquyres_at_[hidden]>
> Date: July 25, 2007 9:04:44 AM EDT
> To: Open Developers <devel_at_[hidden]>
> Subject: [OMPI devel] [RFC] Sparse group implementation
> Reply-To: Open MPI Developers <devel_at_[hidden]>
> WHAT: Merge the sparse groups work to the trunk; get the community's
> opinion on one remaining issue.
> WHY: For large MPI jobs, it can be memory-prohibitive to fully
> represent dense groups; you can save a lot of space by having
> "sparse" representations of groups that are (for example)
> derived from MPI_COMM_WORLD.
> WHERE: Main changes are (might have missed a few in this analysis,
> but this is 99% of it):
> - Big changes in ompi/group
> - Moderate changes in ompi/comm
> - Trivial changes in ompi/mpi/c, ompi/mca/pml/[dr|ob1],
> ompi/mca/comm/sm
> WHEN: The code is ready now in /tmp/sparse (it is passing
> all Intel and IBM tests; see below).
> TIMEOUT: We'll merge all the work to the trunk and enable the
> possibility of using sparse groups (dense will still be the
> default, of course) if no one objects by COB Tuesday, 31 Aug
> 2007.
> ======================================================================
> The sparse groups work from U. Houston is ready to be brought into the
> trunk. It is built on the premise that for very large MPI jobs, you
> don't want to fully represent MPI groups in memory if you don't have
> to. Specifically, you can save memory for communicators/groups that
> are derived from MPI_COMM_WORLD by representing them in a sparse
> storage format.
> The sparse groups work adds 3 new storage formats to ompi_group_t,
> alongside the existing dense one:
> * dense (i.e., what it is today -- an array of ompi_proc_t pointers)
> * sparse, where the group's contents are expressed relative to the
>   parent group from which it was derived:
>   1. range: a series of (offset,length) tuples
>   2. stride: a single (first,stride,last) tuple
>   3. bitmap: a bitmap
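To make the three sparse formats concrete, here is a minimal sketch of how each one could map a rank in the derived (child) group to a rank in its parent. The names `range_tuple_t`, `range_to_parent`, `stride_to_parent`, and `bitmap_to_parent` are hypothetical and do not reflect the actual ompi_group_t fields; a return value of -1 marks an out-of-range rank.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical range format: a series of (offset, length) tuples. */
typedef struct { int offset; int length; } range_tuple_t;

/* Child rank -> parent rank, range format: walk the tuples in order. */
static int range_to_parent(const range_tuple_t *ranges, size_t nranges, int rank)
{
    assert(rank >= 0);
    for (size_t i = 0; i < nranges; i++) {
        if (rank < ranges[i].length) {
            return ranges[i].offset + rank;
        }
        rank -= ranges[i].length;   /* skip past this tuple's members */
    }
    return -1;                      /* rank out of bounds */
}

/* Child rank -> parent rank, stride format: a single (first, stride, last). */
static int stride_to_parent(int first, int stride, int last, int rank)
{
    assert(rank >= 0 && stride > 0);
    int parent = first + rank * stride;
    return (parent <= last) ? parent : -1;
}

/* Child rank -> parent rank, bitmap format: bit j set means parent rank j
 * is a member, and child rank k is the k-th set bit (0-based). */
static int bitmap_to_parent(const unsigned char *bits, int parent_size, int rank)
{
    assert(rank >= 0);
    for (int j = 0; j < parent_size; j++) {
        if (bits[j / 8] & (1u << (j % 8))) {
            if (rank == 0) {
                return j;
            }
            rank--;
        }
    }
    return -1;
}
```

Note that range and bitmap lookups are linear in the number of tuples or bits in this sketch, while the stride lookup is a single multiply-add.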
> Currently, all the sparse groups code must be enabled by configuring
> with --enable-sparse-groups. If sparse groups are enabled, each MPI
> group that is created will automatically use the storage format that
> takes the least amount of space.
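A rough sketch of the "least amount of space" rule, assuming the new group's members are given as ascending ranks in the parent and using simple, hypothetical byte costs: one pointer per member for dense, two ints per contiguous run for range, three ints for stride (only when the gap between members is constant), and one bit per parent rank for bitmap. The real cost model in the sparse groups code may differ; `pick_smallest_format` and `group_fmt_t` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { FMT_DENSE, FMT_RANGE, FMT_STRIDE, FMT_BITMAP } group_fmt_t;

/* ranks: the new group's members as ascending ranks in the parent group. */
static group_fmt_t pick_smallest_format(const int *ranks, int n, int parent_size)
{
    assert(n > 0);
    size_t dense  = (size_t)n * sizeof(void *);     /* one proc pointer per member */
    size_t bitmap = (size_t)(parent_size + 7) / 8;  /* one bit per parent rank */

    int runs = 1;      /* number of contiguous runs = (offset, length) tuples */
    int strided = 1;   /* is the gap between consecutive members constant? */
    for (int i = 1; i < n; i++) {
        if (ranks[i] != ranks[i - 1] + 1) runs++;
        if (ranks[i] - ranks[i - 1] != ranks[1] - ranks[0]) strided = 0;
    }
    size_t range  = (size_t)runs * 2 * sizeof(int);
    size_t stride = strided ? 3 * sizeof(int) : (size_t)-1; /* -1: not usable */

    /* Keep the smallest; ties go to the earlier (simpler) format. */
    group_fmt_t best = FMT_DENSE;
    size_t best_size = dense;
    if (range  < best_size) { best = FMT_RANGE;  best_size = range;  }
    if (stride < best_size) { best = FMT_STRIDE; best_size = stride; }
    if (bitmap < best_size) { best = FMT_BITMAP; best_size = bitmap; }
    return best;
}
```

For example, with 4-byte ints, every other rank of a 1000-process parent selects the stride format (12 bytes), while a contiguous block such as ranks 0..499 selects a single range tuple (8 bytes).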
> The Big Issue with the sparse groups is that getting a pointer to an
> ompi_proc_t may no longer be an O(1) operation -- you can't just
> access it via comm->group->procs[i]. Instead, you have to call a
> macro. If sparse groups are enabled, this will call a function to do
> the resolution and return the proc pointer. If sparse groups are not
> enabled, the macro currently resolves to group->procs[i].

Actually, there is no macro anymore. Brian suggested that we make it an
inline function (ompi_group_peer_lookup) that checks whether sparse groups are
enabled (#if OMPI_GROUP_SPARSE) and acts accordingly.

> When sparse groups are enabled, looking up a proc pointer is an
> iterative process; you have to traverse up through one or more parent
> groups until you reach a "dense" group to get the pointer. So the
> time to look up the proc pointer (essentially) depends on the group and
> how many times it has been derived from a parent group (there are
> corner cases where the lookup time is shorter). Lookup times in
> MPI_COMM_WORLD are O(1) because it is dense, but it now requires an
> inline function call rather than directly accessing the data structure
> (see below).
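The iterative lookup described above can be sketched as a loop that translates the rank at each level and climbs to the parent until it reaches a dense group. This is a simplified model under stated assumptions: only dense and stride groups are represented, and `group_t`, `proc_t`, and `group_peer_lookup` are hypothetical names, not the Open MPI types.

```c
#include <assert.h>
#include <stddef.h>

typedef struct proc { int world_rank; } proc_t;

typedef enum { GROUP_DENSE, GROUP_STRIDE } group_kind_t;

typedef struct group {
    group_kind_t kind;
    proc_t **procs;          /* dense groups: one pointer per member */
    struct group *parent;    /* sparse groups: group this one was derived from */
    int first, stride;       /* stride groups: member k is parent rank first + k*stride */
} group_t;

/* Walk up the parent chain, translating the rank at each level, until a
 * dense group is reached; the cost grows with the derivation depth. */
static proc_t *group_peer_lookup(const group_t *g, int rank)
{
    assert(rank >= 0);
    while (g->kind != GROUP_DENSE) {
        rank = g->first + rank * g->stride;   /* rank in the parent group */
        g = g->parent;
    }
    return g->procs[rank];                    /* O(1) once dense */
}
```

For instance, with an 8-process dense world, a child holding the odd ranks (first 1, stride 2), and a grandchild holding every other rank of the child, looking up rank 1 of the grandchild takes two steps: grandchild rank 1 maps to child rank 2, which maps to world rank 5.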
> Note that the code in /tmp/sparse-groups is currently out-of-date with
> respect to the orte and opal trees due to SVN merge mistakes and
> problems. Testing has occurred by copying full orte/opal branches
> from a trunk checkout into the sparse group tree, so we're confident
> that it's compatible with the trunk. Full integration will occur
> before committing to the trunk, of course.

A new branch has been created in /tmp/sparse that works perfectly.

> The proposal we have for the community is as follows:
> 1. Remove the --enable-sparse-groups configure option
> 2. Default to use only dense groups (i.e., same as today)
> 3. If the new MCA parameter "mpi_use_sparse_groups" is enabled, enable
> the use of sparse groups

The configure option will be kept. We will also have a runtime option
(mpi_use_sparse_groups) that is enabled by default when sparse groups are
enabled at configure time.
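One way to picture the combination of a compile-time switch and a run-time parameter is the sketch below. `OMPI_GROUP_SPARSE` stands in for the configure result, and `use_sparse_groups` stands in for the value of the mpi_use_sparse_groups MCA parameter; both the variable and `sparse_formats_allowed` are hypothetical, and the actual MCA parameter registration code is not shown.

```c
#include <stdbool.h>

/* Stand-in for the configure result (as if --enable-sparse-groups was given). */
#ifndef OMPI_GROUP_SPARSE
#define OMPI_GROUP_SPARSE 1
#endif

#if OMPI_GROUP_SPARSE
/* Hypothetical stand-in for mpi_use_sparse_groups: on by default when
 * the sparse code is compiled in, but the user may turn it off. */
static int use_sparse_groups = 1;
#endif

/* Decide at group-creation time whether sparse formats may be considered. */
static bool sparse_formats_allowed(void)
{
#if OMPI_GROUP_SPARSE
    return use_sparse_groups != 0;
#else
    return false;   /* sparse code compiled out: every group is dense */
#endif
}
```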

> 4. Eliminate the current macro used for group proc lookups and instead
> use an inline function of the form:
> static inline ompi_proc_t *lookup_group(ompi_group_t *group, int index) {
>     if (group_is_dense(group)) {
>         return group->procs[index];
>     } else {
>         return sparse_group_lookup(group, index);
>     }
> }

Done; however, the inline function uses #if instead of if().
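A minimal sketch of an #if-based lookup in the spirit of ompi_group_peer_lookup, assuming a simplified group type in which a NULL procs array marks a sparse group. The struct layout, `group_peer_lookup`, and `sparse_group_lookup` here are hypothetical. Building with `-DOMPI_GROUP_SPARSE=0` compiles out the sparse fields and branch, so the function reduces to a plain array access.

```c
#include <assert.h>
#include <stddef.h>

#ifndef OMPI_GROUP_SPARSE
#define OMPI_GROUP_SPARSE 1
#endif

typedef struct proc { int id; } proc_t;

typedef struct group {
    proc_t **procs;          /* NULL marks a sparse group in this sketch */
#if OMPI_GROUP_SPARSE
    struct group *parent;    /* sparse fields exist only when compiled in */
    int first, stride;
#endif
} group_t;

#if OMPI_GROUP_SPARSE
/* Climb to the nearest dense ancestor, translating the rank at each level. */
static proc_t *sparse_group_lookup(const group_t *g, int rank)
{
    while (g->procs == NULL) {
        rank = g->first + rank * g->stride;
        g = g->parent;
    }
    return g->procs[rank];
}
#endif

static inline proc_t *group_peer_lookup(const group_t *g, int rank)
{
    assert(rank >= 0);
#if OMPI_GROUP_SPARSE
    if (g->procs == NULL) {
        return sparse_group_lookup(g, rank);
    }
#endif
    /* Dense group, or sparse support compiled out: direct array access. */
    return g->procs[rank];
}
```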

> *** NOTE: This design adds a single "if" in some
> performance-critical paths. If the group is sparse, it will
> add a function call and the overhead to do the lookup.
> If the group is dense (which will be the default), the proc
> will be returned directly from the inline function.
> The rationale is that adding a single "if" (perhaps with
> OPAL_[UN]LIKELY?) in a few code paths will not be a big deal.
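For comparison, the run-time `if` proposed above might look like this, with the dense case marked as the likely branch. `MY_LIKELY` is a hypothetical stand-in for OPAL_LIKELY built on the GCC/Clang `__builtin_expect` builtin (the real OPAL macro may be defined differently), and the sparse group is reduced to a parent pointer plus a single offset for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for OPAL_LIKELY. */
#if defined(__GNUC__)
#define MY_LIKELY(x) __builtin_expect(!!(x), 1)
#else
#define MY_LIKELY(x) (x)
#endif

typedef struct proc { int id; } proc_t;

typedef struct group {
    int is_dense;
    proc_t **procs;          /* dense groups: one pointer per member */
    struct group *parent;    /* sparse groups: the group this one came from */
    int offset;              /* sparse groups: single offset, for brevity */
} group_t;

static proc_t *sparse_group_lookup(const group_t *g, int rank);

static inline proc_t *lookup_group(const group_t *g, int rank)
{
    assert(rank >= 0);
    /* Fast path: one well-predicted branch, then a direct array access. */
    if (MY_LIKELY(g->is_dense)) {
        return g->procs[rank];
    }
    /* Slow path: a function call that resolves the rank via the parent. */
    return sparse_group_lookup(g, rank);
}

static proc_t *sparse_group_lookup(const group_t *g, int rank)
{
    return lookup_group(g->parent, g->offset + rank);
}
```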

Another proposal that I mentioned before is to keep the sparse
parameters in the group structure (rather than compiling them out) even when
sparse groups are disabled. This would remove almost all #ifs from the code,
making it much easier to read (the main reason). Brian had some
Again, the extra parameters would be 5 integers and 3 pointers.
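To put the "5 integers and 3 pointers" figure in perspective, the sketch below gathers that many fields into one struct and reports its size. The field names are hypothetical; the message gives only the counts. On a typical LP64 platform (8-byte pointers, 4-byte ints) the fields take 44 bytes and the struct is padded to 48, which every group would carry even with sparse groups disabled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout of the extra per-group fields; names are illustrative. */
struct sparse_extra {
    void *parent;              /* e.g., the group this one was derived from */
    void *ranges;              /* e.g., the (offset, length) tuples */
    void *bitmap;              /* e.g., the membership bitmap */
    int first, stride, last;   /* e.g., the stride tuple */
    int nranges;               /* e.g., number of range tuples */
    int bitmap_len;            /* e.g., bitmap length in bytes */
};

/* Padding can only add bytes, never remove them. */
static_assert(sizeof(struct sparse_extra) >= 5 * sizeof(int) + 3 * sizeof(void *),
              "struct must hold all of its fields");

/* Bytes every group would carry if these fields were never compiled out. */
static size_t sparse_extra_overhead(void)
{
    return sizeof(struct sparse_extra);
}
```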

> 5. Bring all these changes into the OMPI trunk and therefore into
> v1.3.
> Comments?
> --
> Jeff Squyres
> Cisco Systems

Mohamad Chaarawi
Instructional Assistant
Department of Computer Science	  University of Houston
4800 Calhoun, PGH Room 526        Houston, TX 77204, USA