Open MPI Development Mailing List Archives

From: Brian Barrett (bbarrett_at_[hidden])
Date: 2007-07-25 10:39:33


I have an even bigger objection than Rich. It's nearly impossible to
measure the latency impact of something like this, but it does have
an additive effect. It doesn't make sense to have all that code in
the critical path for systems where it's not needed. We should leave
the compile-time decision available unless there's a very good
reason (which I did not see in this e-mail) to remove it.

Brian

On Jul 25, 2007, at 8:00 AM, Richard Graham wrote:

> This is good work, so I am happy to see it come over. My initial
> understanding was that there would be compile-time protection for
> this. In the absence of this, I think we need to see performance
> data on a variety of communication substrates. It seems that a
> latency measurement is perhaps the most sensitive one, and should
> be sufficient to see the impact on the critical path.
>
> Rich
>
>
> On 7/25/07 9:04 AM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
>
>> WHAT:    Merge the sparse groups work to the trunk; get the
>>          community's opinion on one remaining issue.
>> WHY:     For large MPI jobs, it can be memory-prohibitive to fully
>>          represent dense groups; you can save a lot of space by
>>          having "sparse" representations of groups that are (for
>>          example) derived from MPI_COMM_WORLD.
>> WHERE:   Main changes are (might have missed a few in this analysis,
>>          but this is 99% of it):
>>          - Big changes in ompi/group
>>          - Moderate changes in ompi/comm
>>          - Trivial changes in ompi/mpi/c, ompi/mca/pml/[dr|ob1],
>>            ompi/mca/coll/sm
>> WHEN:    The code is ready now in /tmp/sparse-groups (it is passing
>>          all Intel and IBM tests; see below).
>> TIMEOUT: We'll merge all the work to the trunk and enable the
>>          possibility of using sparse groups (dense will still be
>>          the default, of course) if no one objects by COB Tuesday,
>>          31 Jul 2007.
>>
>> ====================================================================
>>
>> The sparse groups work from U. Houston is ready to be brought into
>> the trunk. It is built on the premise that for very large MPI jobs,
>> you don't want to fully represent MPI groups in memory if you don't
>> have to. Specifically, you can save memory for communicators/groups
>> that are derived from MPI_COMM_WORLD by representing them in a
>> sparse storage format.
>>
>> The sparse groups work introduces 3 new ompi_group_t storage
>> formats. In addition to the existing dense format (an array of
>> ompi_proc_t pointers), a group's contents can now be stored
>> sparsely, relative to the group from which it was derived:
>>
>> 1. range: a series of (offset, length) tuples
>> 2. stride: a single (first, stride, last) tuple
>> 3. bitmap: a bitmap
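>>
>> As a rough illustration (hypothetical field names; not the actual
>> ompi_group_t layout), the three sparse formats could be stored
>> like this:
>>
>>    /* Sketch only: one possible representation of each sparse
>>       format, always interpreted relative to the parent group. */
>>    struct range_tuple { int offset; int length; };  /* "range": array of these */
>>    struct stride_desc { int first, stride, last; }; /* "stride": exactly one   */
>>    struct bitmap_desc {                             /* "bitmap": one bit per   */
>>        unsigned char *bits;                         /*  parent-group rank      */
>>        int            nbits;
>>    };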
>>
>> Currently, all the sparse groups code must be enabled by configuring
>> with --enable-sparse-groups. If sparse groups are enabled, each MPI
>> group that is created will automatically use the storage format that
>> takes the least amount of space.
>>
>> The Big Issue with the sparse groups is that getting a pointer to an
>> ompi_proc_t may no longer be an O(1) operation -- you can't just
>> access it via comm->group->procs[i]. Instead, you have to call a
>> macro. If sparse groups are enabled, this will call a function to do
>> the resolution and return the proc pointer. If sparse groups are not
>> enabled, the macro currently resolves to group->procs[i].
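>>
>> For illustration (hypothetical macro and function names; the real
>> ones differ), the compile-time switch looks roughly like:
>>
>>    #if OMPI_ENABLE_SPARSE_GROUPS  /* sketch only */
>>    #define GROUP_GET_PROC_PTR(grp, i)  ompi_group_lookup_proc((grp), (i))
>>    #else
>>    #define GROUP_GET_PROC_PTR(grp, i)  ((grp)->procs[(i)])
>>    #endif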
>>
>> When sparse groups are enabled, looking up a proc pointer is an
>> iterative process; you have to traverse up through one or more
>> parent groups until you reach a "dense" group to get the pointer.
>> So the time to look up the proc pointer (essentially) depends on
>> the group and how many times it has been derived from a parent
>> group (there are corner cases where the lookup time is shorter).
>> Lookup times in MPI_COMM_WORLD are O(1) because it is dense, but it
>> now requires an inline function call rather than directly accessing
>> the data structure (see below).
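>>
>> A minimal sketch of that traversal (hypothetical helper and field
>> names; the real code differs):
>>
>>    ompi_proc_t *sparse_group_lookup(ompi_group_t *group, int index)
>>    {
>>        /* Walk up the derivation chain until we hit a dense group,
>>           translating the rank at each step according to the sparse
>>           format (range, stride, or bitmap). */
>>        while (!group_is_dense(group)) {
>>            index = rank_in_parent(group, index);  /* hypothetical */
>>            group = group->parent;                 /* hypothetical */
>>        }
>>        return group->procs[index];  /* dense: direct O(1) access */
>>    }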
>>
>> Note that the code in /tmp/sparse-groups is currently out of date
>> with respect to the orte and opal trees due to SVN merge mistakes
>> and problems. Testing has occurred by copying full orte/opal
>> branches from a trunk checkout into the sparse groups tree, so
>> we're confident that it's compatible with the trunk. Full
>> integration will occur before committing to the trunk, of course.
>>
>> The proposal we have for the community is as follows:
>>
>> 1. Remove the --enable-sparse-groups configure option.
>> 2. Default to using only dense groups (i.e., same as today).
>> 3. If the new MCA parameter "mpi_use_sparse_groups" is enabled,
>>    enable the use of sparse groups (see the usage sketch after
>>    this list).
>> 4. Eliminate the current macro used for group proc lookups and
>>    instead use an inline function of the form:
>>
>>    static inline ompi_proc_t *lookup_group(ompi_group_t *group,
>>                                            int index)
>>    {
>>        if (group_is_dense(group)) {
>>            /* Dense group: direct array access, no function call */
>>            return group->procs[index];
>>        } else {
>>            /* Sparse group: resolve through the parent group(s) */
>>            return sparse_group_lookup(group, index);
>>        }
>>    }
>>
>>    *** NOTE: This design adds a single "if" in some
>>        performance-critical paths. If the group is sparse, it will
>>        add a function call and the overhead to do the lookup. If
>>        the group is dense (which will be the default), the proc
>>        will be returned directly from the inline function.
>>
>>    The rationale is that adding a single "if" (perhaps with
>>    OPAL_[UN]LIKELY? -- see the sketch after this list) in a few
>>    code paths will not be a big deal.
>>
>> 5. Bring all these changes into the OMPI trunk, and therefore into
>>    v1.3.
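>>
>> For reference, the branch-hinted variant of the inline function
>> mentioned in the note above might look like this (a sketch; OPAL's
>> likely/unlikely macros wrap compiler branch-prediction hints such
>> as __builtin_expect):
>>
>>    static inline ompi_proc_t *lookup_group(ompi_group_t *group,
>>                                            int index)
>>    {
>>        /* Hint that the dense (default) case is the common one */
>>        if (OPAL_LIKELY(group_is_dense(group))) {
>>            return group->procs[index];
>>        } else {
>>            return sparse_group_lookup(group, index);
>>        }
>>    }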
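>>
>> And if the proposal is adopted, enabling sparse groups at run time
>> would presumably use the normal MCA parameter mechanics, e.g.
>> (sketch; the parameter name is the one proposed above):
>>
>>    shell$ mpirun --mca mpi_use_sparse_groups 1 -np 4096 ./my_mpi_app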
>>
>> Comments?
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel