
Open MPI Development Mailing List Archives


Subject: [OMPI devel] RFC: revamp of ORTE global structures
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-05-17 11:31:00

WHAT: Revise the global ORTE data structures:
                  * orte_app_context_t
                  * orte_node_t
                  * orte_job_t
                  * orte_proc_t

WHY: The current definitions are rigid and hard to extend. In the past, we have extended
               them by hard-coding new fields into the structures. This has led to issues for
               off-trunk researchers and developers, and caused the structures to balloon in size.

WHEN: This change is fairly disruptive and touches many ORTE files, so let's give it a few weeks
                 and set the timeout for June 3rd, after the telecon.


PLEASE test your favorite mpirun options to ensure everything is working correctly. There are quite a few combinations, and I can't possibly guarantee I have hit them all.

More detail:

As noted in the summary, every time we want to add a new capability to the system, we wind up adding another dedicated field to the ORTE data structures. For example, the structures contain a number of booleans, each of which may be used in only a single, uncommon use case. Researchers investigating new capabilities, and developers adding something to the system, not only need to add more fields to the structures, but must also (a) ensure that the datatype support routines know about them, (b) ensure that the odls packing/unpacking functions know how to handle them, if the capability involves launching processes, and (c) ensure that the nidmap code knows about any new data fields.

Altogether, this is intimidating and fragile, and it increases the memory footprint with every feature.

As many of you know, we are about to add a number of new features to the system (e.g., power/frequency control and direct cgroup support). After I started working on these, it became apparent that we would be adding yet another set of rarely used fields to the various structures, further increasing the memory footprint for no good reason. Hence, I undertook a revision not only of the objects, but also of how we handle their transmission during launch.

The resulting code can be broken down into two key concepts:

* combining frequently used booleans into a single "flag" field in each structure. The size of the flag field varies between structures according to the number of booleans required. Macros are provided to set, unset, and test flags so that we can easily revise the system as required (e.g., if we someday need to move to opal_bitmap_t instead of simple integer fields).

* adding a list of "attributes" to each structure, in which infrequently used and/or non-boolean options can be stored. A new "orte_attribute_t" structure is defined that provides a key/value storage mechanism for these lists. To conserve memory, the key is an integer instead of a string. Functions for setting and getting attributes are provided. When an attribute is "set", you also specify whether it is to be shared globally (i.e., included when packing the associated structure's attribute list) or kept local.

Definitions of the new flags and attributes are provided in two new files:

* orte/util/attr.h - contains key and structure definitions for attributes, and flag names plus macros

* orte/util/attr.c - contains the attribute support functions

These revisions have allowed me not only to reduce our memory footprint, but also to reduce the size of the launch message by removing a lot of duplicated and unnecessary information. The nidmap and odls code has been revamped accordingly.

Comments and suggestions are welcome.