Open MPI Development Mailing List Archives

From: Ralph H. Castain (rhc_at_[hidden])
Date: 2006-02-06 23:09:05

Hello all

After several months of development, I have merged the new data
support subsystem for ORTE into the trunk. One caveat: I have made
every effort to test the revised system, but I cannot guarantee its
operation under every condition and on every system. For one, I don't
have access to every type of system to which ORTE/OMPI has been
ported...and, to be honest, the trunk moves so quickly that I would
never get this merged if I kept chasing the latest trunk version.
Hence, you may see some degree of instability. Hopefully this will be
minimal or non-existent, but it could happen.

Those of you primarily interested in the MPI layer need read no
further unless you intend to use any of the ORTE data types. For
everyone else, please read on.

The primary changes in this revision were:

1. redefinition of several key data types, including the
orte_data_value_t, orte_gpr_value_t, and orte_gpr_keyval_t
structures. This was done in order to eliminate ALL knowledge of data
types from the registry - the registry now has no knowledge of what
is being stored. This allowed the second change...

2. complete localization of all data type functionality. In the prior
version, a developer who changed a data type definition (e.g., adding
an element to a defined structure) was required to make corresponding
changes, in a number of places, to the functions that copied, deleted,
compared, and printed that data type. In particular, this was required
in at least three locations within the registry subsystem! This level
of complexity caused a number of errors whenever someone changed a
structure and missed one of the necessary changes elsewhere. The
result was unstable behavior that was very hard to debug and fix.

The new data support subsystem resolves this problem by requiring the
definer of a data type to provide several key functions:

a. compare - how to compare two instances of the data type, returning
one of three results: equal, value 1 greater, or value 2 greater.
These three outputs are now defined values to ensure consistency
throughout the code base - please USE THEM.

b. copy - how to copy one instance into a new data location,
allocating memory dynamically to provide the necessary storage

c. print - method to pretty-print the contents of the data type,
essential for debugging and/or use by the registry "dump" functions

d. size - method to compute the size of the specified data type
instance, including the size of any non-static fields (e.g., a string variable)

e. release - method for releasing a dynamically-allocated instance of
the data type. In most cases, this function simply does a free or an
OBJ_RELEASE, but it could also be used (for example) to provide a
debugging version of a release function

f. pack/unpack - how to pack/unpack an instance into an ORTE buffer
for transmission

In addition, the data type definition requires that two values be provided:

a. a boolean flag indicating whether or not the data type is
structured. This is provided separately from the release function so
that a developer can (for example) define a debugging release
independent of the "flavor" (i.e., structured or not) of the data type

b. a name for the data type. This is required to be unique.
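One way to picture how the two required values sit alongside the function pointers is a per-type registration record. The sketch below assumes a simple fixed-size in-memory table; demo_type_info_t, demo_register_type, demo_type_name, and the int_* helpers are all hypothetical names, not the actual ORTE registration calls. It shows a registry rejecting a duplicate name, and it keeps the structured flag as its own field so the release pointer can be swapped for a debugging version without changing it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function-pointer types for a few of the required functions. */
typedef int    (*demo_compare_fn)(const void *v1, const void *v2);
typedef int    (*demo_copy_fn)(void **dest, const void *src);
typedef size_t (*demo_size_fn)(const void *src);
typedef void   (*demo_release_fn)(void *src);

typedef struct {
    const char *name;        /* must be unique across all registered types */
    bool structured;         /* object-style type or plain data, independent of release */
    demo_compare_fn compare;
    demo_copy_fn copy;
    demo_size_fn size;
    demo_release_fn release;
} demo_type_info_t;

#define DEMO_MAX_TYPES 32

static demo_type_info_t demo_types[DEMO_MAX_TYPES];
static int demo_num_types = 0;

/* Returns the new type id, or -1 if a required field is missing,
 * the name is already taken, or the table is full. */
int demo_register_type(const demo_type_info_t *info)
{
    if (info->name == NULL || info->compare == NULL || info->copy == NULL ||
        info->size == NULL || info->release == NULL) {
        return -1;
    }
    for (int i = 0; i < demo_num_types; i++) {
        if (strcmp(demo_types[i].name, info->name) == 0) return -1;
    }
    if (demo_num_types >= DEMO_MAX_TYPES) return -1;
    demo_types[demo_num_types] = *info;
    return demo_num_types++;
}

/* Look up a registered type's name by id, e.g. for dump output. */
const char *demo_type_name(int id)
{
    if (id < 0 || id >= demo_num_types) return NULL;
    return demo_types[id].name;
}

/* Example functions for a plain (non-structured) int type. */
static int int_compare(const void *v1, const void *v2)
{
    int a = *(const int *)v1, b = *(const int *)v2;
    return (a > b) - (a < b);
}

static int int_copy(void **dest, const void *src)
{
    int *p = malloc(sizeof(int));
    if (p == NULL) return -1;
    *p = *(const int *)src;
    *dest = p;
    return 0;
}

static size_t int_size(const void *src) { (void)src; return sizeof(int); }

static void int_release(void *src) { free(src); }
```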

All of these functions have been provided for the "standard" data
types (ints, bool, etc.), so you don't have to worry about those. For
examples of these functions, look either at the orte/dss code (where
the standard data types are supported) or in the
orte/mca/gpr/base/data_type_support directory, where more complex
types are defined. The files orte/dss/dss_open_close.c and
orte/mca/gpr/base/gpr_base_open.c contain the data type registration
calls. I have also provided these functions for all of the current
ORTE-defined data types.

Two other functional entries (set and get) were added to the data
support subsystem. They are intended to mimic true object-oriented
programming for the orte_data_value_t object. There are times in the
code where it is more convenient to work with statically-defined
variables. However, using the "copy" function to move data from one
object to another causes memory to be dynamically allocated. The
set/get functions provide a "safe" way to do this statically.
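The difference between copy and set/get can be sketched roughly as follows. This assumes a simplified value object holding a type tag and an opaque pointer; demo_value_t and the demo_value_* functions are hypothetical stand-ins, not the real orte_data_value_t interface. In this sketch set and get only move a pointer, so a statically declared value never touches the heap, while copy allocates storage that the caller must free later.

```c
#include <stdlib.h>

typedef enum { DEMO_INT = 1, DEMO_STRING = 2 } demo_type_t;

/* Hypothetical stand-in for a data value: a type tag plus an opaque pointer. */
typedef struct {
    demo_type_t type;
    void *data;
} demo_value_t;

/* set: point the value at caller-owned storage; nothing is allocated. */
void demo_value_set(demo_value_t *value, void *data, demo_type_t type)
{
    value->type = type;
    value->data = data;
}

/* get: hand back the stored pointer only if the caller asks for the right type. */
int demo_value_get(void **data, const demo_value_t *value, demo_type_t type)
{
    if (value->type != type) return -1;
    *data = value->data;
    return 0;
}

/* copy (for contrast): duplicates the payload on the heap, so the caller
 * must release it later. Only DEMO_INT is handled in this sketch. */
int demo_value_copy_int(demo_value_t *dest, const demo_value_t *src)
{
    if (src->type != DEMO_INT) return -1;
    int *p = malloc(sizeof(int));
    if (p == NULL) return -1;
    *p = *(const int *)src->data;
    dest->type = DEMO_INT;
    dest->data = p;
    return 0;
}
```

The type check in get is what makes the static path "safe": a caller who asks for the wrong type gets an error instead of a mis-cast pointer.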

In addition to changing the data type definitions, I created two
"helper" functions to support the gpr_value and gpr_keyval
structures. In working through the code, I found a number of
instances where people had forgotten to completely define these
structures, leaving some fields unintentionally "blank". This
appeared to cause problems at times, and definitely caused headaches
during this transition. In addition, there was a lot of duplicated
and painful code due to all the error checking required when building
one of these structures.

To simplify things, I created two new gpr API functions: create_value
and create_keyval. Each takes as arguments the values to be placed in
the respective fields and returns a fully built structure, with all
the desired error checking for memory availability etc. Using these
functions will also protect you against any future changes to the
system. The only drawback is that these functions dynamically
allocate the required memory.
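A create_keyval-style helper might look roughly like the sketch below. The names (demo_keyval_t, demo_create_keyval, demo_release_keyval), the -1/0 status codes, and the choice to store the caller's data pointer rather than copy the data are all assumptions made for illustration; the real gpr API function may differ on every one of these points. What the sketch tries to capture is that every field is filled in and every allocation is checked in one place, with partial allocations cleaned up on failure.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { DEMO_INT = 1, DEMO_STRING = 2 } demo_type_t;

/* Hypothetical stand-ins for a data value and a registry keyval. */
typedef struct {
    demo_type_t type;
    void *data;
} demo_value_t;

typedef struct {
    char *key;
    demo_value_t *value;
} demo_keyval_t;

/* Build a fully populated keyval in one call. The key string is duplicated
 * and the value struct is allocated here; the data pointer is stored as given.
 * On any failure, everything allocated so far is freed and -1 is returned. */
int demo_create_keyval(demo_keyval_t **kv, const char *key,
                       demo_type_t type, void *data)
{
    *kv = NULL;
    if (key == NULL) return -1;

    demo_keyval_t *k = calloc(1, sizeof(*k));   /* all fields start zeroed */
    if (k == NULL) return -1;

    size_t len = strlen(key) + 1;
    k->key = malloc(len);
    if (k->key == NULL) goto fail;
    memcpy(k->key, key, len);

    k->value = malloc(sizeof(*k->value));
    if (k->value == NULL) goto fail;
    k->value->type = type;
    k->value->data = data;

    *kv = k;
    return 0;

fail:
    free(k->key);   /* free(NULL) is a no-op */
    free(k);
    return -1;
}

/* Release a keyval built by demo_create_keyval (data is caller-owned here). */
void demo_release_keyval(demo_keyval_t *kv)
{
    if (kv == NULL) return;
    free(kv->key);
    free(kv->value);
    free(kv);
}
```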

I hope that helps to explain the changes. As you can see from the
commit, this hit a large number of functions. I have provided unit
tests for all the data types within the revised data support system
that help illustrate how that system is used. In particular, you can
look at test/dss and at test/mca/gpr (the gpr_dt_xxx functions) for examples.
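A common pattern for testing a data type is a round trip: pack a value, unpack it, and check that what comes out matches what went in. The sketch below shows that pattern for pack/unpack using a hypothetical growable byte buffer; demo_buffer_t and the demo_pack_*/demo_unpack_* functions are invented here and are not the ORTE buffer API. Integers are written in big-endian order and strings as a length prefix followed by their bytes, which is one reasonable encoding among several.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer standing in for an ORTE buffer. */
typedef struct {
    unsigned char *bytes;
    size_t used;
    size_t capacity;
    size_t read_pos;
} demo_buffer_t;

/* Make room for `extra` more bytes, doubling the capacity as needed. */
static int demo_buffer_reserve(demo_buffer_t *buf, size_t extra)
{
    if (buf->used + extra <= buf->capacity) return 0;
    size_t cap = buf->capacity ? buf->capacity : 16;
    while (cap < buf->used + extra) cap *= 2;
    unsigned char *p = realloc(buf->bytes, cap);
    if (p == NULL) return -1;
    buf->bytes = p;
    buf->capacity = cap;
    return 0;
}

/* Pack a uint32 in big-endian order so the host byte order doesn't matter. */
int demo_pack_uint32(demo_buffer_t *buf, uint32_t v)
{
    if (demo_buffer_reserve(buf, 4) != 0) return -1;
    for (int i = 0; i < 4; i++)
        buf->bytes[buf->used++] = (unsigned char)(v >> (24 - 8 * i));
    return 0;
}

int demo_unpack_uint32(demo_buffer_t *buf, uint32_t *v)
{
    if (buf->used - buf->read_pos < 4) return -1;
    uint32_t r = 0;
    for (int i = 0; i < 4; i++)
        r = (r << 8) | buf->bytes[buf->read_pos++];
    *v = r;
    return 0;
}

/* Pack a string as a length prefix followed by its bytes (no terminator). */
int demo_pack_string(demo_buffer_t *buf, const char *s)
{
    size_t len = strlen(s);
    if (len > UINT32_MAX) return -1;
    if (demo_pack_uint32(buf, (uint32_t)len) != 0) return -1;
    if (demo_buffer_reserve(buf, len) != 0) return -1;
    memcpy(buf->bytes + buf->used, s, len);
    buf->used += len;
    return 0;
}

/* Unpack allocates a NUL-terminated copy of the string; the caller frees it. */
int demo_unpack_string(demo_buffer_t *buf, char **s)
{
    uint32_t len;
    if (demo_unpack_uint32(buf, &len) != 0) return -1;
    if (buf->used - buf->read_pos < len) return -1;
    char *p = malloc((size_t)len + 1);
    if (p == NULL) return -1;
    memcpy(p, buf->bytes + buf->read_pos, len);
    p[len] = '\0';
    buf->read_pos += len;
    *s = p;
    return 0;
}
```

A round-trip test then packs a few values, unpacks them in the same order, compares each result with the original, and finally checks that one more unpack fails because the buffer is exhausted.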

Please feel free to holler with questions - and do please let me know
if you find any problems with the revisions.