
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] RFC: move BTLs out of ompi into separate layer
From: Brian W. Barrett (brbarret_at_[hidden])
Date: 2009-03-09 16:01:29

Not surprisingly, I have serious concerns about this RFC. It assumes that
the ompi_proc issues and bootstrapping issues (the entire point of the
move, as I understand it) can both be solved, but offers no proof to
support that claim. Without those two issues solved, we would be left
with an onet layer that is dependent on ORTE and OMPI, and which OMPI
depends upon. This is not a good place to be. These issues should be
resolved before an onet layer is created in the trunk.
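To make the layering concern concrete, here is a minimal sketch in C that models the layers as a dependency graph and checks it for cycles with a depth-first search. The layer list, the `dep_graph` type, and the functions `has_dependency_cycle` and `intended_layering` are hypothetical illustrations, not Open MPI code. The point is only that a single leftover onet-to-ompi dependency (for example on ompi_proc_t) turns the intended layering into a cycle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the layers discussed in this thread, ordered from
   lowest to highest. deps[a][b] == true means "layer a depends on layer b". */
enum layer { OPAL, ORTE, ONET, OMPI, NUM_LAYERS };

typedef bool dep_graph[NUM_LAYERS][NUM_LAYERS];

/* Depth-first search colors: 0 = unvisited, 1 = on current path, 2 = done. */
static bool visit(dep_graph deps, int node, int color[NUM_LAYERS])
{
    assert(node >= 0 && node < NUM_LAYERS);
    color[node] = 1;
    for (int next = 0; next < NUM_LAYERS; ++next) {
        if (!deps[node][next])
            continue;
        if (color[next] == 1)
            return true;              /* back edge: dependency cycle */
        if (color[next] == 0 && visit(deps, next, color))
            return true;
    }
    color[node] = 2;
    return false;
}

/* Returns true if the layer dependency graph contains a cycle. */
bool has_dependency_cycle(dep_graph deps)
{
    int color[NUM_LAYERS] = {0};
    for (int n = 0; n < NUM_LAYERS; ++n)
        if (color[n] == 0 && visit(deps, n, color))
            return true;
    return false;
}

/* Fills in the intended layering ompi -> onet -> orte -> opal, where each
   layer may depend only on layers below it. */
void intended_layering(dep_graph deps)
{
    for (int a = 0; a < NUM_LAYERS; ++a)
        for (int b = 0; b < NUM_LAYERS; ++b)
            deps[a][b] = (b < a);
}
```

With `intended_layering()` alone, `has_dependency_cycle()` returns false; setting `deps[ONET][OMPI] = true` on top of it makes it return true, which is the situation of an onet layer that both depends on OMPI and is depended on by OMPI.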

This is not an unusual requirement. The fault tolerance work took a very
long time because of similar requirements. Not only was a full
implementation required to prove performance would not be negatively
impacted (when FT wasn't active), but we had discussions about its impact
on code maintainability. We had a full implementation of all the pieces
that impacted the code *before* any of it was allowed into the trunk.

We should live by the rules the community has set up. They have served us
well in the past. Further, these are not new objections on my part.
Since the initial RFCs related to this move started, I have continually
brought up the exact same questions and never gotten a satisfactory
answer. This RFC even acknowledges the issues, yet it presents no
solution and still asks to do the most disruptive work. I simply can't
see how that fits with Open MPI's long-standing development procedures.

If all the issues I've asked about previously (which are essentially the
ones you've identified in the RFC) can be solved, the impact on code-base
maintainability is reasonable, and the impact on performance is
negligible, I'll gladly remove my objection to this RFC.

Further, before any work on this branch is brought into the trunk, the
admin-level discussion regarding this issue should be resolved. At this
time, that discussion is blocking on ORNL, and they've given April as the
earliest date such a discussion can occur. So at the very least, the RFC
timeout should be pushed into April or ORNL should revise their
availability for the admin discussion.


On Mon, 9 Mar 2009, Rainer Keller wrote:

> What: Move BTLs into separate layer
> Why: Several projects have expressed interest in using the BTLs. Use cases
> such as the RTE using the BTLs for the modex, or tools collecting/distributing
> data in the fastest possible way, may become possible.
> Where: This would affect several components that the BTLs depend on
> (namely allocator, mpool, rcache and the common part of the BTLs).
> Additionally some changes to classes were/are necessary.
> When: Preferably 1.5 (in case we use the Feature/Stable Release cycle ;-)
> Timeout: 23.03.2009
> ------------------------------------------------------------------------
> There has been much speculation about this project.
> This RFC should shed some light; if more information is required,
> please feel free to ask/comment. Of course, suggestions are welcome!
> The BTLs offer access to a fast communication framework. Several projects have
> expressed interest in using them separately from other layers of Open MPI.
> Additionally (with further changes), the BTLs may be used within ORTE itself.
> The extraction is not easy (as was the extraction of ORTE and OMPI in the
> early stages of Open MPI?).
> In order to get as much input and be as visible as possible (e.g. in TRACS),
> the tmp-branch for this work has been set up on:
> We propose to have a separate ONET library living in onet, based on orte (see
> attached fig).
> In order to keep the diff between the trunk and the branch to a minimum
> several cleanup patches have already been applied to the trunk (e.g.
> unnecessary #include of ompi and orte header files, integration of
> ompi_bitmap_t into opal_bitmap_t, #include "*_config.h").
> Additionally, a script (contrib/move-btl-into-onet, attached below) has been
> kept up to date; it performs this separation on a fresh checkout of the
> trunk:
> svn list
> into-onet
> This script requires several patches (see attached TAR-ball).
> Please update the variable PATCH_DIR to match the location of patches.
> ./move-btl-into-onet ompi-clean/
> # Lots of output deleted.
> cd ompi-clean/
> rm -fr ompi/mca/common/ # No two mcas called common, too bad...
> ./
> A preliminary header file, onet/include/rte.h, is provided to accommodate the
> requirements of other RTEs (such as stci); it replaces selected
> functionality, as proposed by Jeff and Ralph at the Louisville meeting.
> Additionally, this header file is included before orte-header files (within
> onet)...
> By default, in the standard case (ORTE), this does not change anything;
> when compiled with -DHAVE_STCI, the ORTE functionality required within
> onet is redefined for the respective components.
> First tests have been done locally on Linux/x86_64.
> The branch compiles without warnings.
> The wrappers have been updated.
> The Intel Testsuite runs without failures:
> ./ all_tests_no_perf
> !!!Before any merge, do extensive performance tests on real machines!!!
> Initial tests on the cluster smoky show no difference in comparison to the
> ompi-trunk.
> Please see the enclosed output of NetPipe-3.7.1 run on a single node (--mca
> btl sm,self) on smoky.
> There are still some to-dos before this can be finalized:
> - Dependencies in the onet-layer into the ompi-layer (ompi_proc_t,
> ompi_converter)
> We are working on these, and have briefly talked about the latter with
> George.
> - Better abstraction from orte / cleanups, such as modex
> If these involve code changes (and not just "safe" and non-intrusive renames),
> such as an opal_keyval change, we will continue to write RFCs.
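The rte.h mechanism described in the quoted RFC amounts to compile-time selection with the preprocessor: onet code is written once against shim names, and the header maps those names to ORTE by default or to another RTE when HAVE_STCI is defined. The actual contents of onet/include/rte.h are not shown in this thread, so the following is only a minimal sketch; the stand-in functions `orte_stub_describe` and `stci_stub_describe`, the shim macro `onet_rte_describe`, and `onet_uses_orte` are all hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for a function each RTE would provide. */
const char *orte_stub_describe(void) { return "orte"; }
const char *stci_stub_describe(void) { return "stci"; }

/* rte.h-style shim (hypothetical): included before any ORTE header inside
   onet, it maps the onet-facing name to one RTE at compile time. */
#ifdef HAVE_STCI
#  define onet_rte_describe() stci_stub_describe()  /* alternative RTE */
#else
#  define onet_rte_describe() orte_stub_describe()  /* default: plain ORTE */
#endif

/* onet code written once against the shim name; returns nonzero when the
   default ORTE mapping was compiled in. */
int onet_uses_orte(void)
{
    const char *rte = onet_rte_describe();
    assert(rte != NULL);
    return strcmp(rte, "orte") == 0;
}
```

Compiled without -DHAVE_STCI, `onet_uses_orte()` returns nonzero and nothing changes for the ORTE case; adding -DHAVE_STCI reroutes the same onet source to the STCI stand-in without editing it.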