Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: move BTLs out of ompi into separate layer
From: George Bosilca (bosilca_at_[hidden])
Date: 2009-03-09 15:55:44

On Mar 9, 2009, at 15:13 , Ralph Castain wrote:

> Could you please clarify - what is going to happen on Mar 23 (your
> timeout date)?
> It also wasn't clear about your testing. Are you calling up into the
> ONET layer to run it from the RTE? I believe this was the point of
> concern regarding performance - what impact it would have on things
> if we enabled that capability. For example, do you need a
> quality-of-service flag to avoid delaying MPI comm when an RTE comm occurs?
> When someone calls into the MPI library, are you going to first
> progress MPI messages, and then if-and-only-if they all complete,
> progress any RTE ONET messages? How will this be handled?

I will emphasize a very important point, discussed at length during our
meeting in Louisville: there is no support for ONET in ORTE at this
stage, and no plan for such support has been presented so far (and I
doubt there will be one in the near future). Moreover, as the ONET layer
is agnostic to whatever semantics the upper layer gives to the data moved
around, there is no reason to enforce any "quality-of-service" in the
ONET layer. Whatever upper layer is implemented on top of this
ONET BTL will have to deal with such things.
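
To make the layering argument concrete, here is a minimal sketch of an
upper layer imposing its own priority on top of a transport that knows
nothing about message classes. Every name in it (upper_layer,
transport_progress_one, upper_progress) is hypothetical and none of it is
Open MPI or ONET code; the policy shown (MPI traffic first, RTE traffic
only when no MPI message is pending) is simply the one raised in the
question above, assumed for illustration.

```c
/* Hypothetical sketch: none of these names exist in Open MPI or ONET. */
#include <assert.h>
#include <stddef.h>

/* Message classes are an upper-layer notion; the transport never sees them. */
enum traffic_class { TRAFFIC_MPI = 0, TRAFFIC_RTE = 1, TRAFFIC_CLASSES = 2 };

/* Upper-layer bookkeeping: number of messages still pending per class. */
struct upper_layer {
    int pending[TRAFFIC_CLASSES];
};

/* Stand-in for a transport progress call: completes at most one message
 * of the given class and reports whether it did any work. */
int transport_progress_one(struct upper_layer *ul, enum traffic_class c)
{
    if (ul->pending[c] == 0) {
        return 0;
    }
    ul->pending[c]--;
    return 1;
}

/* The priority policy lives here, above the transport: try MPI first and
 * fall back to RTE only when no MPI message is pending.
 * Returns the class that made progress, or -1 if nothing was pending. */
int upper_progress(struct upper_layer *ul)
{
    assert(ul != NULL);
    if (transport_progress_one(ul, TRAFFIC_MPI)) {
        return TRAFFIC_MPI;
    }
    if (transport_progress_one(ul, TRAFFIC_RTE)) {
        return TRAFFIC_RTE;
    }
    return -1;
}
```

With two MPI messages and one RTE message pending, successive calls to
upper_progress() would service MPI, MPI, RTE and then report that nothing
is left. Swapping the policy means changing upper_progress() only; the
transport stand-in stays untouched.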

> I doubt anyone expects to see a performance impact of doing nothing
> but renaming things, so these questions remain at the heart of the
> discussion.

I hope we will not spend our time during tomorrow morning's phone call
talking about name changes ...

> Also, could you clarify what happened to the datatype engine? Is
> this moving to OPAL, ONET, or...?

Again, as discussed during the meeting in Louisville, at the end of
the move the datatype engine will be divided into two parts: one
MPI-agnostic part, able to deal only with common predefined types, and one
encapsulating the MPI knowledge (how to build an indexed array or any
other fancy MPI type). The first one will be moved into the OPAL layer,
while the second one will stay where it is today (i.e. the MPI layer).
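
As a rough illustration of that split, the sketch below separates a lower
part that only understands predefined types from an upper part that knows
how to lay out an indexed type. All names (base_desc, base_desc_add,
base_desc_packed_size, upper_build_indexed) are made up for this example
and do not correspond to the real datatype engine; the only borrowed idea
is that, as in MPI_Type_indexed, displacements are counted in units of the
old type.

```c
/* Hypothetical sketch: not the actual OPAL/OMPI datatype engine. */
#include <assert.h>
#include <stddef.h>

/* ---- Lower part: knows only predefined types, no MPI semantics ---- */
enum base_type { BASE_INT32 = 0, BASE_DOUBLE = 1, BASE_TYPE_MAX = 2 };

static const size_t base_type_size[BASE_TYPE_MAX] = { 4, 8 };

/* One contiguous run of a predefined type at a byte displacement. */
struct base_elem {
    enum base_type type;
    size_t count;
    ptrdiff_t disp;
};

#define BASE_DESC_MAX 16

struct base_desc {
    struct base_elem elems[BASE_DESC_MAX];
    size_t used;
};

/* Append one run to a description; returns -1 when the description is full. */
int base_desc_add(struct base_desc *d, enum base_type t,
                  size_t count, ptrdiff_t disp)
{
    assert(d != NULL && t < BASE_TYPE_MAX);
    if (d->used == BASE_DESC_MAX) {
        return -1;
    }
    d->elems[d->used].type = t;
    d->elems[d->used].count = count;
    d->elems[d->used].disp = disp;
    d->used++;
    return 0;
}

/* Number of bytes the described data occupies once packed contiguously. */
size_t base_desc_packed_size(const struct base_desc *d)
{
    size_t total = 0;
    for (size_t i = 0; i < d->used; i++) {
        total += d->elems[i].count * base_type_size[d->elems[i].type];
    }
    return total;
}

/* ---- Upper part: the "MPI knowledge" of what an indexed type is ---- */
/* Displacements are given in units of oldtype and converted to bytes. */
int upper_build_indexed(struct base_desc *d, enum base_type oldtype,
                        size_t nblocks, const size_t *blocklens,
                        const ptrdiff_t *disps)
{
    assert(d != NULL && oldtype < BASE_TYPE_MAX);
    d->used = 0;
    for (size_t i = 0; i < nblocks; i++) {
        ptrdiff_t byte_disp = disps[i] * (ptrdiff_t)base_type_size[oldtype];
        if (base_desc_add(d, oldtype, blocklens[i], byte_disp) != 0) {
            return -1;
        }
    }
    return 0;
}
```

The lower half could be used by any layer that only needs to move
predefined types around, while everything that requires knowing what an
"indexed" type means stays in the upper half.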


> Thanks
> Ralph
> On Mar 9, 2009, at 12:51 PM, Rainer Keller wrote:
>> What: Move BTLs into separate layer
>> Why: Several projects have expressed interest to use the BTLs. Use-cases
>> such as the RTE using the BTLs for modex or tools collecting/distributing
>> data in the fastest possible way may be possible.
>> Where: This would affect several components, that the BTLs depend on
>> (namely allocator, mpool, rcache and the common part of the BTLs).
>> Additionally some changes to classes were/are necessary.
>> When: Preferably 1.5 (in case we use the Feature/Stable Release cycle ;-)
>> Timeout: 23.03.2009
>> ------------------------------------------------------------------------
>> There has been much speculation about this project.
>> This RFC should shed some light; if more information is required,
>> please feel free to ask/comment. Of course, suggestions are welcome!
>> The BTLs offer access to a fast communication framework. Several projects
>> have expressed interest to use them separate of other layers of Open MPI.
>> Additionally (with further changes) BTLs may be used within ORTE itself.
>> The extraction is not easy (as was the extraction of ORTE and OMPI in the
>> early stages of Open MPI?).
>> In order to get as much input and be as visible as possible (e.g. in TRACS),
>> the tmp-branch for this work has been set up on:
>> We propose to have a separate ONET library living in onet, based on orte
>> (see attached fig).
>> In order to keep the diff between the trunk and the branch to a minimum
>> several cleanup patches have already been applied to the trunk (e.g.
>> unnecessary #include of ompi and orte header files, integration of
>> ompi_bitmap_t into opal_bitmap_t, #include "*_config.h").
>> Additionally a script (attached below) has been kept up-to-date
>> (contrib/move-btl-into-onet), that will perform this separation on a fresh
>> checkout of trunk:
>> svn list
>> into-onet
>> This script requires several patches (see attached TAR-ball).
>> Please update the variable PATCH_DIR to match the location of patches.
>> ./move-btl-into-onet ompi-clean/
>> # Lots of output deleted.
>> cd ompi-clean/
>> rm -fr ompi/mca/common/ # No two mcas called common, too bad...
>> ./
>> A preliminary header file is provided in onet/include/rte.h to accommodate
>> the requirements of other RTEs (such as stci), that replaces selected
>> functionality, as proposed by Jeff and Ralph in the Louisville meeting.
>> Additionally, this header file is included before orte-header files
>> (within onet)...
>> By default, this does not change anything in the standard case (ORTE);
>> otherwise, with -DHAVE_STCI, the orte functionality that components
>> require within onet is redefined.
>> First tests have been done locally on Linux/x86_64.
>> The branch compiles without warnings.
>> The wrappers have been updated.
>> The Intel Testsuite runs without failures:
>> ./ all_tests_no_perf
>> !!!Before any merge, do extensive performance tests on real machines!!!
>> Initial tests on the cluster smoky, show no difference in comparison to
>> ompi-trunk.
>> Please see the enclosed output of NetPipe-3.7.1 run on a single node
>> (--mca btl sm,self) on smoky.
>> There are still some todos, to finalize this:
>> - Dependencies in the onet-layer into the ompi-layer (ompi_proc_t,
>> ompi_converter)
>> We are working on these, and have briefly talked about the latter with
>> George.
>> - Better abstraction from orte / cleanups, such as modex
>> If these involve code-changes (and not just "safe" and non-intrusive
>> renames), such as an opal_keyval-change, we will continue to write RFCs.
>> --
>> ------------------------------------------------------------------------
>> Rainer Keller, PhD Tel: +1 (865) 241-6293
>> Oak Ridge National Lab Fax: +1 (865) 241-4811
>> PO Box 2008 MS 6164 Email: keller_at_[hidden]
>> Oak Ridge, TN 37831-2008 AIM/Skype: rusraink
>> <ompi_onet-2009.02.27.pdf><move-btl-patches.tar><move-btl-into-onet>
>> <NPmpi-ompi.out><NPmpi-koenig-BTL-orte.out>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]