Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Preparations for moving the btl's
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-12-04 08:13:30


I think you got it right. And I think we're pretty good in terms of
BTL usage of ORTE and OPAL (to include the new "notifier" service that
Ralph put in recently -- what the FTB will likely eventually use, I
think...?); those interfaces and abstraction barriers are
technologically enforced. If you break the abstractions, the linker
will swiftly and unmercifully punish you. (This was exactly [one of]
the rationales that we used for splitting the code base into OPAL,
ORTE, and OMPI several years ago.)

Greg has already noted on the wiki a few constants used in the BTL's
that have an OMPI_ prefix that aren't really OMPI values (e.g.,
OMPI_ENABLE_HETEROGENEOUS_SUPPORT). These come from configure (i.e.,
opal/include/opal_config.h) and were not renamed back when we split
the code base into OPAL, ORTE, and OMPI. I don't think we had a
strong reason for not renaming them -- most could probably be renamed
to OPAL_* -- we just didn't do it then. Perhaps they can be changed
during the BTL extraction process (I noted this on the wiki).

On Dec 3, 2008, at 9:43 PM, Richard Graham wrote:

> BTW,
> I was guessing FTB is Fault Tolerant Backbone, but if not, can
> someone tell me what it is? If it is not the latter, what I just
> wrote about it makes no sense.
>
> Rich
>
>
> On 12/3/08 9:34 PM, "Richard Graham" <rlgraham_at_[hidden]> wrote:
>
>> The goal is to use the btl’s outside of the context of MPI, which
>> was what was in mind from the day the ompi work started over five
>> years ago, but with no other use at the time, things grew up
>> intermingled – no surprise at all. What we are attempting to do is
>> to untangle the existing dependencies, and make a much cleaner
>> distinction between how/what data is passed between layers.
>>
>> I expect this will involve some sort of well defined interface
>> between the btl’s and orte, and I don’t know if this will also
>> require something like this between the btl’s and the pml – I think
>> that interface is rigidly enforced, but am not sure.
>>
>> I expect that explicit calls to FTB in the btl layer would have to
>> be componentized, especially in the context of what is developing
>> in the FT working group of the MPI Forum. Not that FTB is bad in
>> any way, just that it is one of many monitors.
>>
>> We will need to talk about this on a case by case basis, and decide
>> how to proceed. If anyone wants to help, please do.
>>
>> Rich
>>
>>
>> On 12/3/08 3:02 PM, "Ralph Castain" <rhc_at_[hidden]> wrote:
>>
>>> I managed to execute the modex-less changes pretty much without
>>> introducing additional ORTE dependencies into the BTL's, though there
>>> may be some additions as we look at the other BTLs that I didn't
>>> address. So hopefully that won't contribute too much to the issue
>>> here.
>>>
>>> At the moment, I don't think it matters where notifier sits - it might
>>> be able to move to OPAL. Only catch will be if some notifier component
>>> requires communications. I'm thinking of FTB, for example, and our own
>>> local monitoring program that may require TCP messaging. We don't
>>> currently have anything in OPAL that would support an OPAL level
>>> messaging system, though perhaps that could be resolved.
>>>
>>> We also have dependencies where the BTL's will call orte_ess to find
>>> out what node another proc is on, the node local rank of that proc,
>>> etc. Those dependencies are likely to grow after the Dec meeting (see
>>> wiki for that agenda item), and definitely cannot be moved to OPAL.
>>>
>>> However, note that Rich stated the BTL's were -not- moving to OPAL.
>>> This begs the question: where -are- they going? Into their own layer?
>>> Will that layer be somewhere in-between OMPI and ORTE (in which case,
>>> the ORTE dependencies are moot)?
>>>
>>> I note that the wiki page doesn't address any of these questions,
>>> which is understandable if things are just getting underway. But it
>>> does sound like this is going to take some thought to ensure we don't
>>> paint ourselves into a corner.
>>>
>>> Ralph
>>>
>>>
>>> On Dec 3, 2008, at 12:10 PM, Jeff Squyres wrote:
>>>
>>> > FWIW, I see lots of notifier calls being added to the BTLs (and
>>> > elsewhere throughout the OMPI code base) over time...
>>> >
>>> > On Dec 3, 2008, at 2:07 PM, Tim Mattox wrote:
>>> >
>>> >> The BTLs might have added calls to the notifier framework in their
>>> >> error paths.
>>> >> The notifier framework is currently in the ORTE layer... not sure
>>> >> if we could move it down to OPAL. Ralph, any thoughts on that?
>>> >>
>>> >> On Wed, Dec 3, 2008 at 11:56 AM, Richard Graham <rlgraham_at_[hidden]> wrote:
>>> >>> George told me about what he is doing, so no changes would be
>>> >>> committed until George has his changes in.
>>> >>>
>>> >>> Are there other changes to the btl's that we should be aware of?
>>> >>>
>>> >>> Rich
>>> >>>
>>> >>>
>>> >>> On 12/3/08 11:47 AM, "George Bosilca" <bosilca_at_[hidden]> wrote:
>>> >>>
>>> >>>> Terry,
>>> >>>>
>>> >>>> I'm involved [to some degree] in both efforts and I can confirm
>>> >>>> these two efforts will not affect each other in any bad way.
>>> >>>>
>>> >>>> george.
>>> >>>>
>>> >>>> On Dec 3, 2008, at 11:42 , Terry Dontje wrote:
>>> >>>>
>>> >>>>> I don't have any *strong* objections. However, I know that Eugene
>>> >>>>> and George B have been working on some Fastpath code changes, so
>>> >>>>> we should make sure neither project obliterates the other.
>>> >>>>>
>>> >>>>> --td
>>> >>>>>
>>> >>>>> Richard Graham wrote:
>>> >>>>>> Now that 1.3 will be released, we would like to go ahead with the
>>> >>>>>> plan to move the btl’s out of the MPI layer. Greg Koenig, who is
>>> >>>>>> doing most of the work, has started a wiki page with details on
>>> >>>>>> the plans. Right now details are sketchy, as Greg is digging through
>>> >>>>>> the code, and has only handwritten notes on data structures that
>>> >>>>>> need to be moved, include files that are not needed, etc. The
>>> >>>>>> page is at:
>>> >>>>>> https://svn.open-mpi.org/trac/ompi/wiki/BTLExtraction
>>> >>>>>>
>>> >>>>>> The first three steps basically only involve code motion: moving
>>> >>>>>> items such as ompi_list and renaming them, moving where the code
>>> >>>>>> is actually located in the repository, and the like. For these we
>>> >>>>>> do not plan to put out a formal RFC, but comments are very
>>> >>>>>> welcome, and any hands that are willing to help with this are even
>>> >>>>>> more welcome.
>>> >>>>>>
>>> >>>>>> The last phase, where the btl’s are made dependent on OPAL and
>>> >>>>>> supporting libraries such as mpools, I expect will be disruptive,
>>> >>>>>> and will definitely require an RFC, and will also be a longer
>>> >>>>>> process.
>>> >>>>>>
>>> >>>>>> Please send comments,
>>> >>>>>> Rich
>>> >>>>>>
>>> >>>>>> ------------------------------------------------------------------------
>>> >>>>>>
>>> >>>>>> _______________________________________________
>>> >>>>>> devel mailing list
>>> >>>>>> devel_at_[hidden]
>>> >>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> >>>>>>
>>> >> --
>>> >> Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
>>> >> tmattox_at_[hidden] || timattox_at_[hidden]
>>> >> I'm a bright... http://www.the-brights.net/
>>> >
>>> > --
>>> > Jeff Squyres
>>> > Cisco Systems

-- 
Jeff Squyres
Cisco Systems