Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Preparations for moving the btl's
From: Richard Graham (rlgraham_at_[hidden])
Date: 2008-12-03 21:43:18


BTW,
  I was guessing FTB is Fault Tolerant Backbone, but if not, can someone
tell me what it is? If it is something else, what I just wrote about it
makes no sense.

Rich

On 12/3/08 9:34 PM, "Richard Graham" <rlgraham_at_[hidden]> wrote:

> The goal is to use the btl's outside of the context of MPI, which was what was
> in mind from the day the ompi work started over five years ago, but with no
> other use at the time, things grew up intermingled - no surprise at all. What
> we are attempting to do is to untangle the existing dependencies, and make a
> much cleaner distinction between how/what data is passed between layers.
>
> I expect this will involve some sort of well defined interface between the
> btl's and orte, and I don't know if this will also require something like this
> between the btl's and the pml - I think that interface is rigidly enforced,
> but am not sure.
>
> I expect that explicit calls to FTB in the btl layer would have to be
> componentized, especially in the context of what is developing in the FT
> working group of the MPI Forum. Not that FTB is bad in any way, just that it
> is one of many monitors.
>
> We will need to talk about this on a case by case basis, and decide how to
> proceed. If anyone wants to help, please do.
>
> Rich
>
>
> On 12/3/08 3:02 PM, "Ralph Castain" <rhc_at_[hidden]> wrote:
>
>> I managed to execute the modex-less changes pretty much without
>> introducing additional ORTE dependencies into the BTL's, though there
>> may be some additions as we look at the other BTLs that I didn't
>> address. So hopefully that won't contribute too much to the issue here.
>>
>> At the moment, I don't think it matters where notifier sits - it might
>> be able to move to OPAL. Only catch will be if some notifier component
>> requires communications. I'm thinking of FTB, for example, and our own
>> local monitoring program that may require TCP messaging. We don't
>> currently have anything in OPAL that would support an OPAL level
>> messaging system, though perhaps that could be resolved.
>>
>> We also have dependencies where the BTL's will call orte_ess to find
>> out what node another proc is on, the node local rank of that proc,
>> etc. Those dependencies are likely to grow after the Dec meeting (see
>> wiki for that agenda item), and definitely cannot be moved to OPAL.
>>
>> However, note that Rich stated the BTL's were -not- moving to OPAL.
>> This begs the question: where -are- they going? Into their own layer?
>> Will that layer be somewhere in-between OMPI and ORTE (in which case,
>> the ORTE dependencies are moot)?
>>
>> I note that the wiki page doesn't address any of these questions,
>> which is understandable if things are just getting underway. But it
>> does sound like this is going to take some thought to ensure we don't
>> paint ourselves into a corner.
>>
>> Ralph
>>
>>
>> On Dec 3, 2008, at 12:10 PM, Jeff Squyres wrote:
>>
>>> > FWIW, I see lots of notifier calls being added to the BTLs (and
>>> > elsewhere throughout the OMPI code base) over time...
>>> >
>>> > On Dec 3, 2008, at 2:07 PM, Tim Mattox wrote:
>>> >
>>>> >> The BTLs might have added calls to the notifier framework in their
>>>> >> error paths.
>>>> >> The notifier framework is currently in the ORTE layer... not sure
>>>> >> if we could
>>>> >> move it down to OPAL. Ralph, any thoughts on that?
>>>> >>
>>>> >> On Wed, Dec 3, 2008 at 11:56 AM, Richard Graham <rlgraham_at_[hidden]>
>>>> >> wrote:
>>>>> >>> George told me about what he is doing, so no changes would be
>>>>> >>> committed
>>>>> >>> until George has his changes in.
>>>>> >>>
>>>>> >>> Are there other changes to the btl's that we should be aware of?
>>>>> >>>
>>>>> >>> Rich
>>>>> >>>
>>>>> >>>
>>>>> >>> On 12/3/08 11:47 AM, "George Bosilca" <bosilca_at_[hidden]> wrote:
>>>>> >>>
>>>>>> >>>> Terry,
>>>>>> >>>>
>>>>>> >>>> I'm involved [to some degree] in both efforts and I can confirm
>>>>>> >>>> these
>>>>>> >>>> two efforts will not affect each other in any bad way.
>>>>>> >>>>
>>>>>> >>>> george.
>>>>>> >>>>
>>>>>> >>>> On Dec 3, 2008, at 11:42 , Terry Dontje wrote:
>>>>>> >>>>
>>>>>>> >>>>> I don't have any *strong* objections. However, I know that Eugene
>>>>>>> >>>>> and George B have been working on some Fastpath code changes,
>>>>>>> >>>>> so we should make sure neither project obliterates the other.
>>>>>>> >>>>>
>>>>>>> >>>>> --td
>>>>>>> >>>>>
>>>>>>> >>>>> Richard Graham wrote:
>>>>>>>> >>>>>> Now that 1.3 will be released, we would like to go ahead with the
>>>>>>>> >>>>>> plan to move the btl's out of the MPI layer. Greg Koenig who is
>>>>>>>> >>>>>> doing most of the work has started a wiki page with details on
>>>>>>>> >>>>>> the
>>>>>>>> >>>>>> plans. Right now details are sketchy, as Greg is digging through
>>>>>>>> >>>>>> the code, and has only hand written notes on data structures that
>>>>>>>> >>>>>> need to be moved, include files that are not needed, etc. The
>>>>>>>> >>>>>> page
>>>>>>>> >>>>>> is at:
>>>>>>>> >>>>>> https://svn.open-mpi.org/trac/ompi/wiki/BTLExtraction
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> The first three steps basically only involve code motion, moving
>>>>>>>> >>>>>> items such as ompi_list, and renaming them, moving where the code
>>>>>>>> >>>>>> is actually located in the repository, and the like. For these we
>>>>>>>> >>>>>> do not plan to put out a formal RFC, but comments are very
>>>>>>>> >>>>>> welcome,
>>>>>>>> >>>>>> and any hands that are willing to help with this are even more
>>>>>>>> >>>>>> welcome.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> The last phase where the btl's are made dependent on OPAL, and
>>>>>>>> >>>>>> supporting libraries such as mpools I expect will be disruptive,
>>>>>>>> >>>>>> and will definitely require an RFC, and will also be a longer
>>>>>>>> >>>>>> process.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> Please send comments,
>>>>>>>> >>>>>> Rich
>>>>>>>> >>>>>>
>>>>>>>>
------------------------------------------------------------------------
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> _______________________________________________
>>>>>>>> >>>>>> devel mailing list
>>>>>>>> >>>>>> devel_at_[hidden]
>>>>>>>> >>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>> >>>>>>
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
>>>> >> tmattox_at_[hidden] || timattox_at_[hidden]
>>>> >> I'm a bright... http://www.the-brights.net/
>>>> >>
>>> >
>>> >
>>> > --
>>> > Jeff Squyres
>>> > Cisco Systems
>>> >
>>> >