Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Preparations for moving the btl's
From: Brian W. Barrett (brbarret_at_[hidden])
Date: 2008-12-04 12:47:37


That was my thought exactly. And since the point of the notifier
component is to return a *useful* description of what failure the BTL had
(like IB ran out of resource X again), that will be lost if we just push
that up to the next layer.

Just my $0.02, of course.

Brian
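
A minimal sketch of the trade-off discussed in this thread, using entirely hypothetical names (none of these are Open MPI, ORTE, or notifier APIs). It contrasts returning only a generic status code up the call stack with reporting the specific failure at the point where the transport still knows what went wrong. The error text and the callback signature are assumptions made purely for illustration.

```python
# Hypothetical illustration only; none of these names are Open MPI APIs.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BtlError:
    code: int
    detail: str  # transport-specific description known only inside the BTL


def send_propagating(fail: bool) -> int:
    """Option A: hand a generic status code up the call stack."""
    # The upper layer sees -1 but not which resource was exhausted.
    return -1 if fail else 0


def send_notifying(fail: bool, notify: Callable[[BtlError], None]) -> int:
    """Option B: report the specific failure where it is detected."""
    if fail:
        # The detail string is an invented example of a useful description.
        notify(BtlError(code=-1, detail="out of send queue entries"))
        return -1
    return 0


log: List[BtlError] = []
status_a = send_propagating(fail=True)
status_b = send_notifying(fail=True, notify=log.append)
print(status_a, status_b, log[0].detail)
```

Both calls return the same status, but only the second leaves a record of the cause. The cost, as raised elsewhere in the thread, is that the transport now depends on whatever sits behind the callback.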

On Thu, 4 Dec 2008, Ralph Castain wrote:

> Hmmm...only problem with that idea is that the entities being communicated
> with (e.g., SLURM, Moab) have no concept of MPI nor any way to communicate
> via that system. They do, however, have APIs that notifier can call, and
> know how to speak TCP via their own agreed-upon protocols. And many large
> systems turn off the TCP btl (all of ours, for example) because it isn't
> needed and opens additional unnecessary ports.
> So calling APIs and/or sending messages across the OOB are pretty
> straightforward. Teaching Moab to understand btl/datatype engine
> messages (flowing across who knows what transport) is an unlikely thing
> to happen.
>
> Besides, one of the primary reasons for needing to call notifier is a
> failure in the btl - so relying on the btl to send the message is
> self-defeating.
>
>
> On Dec 4, 2008, at 10:37 AM, Richard Graham wrote:
>
> Here is where I think we should reconsider accessing the
> notifier component in the btl.  It creates dependencies in
> the btl that are not needed.  The idea of a notifier
> component is a good one, but I would defer using it to upper
> layers, rather than embedding it in the guts of the
> communication system.  I would be in favor of an approach
> that sends the information up the call stack.  The btl's should
> not depend on other communication primitives, as they are the
> communication primitive.
>
> Rich
>
>
> On 12/4/08 9:04 AM, "Ralph Castain" <rhc_at_[hidden]> wrote:
>
> Yes, FTB utilizes the notifier framework. In addition, we have three
> other components getting ready to be added to that framework that will
> provide interfaces to Moab, SLURM, and a DOE monitoring program. The
> first two will require messaging capabilities to tell the schedulers
> about problem nodes/routes. The latter will also use a messaging
> protocol, but is mostly aimed at alerting operators to a problem and
> creating a historical archive.
>
> That said, we can expect the use of orte_notifier to spread across
> the BTL's pretty aggressively in the next few months, and for the
> notifier API to change/expand as we address these needs.
>
> On Dec 4, 2008, at 6:13 AM, Jeff Squyres wrote:
>
> > I think you got it right.  And I think we're pretty good in terms of
> > BTL usage of ORTE and OPAL (to include the new "notifier" service
> > that Ralph put in recently -- what the FTB will likely eventually
> > use, I think...?); those interfaces and abstraction barriers are
> > technologically enforced.  If you break the abstractions, the linker
> > will swiftly and unmercifully punish you.  (this was exactly [one
> > of] the rationale that we used for splitting the code base into
> > OPAL, ORTE, and OMPI several years ago)
> >
> > Greg has already noted on the wiki a few constants used in the BTL's
> > that have an OMPI_ prefix that aren't really OMPI values (e.g.,
> > OMPI_ENABLE_HETEROGENEOUS_SUPPORT).  These come from configure
> > (i.e., opal/include/opal_config.h) and were not renamed back when we
> > split the code base into OPAL, ORTE, and OMPI.  I don't think we had
> > a strong reason for not renaming them -- most could probably be
> > renamed to OPAL_* -- we just didn't do it then.  Perhaps they can be
> > changed during the BTL extraction process (I noted this on the wiki).
> >
> >
> >
> > On Dec 3, 2008, at 9:43 PM, Richard Graham wrote:
> >
> >> BTW,
> >>  I was guessing FTB is Fault Tolerant Backbone, but if not, can
> >> someone tell me what it is?  If it is not the latter, what I just
> >> wrote about it makes no sense.
> >>
> >> Rich
> >>
> >>
> >> On 12/3/08 9:34 PM, "Richard Graham" <rlgraham_at_[hidden]> wrote:
> >>
> >>> The goal is to use the btl's outside of the context of MPI, which
> >>> was what was in mind from the day the ompi work started over five
> >>> years ago, but with no other use at the time, things grew up
> >>> intermingled -- no surprise at all.  What we are attempting to do
> >>> is to untangle the existing dependencies, and make a much cleaner
> >>> distinction between how/what data is passed between layers.
> >>>
> >>> I expect this will involve some sort of well defined interface
> >>> between the btl's and orte, and I don't know if this will also
> >>> require something like this between the btl's and the pml -- I
> >>> think that interface is rigidly enforced, but am not sure.
> >>>
> >>> I expect that explicit calls to FTB in the btl layer would have to
> >>> be componentized, especially in the context of what is developing
> >>> in the FT working group of the MPI Forum.  Not that FTB is bad in
> >>> any way, just that it is one of many monitors.
> >>>
> >>> We will need to talk about this on a case by case basis, and
> >>> decide how to proceed.  If anyone wants to help, please do.
> >>>
> >>> Rich
> >>>
> >>>
> >>> On 12/3/08 3:02 PM, "Ralph Castain" <rhc_at_[hidden]> wrote:
> >>>
> >>>> I managed to execute the modex-less changes pretty much without
> >>>> introducing additional ORTE dependencies into the BTL's, though
> >>>> there may be some additions as we look at the other BTLs that I
> >>>> didn't address. So hopefully that won't contribute too much to the
> >>>> issue here.
> >>>>
> >>>> At the moment, I don't think it matters where notifier sits - it
> >>>> might be able to move to OPAL. Only catch will be if some notifier
> >>>> component requires communications. I'm thinking of FTB, for
> >>>> example, and our own local monitoring program that may require TCP
> >>>> messaging. We don't currently have anything in OPAL that would
> >>>> support an OPAL level messaging system, though perhaps that could
> >>>> be resolved.
> >>>>
> >>>> We also have dependencies where the BTL's will call orte_ess to
> >>>> find out what node another proc is on, the node local rank of that
> >>>> proc, etc. Those dependencies are likely to grow after the Dec
> >>>> meeting (see wiki for that agenda item), and definitely cannot be
> >>>> moved to OPAL.
> >>>>
> >>>> However, note that Rich stated the BTL's were -not- moving to OPAL.
> >>>> This begs the question: where -are- they going? Into their own
> >>>> layer? Will that layer be somewhere in-between OMPI and ORTE (in
> >>>> which case, the ORTE dependencies are moot)?
> >>>>
> >>>> I note that the wiki page doesn't address any of these questions,
> >>>> which is understandable if things are just getting underway. But it
> >>>> does sound like this is going to take some thought to ensure we
> >>>> don't paint ourselves into a corner.
> >>>>
> >>>> Ralph
> >>>>
> >>>>
> >>>> On Dec 3, 2008, at 12:10 PM, Jeff Squyres wrote:
> >>>>
> >>>> > FWIW, I see lots of notifier calls being added to the BTLs (and
> >>>> > elsewhere throughout the OMPI code base) over time...
> >>>> >
> >>>> > On Dec 3, 2008, at 2:07 PM, Tim Mattox wrote:
> >>>> >
> >>>> >> The BTLs might have added calls to the notifier framework in
> >>>> >> their error paths.
> >>>> >> The notifier framework is currently in the ORTE layer... not
> >>>> >> sure if we could move it down to OPAL.  Ralph, any thoughts on
> >>>> >> that?
> >>>> >>
> >>>> >> On Wed, Dec 3, 2008 at 11:56 AM, Richard Graham <rlgraham_at_[hidden]> wrote:
> >>>> >>> George told me about what he is doing, so no changes would be
> >>>> >>> committed until George has his changes in.
> >>>> >>>
> >>>> >>> Are there other changes to the btl's that we should be aware
> >>>> >>> of?
> >>>> >>>
> >>>> >>> Rich
> >>>> >>>
> >>>> >>>
> >>>> >>> On 12/3/08 11:47 AM, "George Bosilca" <bosilca_at_[hidden]> wrote:
> >>>> >>>
> >>>> >>>> Terry,
> >>>> >>>>
> >>>> >>>> I'm involved [to some degree] in both efforts and I can
> >>>> >>>> confirm these two efforts will not affect each other in any
> >>>> >>>> bad way.
> >>>> >>>>
> >>>> >>>>  george.
> >>>> >>>>
> >>>> >>>> On Dec 3, 2008, at 11:42, Terry Dontje wrote:
> >>>> >>>>
> >>>> >>>>> I don't have any *strong* objections. However, I know that
> >>>> >>>>> Eugene and George B have been working on some Fastpath code
> >>>> >>>>> changes, and we should make sure neither project obliterates
> >>>> >>>>> the other.
> >>>> >>>>>
> >>>> >>>>> --td
> >>>> >>>>>
> >>>> >>>>> Richard Graham wrote:
> >>>> >>>>>> Now that 1.3 will be released, we would like to go ahead
> >>>> >>>>>> with the plan to move the btl's out of the MPI layer. Greg
> >>>> >>>>>> Koenig, who is doing most of the work, has started a wiki
> >>>> >>>>>> page with details on the plans. Right now details are
> >>>> >>>>>> sketchy, as Greg is digging through the code, and has only
> >>>> >>>>>> handwritten notes on data structures that need to be moved,
> >>>> >>>>>> include files that are not needed, etc. The page is at:
> >>>> >>>>>> https://svn.open-mpi.org/trac/ompi/wiki/BTLExtraction
> >>>> >>>>>>
> >>>> >>>>>> The first three steps basically only involve code motion,
> >>>> >>>>>> moving items such as ompi_list, and renaming them, moving
> >>>> >>>>>> where the code is actually located in the repository, and
> >>>> >>>>>> the like. For these we do not plan to put out a formal RFC,
> >>>> >>>>>> but comments are very welcome, and any hands that are
> >>>> >>>>>> willing to help with this are even more welcome.
> >>>> >>>>>>
> >>>> >>>>>> The last phase, where the btl's are made dependent on OPAL
> >>>> >>>>>> and supporting libraries such as mpools, I expect will be
> >>>> >>>>>> disruptive, and will definitely require an RFC, and will
> >>>> >>>>>> also be a longer process.
> >>>> >>>>>>
> >>>> >>>>>> Please send comments,
> >>>> >>>>>> Rich
> >>>> >>>>>>
> >>>> >>>>>> ------------------------------------------------------------------------
> >>>> >>>>>> _______________________________________________
> >>>> >>>>>> devel mailing list
> >>>> >>>>>> devel_at_[hidden]
> >>>> >>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>>> >>>>>>
> >>>> >> --
> >>>> >> Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
> >>>> >> tmattox_at_[hidden] || timattox_at_[hidden]
> >>>> >>   I'm a bright... http://www.the-brights.net/
> >>>> >
> >>>> > --
> >>>> > Jeff Squyres
> >>>> > Cisco Systems
> >
> > --
> > Jeff Squyres
> > Cisco Systems