I guess you lost me on this one. How are the btl's going to push an error "up" to a higher layer? The errors could contain an arbitrary amount of information in them. Since the btl API's currently only return ints, are you proposing that we change all the btl APIs to include a new error structure so we can pass detailed error information back to the caller?

Then the MPI layer would have to call the orte_notifier with the appropriate info, since the MPI layer doesn't have the necessary communications infrastructure itself to perform the required functions. This would mean that every place that calls the BTL's would have to deal with the new API and returned error structure, and call orte_notifier if an error was reported.

Seems like this would proliferate quickly, while having the error reporting mechanism right where the error occurs represents the minimal impact and maximum flexibility.
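
To make the concern concrete, here is a rough C sketch of what the "error structure" route might look like. Every name in it (sketch_btl_send_struct, sketch_btl_error_t, sketch_notify, and so on) is made up for illustration; none of it is the actual BTL or orte_notifier API, and sketch_notify is only a stand-in that records the message locally.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names throughout; this is not the real BTL or notifier API. */
#define SKETCH_SUCCESS              0
#define SKETCH_ERR_OUT_OF_RESOURCE (-2)

/* Current style: the caller only ever sees an int. */
int sketch_btl_send_int(int simulate_failure)
{
    return simulate_failure ? SKETCH_ERR_OUT_OF_RESOURCE : SKETCH_SUCCESS;
}

/* Alternative: every BTL entry point also fills in an error structure. */
typedef struct {
    int  code;
    char detail[128];   /* free-form description of what went wrong */
} sketch_btl_error_t;

int sketch_btl_send_struct(int simulate_failure, sketch_btl_error_t *err)
{
    assert(err != NULL);
    err->code = SKETCH_SUCCESS;
    err->detail[0] = '\0';
    if (simulate_failure) {
        err->code = SKETCH_ERR_OUT_OF_RESOURCE;
        snprintf(err->detail, sizeof(err->detail),
                 "send queue exhausted on device %s", "dev0");
    }
    return err->code;
}

/* Stand-in for the notifier: remembers the last message it was given. */
char sketch_last_report[128];
int  sketch_report_count = 0;

void sketch_notify(const char *msg)
{
    snprintf(sketch_last_report, sizeof(sketch_last_report), "%s", msg);
    sketch_report_count++;
}

/* With the structure approach, every caller of the BTL has to repeat
 * this check-and-forward step. */
int sketch_upper_layer_send(int simulate_failure)
{
    sketch_btl_error_t err;
    int rc = sketch_btl_send_struct(simulate_failure, &err);
    if (rc != SKETCH_SUCCESS) {
        sketch_notify(err.detail);
    }
    return rc;
}
```

The check-and-forward block in sketch_upper_layer_send is the part that would have to be copied to every place that calls into a BTL, which is the proliferation I am worried about.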


On Dec 4, 2008, at 12:07 PM, Richard Graham wrote:

Not exactly, it depends on what you push up the stack. If you push just an error code, then you are right, there is very little value.  However, if you push up the error strings (or something like that), and have an upper layer interact with SLURM or Moab's error reporting system, the btl's don't need to learn about and depend on a new interface.
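
One possible reading of "push up the error strings", sketched in C under stated assumptions: the upper layer registers a callback, and the btl hands it a descriptive string without knowing who consumes it. All names here (sketch_btl_register_error_cb, sketch_error_cb_t, sketch_upper_handler, ...) are hypothetical and do not exist in the code base; the handler only stores the string where a real one might forward it to a scheduler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: (severity, human-readable description). */
typedef void (*sketch_error_cb_t)(int severity, const char *description);

static sketch_error_cb_t sketch_registered_cb = NULL;

/* Called once by the upper layer; the btl never sees SLURM or Moab. */
void sketch_btl_register_error_cb(sketch_error_cb_t cb)
{
    sketch_registered_cb = cb;
}

/* Inside the btl: build the description where the failure is detected,
 * then hand it to whoever registered, if anyone did. */
int sketch_btl_post_recv(int free_buffers)
{
    if (free_buffers <= 0) {
        char msg[128];
        snprintf(msg, sizeof(msg),
                 "no receive buffers left (free=%d)", free_buffers);
        if (sketch_registered_cb != NULL) {
            sketch_registered_cb(1, msg);
        }
        return -1;
    }
    return 0;
}

/* Upper-layer handler: copies the string, since msg above is only valid
 * for the duration of the callback. */
char sketch_seen[128];
int  sketch_seen_severity = 0;

void sketch_upper_handler(int severity, const char *description)
{
    assert(description != NULL);
    sketch_seen_severity = severity;
    snprintf(sketch_seen, sizeof(sketch_seen), "%s", description);
}
```

The upper layer would call sketch_btl_register_error_cb(sketch_upper_handler) once at startup; after that a failing sketch_btl_post_recv(0) delivers the full string upward while still returning a plain int.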

Rich


On 12/4/08 12:47 PM, "Brian W. Barrett" <brbarret@open-mpi.org> wrote:

That was my thought exactly.  And since the point of the notifier
component is to return a *useful* description of what failure the BTL had
(like IB ran out of resource X again), that will be lost if we just push
that up to the next layer.

Just my $0.02, of course.

Brian

On Thu, 4 Dec 2008, Ralph Castain wrote:

> Hmmm...only problem with that idea is that the entities being communicated
> to (e.g., SLURM, Moab) have no concept of MPI nor any way to communicate
> via that system. They do, however, have APIs that notifier can call, and
> know how to speak TCP via their own agreed-upon protocols. And many large
> systems turn off the TCP btl (all of ours, for example) because it isn't
> needed and opens additional unnecessary ports.
> So calling APIs and/or sending messages across the OOB are pretty
> straightforward. Teaching Moab to understand btl/datatype engine
> messages (flowing across who knows what transport) is an unlikely thing
> to happen.
>
> Besides, one of the primary reasons for needing to call notifier is a
> failure in the btl - so relying on the btl to send the message is
> self-defeating.
>
>
> On Dec 4, 2008, at 10:37 AM, Richard Graham wrote:
>
>       Here is where I think we should reconsider accessing the
>       notifier component in the btl.  It creates dependencies in
>       the btl that are not needed.  The idea of a notifier
>       component is a good one, but I would defer using it to upper
>       layers, rather than embedding it in the guts of the
>       communication system.  I would be in favor of an approach
>       that sends the information up the call stack.  The btl's should
>       not depend on other communication primitives, as they are the
>       communication primitive.
>
>       Rich
>
>
>       On 12/4/08 9:04 AM, "Ralph Castain" <rhc@lanl.gov> wrote:
>
>             Yes, FTB utilizes the notifier framework. In
>             addition, we have three
>             other components getting ready to be added to
>             that framework that will
>             provide interfaces to Moab, SLURM, and a DOE
>             monitoring program. The
>             first two will require messaging capabilities to
>             tell the schedulers
>             about problem nodes/routes. The latter will also
>             use a messaging
>             protocol, but is mostly aimed at alerting
>             operators to a problem and
>             creating a historical archive.
>
>               That said, we can expect the use of
>             orte_notifier to spread across
>             the BTL's pretty aggressively in the next few
>             months, and for the
>             notifier API to change/expand as we address these
>             needs.
>
>             On Dec 4, 2008, at 6:13 AM, Jeff Squyres wrote:
>
>             > I think you got it right.  And I think we're
>             pretty good in terms of
>             > BTL usage of ORTE and OPAL (to include the new
>             "notifier" service
>             > that Ralph put in recently -- what the FTB will
>             likely eventually
>             > use, I think...?); those interfaces and
>             abstraction barriers are
>             > technologically enforced.  If you break the
>             abstractions, the linker
>             > will swiftly and unmercifully punish you.
>              (this was exactly [one
>             > of] the rationale that we used for splitting
>             the code base into
>             > OPAL, ORTE, and OMPI several years ago)
>             >
>             > Greg has already noted on the wiki a few
>             constants used in the BTL's
>             > that have an OMPI_ prefix that aren't really
>             OMPI values (e.g.,
>             > OMPI_ENABLE_HETEROGENEOUS_SUPPORT).  These come
>             from configure
>             > (i.e., opal/include/opal_config.h) and were not
>             renamed back when we
>             > split the code base into OPAL, ORTE, and OMPI.
>              I don't think we had
>             > a strong reason for not renaming them -- most
>             could probably be
>             > renamed to OPAL_* -- we just didn't do it then.
>              Perhaps they can be
>             > changed during the BTL extraction process (I
>             noted this on the wiki).
>             >
>             >
>             >
>             > On Dec 3, 2008, at 9:43 PM, Richard Graham
>             wrote:
>             >
>             >> BTW,
>             >>  I was guessing FTB is Fault Tolerant
>             Backbone, but if not, can
>             >> someone tell me what it is ?  If it is not the
>             latter, what I just
>             >> wrote about it makes no sense.
>             >>
>             >> Rich
>             >>
>             >>
>             >> On 12/3/08 9:34 PM, "Richard Graham"
>             <rlgraham@ornl.gov> wrote:
>             >>
>             >>> The goal is to use the btl's outside of the
>             context of MPI, which
>             >>> was what was in mind from the day the ompi
>             work started over five
>             >>> years ago, but with no other use at the time,
>             things grew up
>             >>> intermingled - no surprise at all.  What we are
>             attempting to do
>             >>> is to untangle the existing dependencies, and
>             make a much cleaner
>             >>> distinction between how/what data is passed
>             between layers.
>             >>>
>             >>> I expect this will involve some sort of well
>             defined interface
>             >>> between the btl's and orte, and I don't know if
>             this will also
>             >>> require something like this between the btl's
>             and the pml - I
>             >>> think that interface is rigidly enforced, but
>             am not sure.
>             >>>
>             >>> I expect that explicit calls to FTB in the
>             btl layer would have to
>             >>> be componentized, especially in the context
>             of what is developing
>             >>> in the FT working group of the MPI Forum.
>              Not that FTB is bad in
>             >>> any way, just that it is one of many
>             monitors.
>             >>>
>             >>> We will need to talk about this on a case by
>             case basis, and
>             >>> decide how to proceed.  If anyone wants to
>             help, please do.
>             >>>
>             >>> Rich
>             >>>
>             >>>
>             >>> On 12/3/08 3:02 PM, "Ralph Castain"
>             <rhc@lanl.gov> wrote:
>             >>>
>             >>>> I managed to execute the modex-less changes
>             pretty much without
>             >>>> introducing additional ORTE dependencies
>             into the BTL's, though
>             >>>> there
>             >>>> may be some additions as we look at the other
>             BTLs that I didn't
>             >>>> address. So hopefully that won't contribute
>             too much to the issue
>             >>>> here.
>             >>>>
>             >>>> At the moment, I don't think it matters
>             where notifier sits - it
>             >>>> might
>             >>>> be able to move to OPAL. Only catch will be
>             if some notifier
>             >>>> component
>             >>>> requires communications. I'm thinking of
>             FTB, for example, and
>             >>>> our own
>             >>>> local monitoring program that may require
>             TCP messaging. We don't
>             >>>> currently have anything in OPAL that would
>             support an OPAL level
>             >>>> messaging system, though perhaps that could
>             be resolved.
>             >>>>
>             >>>> We also have dependencies where the BTL's
>             will call orte_ess to
>             >>>> find
>             >>>> out what node another proc is on, the node
>             local rank of that proc,
>             >>>> etc. Those dependencies are likely to grow
>             after the Dec meeting
>             >>>> (see
>             >>>> wiki for that agenda item), and definitely
>             cannot be moved to OPAL.
>             >>>>
>             >>>> However, note that Rich stated the BTL's
>             were -not- moving to OPAL.
>             >>>> This begs the question: where -are- they
>             going? Into their own
>             >>>> layer?
>             >>>> Will that layer be somewhere in-between OMPI
>             and ORTE (in which
>             >>>> case,
>             >>>> the ORTE dependencies are moot)?
>             >>>>
>             >>>> I note that the wiki page doesn't address
>             any of these questions,
>             >>>> which is understandable if things are just
>             getting underway. But it
>             >>>> does sound like this is going to take some
>             thought to ensure we
>             >>>> don't
>             >>>> paint ourselves into a corner.
>             >>>>
>             >>>> Ralph
>             >>>>
>             >>>>
>             >>>> On Dec 3, 2008, at 12:10 PM, Jeff Squyres
>             wrote:
>             >>>>
>             >>>> > FWIW, I see lots of notifier calls being
>             added to the BTLs (and
>             >>>> > elsewhere throughout the OMPI code base)
>             over time...
>             >>>> >
>             >>>> > On Dec 3, 2008, at 2:07 PM, Tim Mattox
>             wrote:
>             >>>> >
>             >>>> >> The BTLs might have added calls to the
>             notifier framework in
>             >>>> their
>             >>>> >> error paths.
>             >>>> >> The notifier framework is currently in
>             the ORTE layer... not
>             >>>> sure
>             >>>> >> if we could
>             >>>> >> move it down to OPAL.  Ralph, any
>             thoughts on that?
>             >>>> >>
>             >>>> >> On Wed, Dec 3, 2008 at 11:56 AM, Richard
>             Graham <rlgraham@ornl.gov
>             >>>> >
>             >>>> >> wrote:
>             >>>> >>> George told me about what he is doing,
>             so no changes would be
>             >>>> >>> committed
>             >>>> >>> until George has his changes in.
>             >>>> >>>
>             >>>> >>> Are there other changes to the btl's
>             that we should be aware
>             >>>> of ?
>             >>>> >>>
>             >>>> >>> Rich
>             >>>> >>>
>             >>>> >>>
>             >>>> >>> On 12/3/08 11:47 AM, "George Bosilca"
>             <bosilca@eecs.utk.edu>
>             >>>> wrote:
>             >>>> >>>
>             >>>> >>>> Terry,
>             >>>> >>>>
>             >>>> >>>> I'm involved [to some degree] in both
>             efforts and I can
>             >>>> confirm
>             >>>> >>>> these
>             >>>> >>>> two efforts will not affect each other
>             in any bad way.
>             >>>> >>>>
>             >>>> >>>>  george.
>             >>>> >>>>
>             >>>> >>>> On Dec 3, 2008, at 11:42 , Terry Dontje
>             wrote:
>             >>>> >>>>
>             >>>> >>>>> I don't have any *strong* objections.
>             However, I know that
>             >>>> Eugene
>             >>>> >>>>> and George B have been working on some
>             Fastpath code changes
>             >>>> >>>>> that we
>             >>>> >>>>> should make sure neither project
>             obliterates the other.
>             >>>> >>>>>
>             >>>> >>>>> --td
>             >>>> >>>>>
>             >>>> >>>>> Richard Graham wrote:
>             >>>> >>>>>> Now that 1.3 will be released, we
>             would like to go ahead
>             >>>> with the
>             >>>> >>>>>> plan to move the btl's out of the MPI
>             layer. Greg Koenig
>             >>>> who is
>             >>>> >>>>>> doing most of the work has started a
>             wiki page with
>             >>>> details on
>             >>>> >>>>>> the
>             >>>> >>>>>> plans. Right now details are sketchy,
>             as Greg is digging
>             >>>> through
>             >>>> >>>>>> the code, and has only hand written
>             notes on data
>             >>>> structures that
>             >>>> >>>>>> need to be moved, include files that
>             are not needed, etc.
>             >>>> The
>             >>>> >>>>>> page
>             >>>> >>>>>> is at:
>             >>>> >>>>>>
>             https://svn.open-mpi.org/trac/ompi/wiki/BTLExtraction
>             >>>> >>>>>>
>             >>>> >>>>>> The first three steps basically only
>             involve code motion,
>             >>>> moving
>             >>>> >>>>>> items such as ompi_list, and renaming
>             them, moving where
>             >>>> the code
>             >>>> >>>>>> is actually located in the
>             repository, and the like. For
>             >>>> these we
>             >>>> >>>>>> do not plan to put out a formal RFC,
>             but comments are very
>             >>>> >>>>>> welcome,
>             >>>> >>>>>> and any hands that are willing to
>             help with this are even
>             >>>> more
>             >>>> >>>>>> welcome.
>             >>>> >>>>>>
>             >>>> >>>>>> The last phase where the btl's are made
>             dependent on OPAL,
>             >>>> and
>             >>>> >>>>>> supporting libraries such as mpools I
>             expect will be
>             >>>> disruptive,
>             >>>> >>>>>> and will definitely require an RFC,
>             and will also be a
>             >>>> longer
>             >>>> >>>>>> process.
>             >>>> >>>>>>
>             >>>> >>>>>> Please send comments,
>             >>>> >>>>>> Rich
>             >>>> >>>>>>
>             >>>>
>             ------------------------------------------------------------------------
>             >>>> >>>>>>
>             >>>> >>>>>>
>             _______________________________________________
>             >>>> >>>>>> devel mailing list
>             >>>> >>>>>> devel@open-mpi.org
>             >>>> >>>>>>
>             http://www.open-mpi.org/mailman/listinfo.cgi/devel
>             >>>> >>>>>>
>             >>>> >> --
>             >>>> >> Tim Mattox, Ph.D. -
>             http://homepage.mac.com/tmattox/
>             >>>> >> tmattox@gmail.com ||
>             timattox@open-mpi.org
>             >>>> >>   I'm a bright...
>             http://www.the-brights.net/
>             >>>> >>
>             >>>> >
>             >>>> >
>             >>>> > --
>             >>>> > Jeff Squyres
>             >>>> > Cisco Systems
>             >>>> >
>             >
>             >
>             > --
>             > Jeff Squyres
>             > Cisco Systems
