
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Preparations for moving the btl's
From: Ralph Castain (rhc_at_[hidden])
Date: 2008-12-04 12:44:38


Hmmm... the only problem with that idea is that the entities being
communicated to (e.g., SLURM, Moab) have no concept of MPI, nor any way
to communicate via that system. They do, however, have APIs that
notifier can call, and know how to speak TCP via their own agreed-upon
protocols. And many large systems turn off the TCP btl (all of ours,
for example) because it isn't needed and opens additional unnecessary
ports.

So calling APIs and/or sending messages across the OOB is pretty
straightforward. Teaching Moab to understand btl/datatype engine
messages (flowing across who knows what transport) is an unlikely
thing to happen.
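A minimal sketch of the notifier shape described above, in Python for brevity. All names here (`NotifierComponent`, `make_recorder`, `notifier_report`) are hypothetical and are not the real `orte_notifier` API; the point is only that each component carries its own delivery path, such as a scheduler API call or its own TCP protocol, so none of them depends on a BTL being enabled.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical names: this is not the orte_notifier API, just the shape
# described above. Each component owns its own delivery path (a scheduler
# API call, its own TCP protocol, the OOB); none of them goes through a BTL.
@dataclass
class NotifierComponent:
    name: str
    report: Callable[[str, str], bool]  # (node, message) -> delivered?


def make_recorder(channel: str, log: List[Tuple[str, str, str]]) -> Callable[[str, str], bool]:
    """Stand-in for a real delivery path: it only records what it would send."""
    def report(node: str, msg: str) -> bool:
        log.append((channel, node, msg))
        return True
    return report


def notifier_report(components: List[NotifierComponent], node: str, msg: str) -> int:
    """Fan one event out to every component; return how many accepted it."""
    delivered = 0
    for comp in components:
        try:
            if comp.report(node, msg):
                delivered += 1
        except OSError:
            # One unreachable scheduler or monitor must not block the others.
            continue
    return delivered


if __name__ == "__main__":
    sent: List[Tuple[str, str, str]] = []
    components = [
        NotifierComponent("scheduler", make_recorder("scheduler-api", sent)),
        NotifierComponent("monitor", make_recorder("oob-tcp", sent)),
    ]
    print(notifier_report(components, "node042", "route to node017 failed"))  # prints 2
    print(sent)
```

Because each component owns its transport, switching off the TCP btl leaves every reporting path intact, and one unreachable monitor does not stop the rest.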

Besides, one of the primary reasons for needing to call notifier is a
failure in the btl - so relying on the btl to send the message is self-
defeating.
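To make the layering concern concrete, here is a minimal sketch, assuming hypothetical names (`Btl`, `register_error_callback`, `UpperLayer`, `report_oob`) that are not Open MPI's actual interfaces. The transport only hands a failure to a callback supplied by the layer above it; that layer forwards the report out of band, so the report never travels over the failing BTL.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical names throughout; this is not Open MPI's BTL interface.
ErrorCallback = Callable[[str, str], None]  # (peer, reason)


class Btl:
    """Toy transport. It knows nothing about the notifier or any scheduler."""

    def __init__(self) -> None:
        self._on_error: Optional[ErrorCallback] = None
        self.up = True

    def register_error_callback(self, cb: ErrorCallback) -> None:
        self._on_error = cb

    def send(self, peer: str, payload: bytes) -> bool:
        if not self.up:
            # Never try to report the failure over this same transport:
            # hand it to whoever registered interest and return an error.
            if self._on_error is not None:
                self._on_error(peer, "transport down")
            return False
        return True  # a real BTL would move `payload` here


class UpperLayer:
    """Owns the policy: forwards BTL failures to an out-of-band reporter."""

    def __init__(self, btl: Btl, report_oob: Callable[[str, str], None]) -> None:
        self._report_oob = report_oob
        btl.register_error_callback(self._btl_failed)

    def _btl_failed(self, peer: str, reason: str) -> None:
        self._report_oob(peer, reason)


if __name__ == "__main__":
    reports: List[Tuple[str, str]] = []
    btl = Btl()
    UpperLayer(btl, lambda peer, reason: reports.append((peer, reason)))
    btl.up = False
    print(btl.send("node017", b"data"))  # prints False
    print(reports)  # prints [('node017', 'transport down')]
```

Keeping the forwarding policy in the upper layer also means the BTL gains no link-time dependency on the notifier, which is the concern Rich raises below; which layer should own that policy is the question the rest of the thread is debating.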

On Dec 4, 2008, at 10:37 AM, Richard Graham wrote:

> Here is where I think we should reconsider accessing the notifier
> component in the btl. It creates dependencies in the btl that are
> not needed. The idea of a notifier component is a good one, but I
> would defer using it to upper layers, rather than embedding it in
> the guts of the communication system. I would be in favor of an
> approach that sends the information up the call stack. The btl’s
> should not depend on other communication primitives, as they are the
> communication primitive.
>
> Rich
>
>
> On 12/4/08 9:04 AM, "Ralph Castain" <rhc_at_[hidden]> wrote:
>
>> Yes, FTB utilizes the notifier framework. In addition, we have three
>> other components getting ready to be added to that framework that will
>> provide interfaces to Moab, SLURM, and a DOE monitoring program. The
>> first two will require messaging capabilities to tell the schedulers
>> about problem nodes/routes. The latter will also use a messaging
>> protocol, but is mostly aimed at alerting operators to a problem and
>> creating a historical archive.
>>
>> That said, we can expect the use of orte_notifier to spread across
>> the BTL's pretty aggressively in the next few months, and for the
>> notifier API to change/expand as we address these needs.
>>
>> On Dec 4, 2008, at 6:13 AM, Jeff Squyres wrote:
>>
>> > I think you got it right. And I think we're pretty good in terms of
>> > BTL usage of ORTE and OPAL (to include the new "notifier" service
>> > that Ralph put in recently -- what the FTB will likely eventually
>> > use, I think...?); those interfaces and abstraction barriers are
>> > technologically enforced. If you break the abstractions, the linker
>> > will swiftly and unmercifully punish you. (this was exactly [one
>> > of] the rationale that we used for splitting the code base into
>> > OPAL, ORTE, and OMPI several years ago)
>> >
>> > Greg has already noted on the wiki a few constants used in the BTL's
>> > that have an OMPI_ prefix that aren't really OMPI values (e.g.,
>> > OMPI_ENABLE_HETEROGENEOUS_SUPPORT). These come from configure
>> > (i.e., opal/include/opal_config.h) and were not renamed back when we
>> > split the code base into OPAL, ORTE, and OMPI. I don't think we had
>> > a strong reason for not renaming them -- most could probably be
>> > renamed to OPAL_* -- we just didn't do it then. Perhaps they can be
>> > changed during the BTL extraction process (I noted this on the wiki).
>> >
>> >
>> >
>> > On Dec 3, 2008, at 9:43 PM, Richard Graham wrote:
>> >
>> >> BTW,
>> >> I was guessing FTB is Fault Tolerant Backbone, but if not, can
>> >> someone tell me what it is? If it is something else, what I just
>> >> wrote about it makes no sense.
>> >>
>> >> Rich
>> >>
>> >>
>> >> On 12/3/08 9:34 PM, "Richard Graham" <rlgraham_at_[hidden]> wrote:
>> >>
>> >>> The goal is to use the btl’s outside of the context of MPI, which
>> >>> was what was in mind from the day the ompi work started over five
>> >>> years ago, but with no other use at the time, things grew up
>> >>> intermingled – no surprise at all. What we are attempting to do
>> >>> is to untangle the existing dependencies, and make a much cleaner
>> >>> distinction between how/what data is passed between layers.
>> >>>
>> >>> I expect this will involve some sort of well defined interface
>> >>> between the btl’s and orte, and I don’t know if this will also
>> >>> require something like this between the btl’s and the pml – I
>> >>> think that interface is rigidly enforced, but am not sure.
>> >>>
>> >>> I expect that explicit calls to FTB in the btl layer would have to
>> >>> be componentized, especially in the context of what is developing
>> >>> in the FT working group of the MPI Forum. Not that FTB is bad in
>> >>> any way, just that it is one of many monitors.
>> >>>
>> >>> We will need to talk about this on a case-by-case basis, and
>> >>> decide how to proceed. If anyone wants to help, please do.
>> >>>
>> >>> Rich
>> >>>
>> >>>
>> >>> On 12/3/08 3:02 PM, "Ralph Castain" <rhc_at_[hidden]> wrote:
>> >>>
>> >>>> I managed to execute the modex-less changes pretty much without
>> >>>> introducing additional ORTE dependencies into the BTL's, though
>> >>>> there may be some additions as we look at the other BTLs that I
>> >>>> didn't address. So hopefully that won't contribute too much to the
>> >>>> issue here.
>> >>>>
>> >>>> At the moment, I don't think it matters where notifier sits - it
>> >>>> might be able to move to OPAL. The only catch will be if some
>> >>>> notifier component requires communications. I'm thinking of FTB,
>> >>>> for example, and our own local monitoring program that may require
>> >>>> TCP messaging. We don't currently have anything in OPAL that would
>> >>>> support an OPAL-level messaging system, though perhaps that could
>> >>>> be resolved.
>> >>>>
>> >>>> We also have dependencies where the BTL's will call orte_ess to
>> >>>> find out what node another proc is on, the node-local rank of that
>> >>>> proc, etc. Those dependencies are likely to grow after the Dec
>> >>>> meeting (see wiki for that agenda item), and definitely cannot be
>> >>>> moved to OPAL.
>> >>>>
>> >>>> However, note that Rich stated the BTL's were -not- moving to OPAL.
>> >>>> This raises the question: where -are- they going? Into their own
>> >>>> layer? Will that layer be somewhere in between OMPI and ORTE (in
>> >>>> which case, the ORTE dependencies are moot)?
>> >>>>
>> >>>> I note that the wiki page doesn't address any of these questions,
>> >>>> which is understandable if things are just getting underway. But
>> >>>> it does sound like this is going to take some thought to ensure we
>> >>>> don't paint ourselves into a corner.
>> >>>>
>> >>>> Ralph
>> >>>>
>> >>>>
>> >>>> On Dec 3, 2008, at 12:10 PM, Jeff Squyres wrote:
>> >>>>
>> >>>> > FWIW, I see lots of notifier calls being added to the BTLs (and
>> >>>> > elsewhere throughout the OMPI code base) over time...
>> >>>> >
>> >>>> > On Dec 3, 2008, at 2:07 PM, Tim Mattox wrote:
>> >>>> >
>> >>>> >> The BTLs might have added calls to the notifier framework in
>> >>>> >> their error paths.
>> >>>> >> The notifier framework is currently in the ORTE layer... not
>> >>>> >> sure if we could move it down to OPAL. Ralph, any thoughts on
>> >>>> >> that?
>> >>>> >>
>> >>>> >> On Wed, Dec 3, 2008 at 11:56 AM, Richard Graham <rlgraham_at_[hidden]>
>> >>>> >> wrote:
>> >>>> >>> George told me about what he is doing, so no changes would be
>> >>>> >>> committed until George has his changes in.
>> >>>> >>>
>> >>>> >>> Are there other changes to the btl's that we should be aware of?
>> >>>> >>>
>> >>>> >>> Rich
>> >>>> >>>
>> >>>> >>>
>> >>>> >>> On 12/3/08 11:47 AM, "George Bosilca" <bosilca_at_[hidden]> wrote:
>> >>>> >>>
>> >>>> >>>> Terry,
>> >>>> >>>>
>> >>>> >>>> I'm involved [to some degree] in both efforts and I can confirm
>> >>>> >>>> these two efforts will not affect each other in any bad way.
>> >>>> >>>>
>> >>>> >>>> george.
>> >>>> >>>>
>> >>>> >>>> On Dec 3, 2008, at 11:42, Terry Dontje wrote:
>> >>>> >>>>
>> >>>> >>>>> I don't have any *strong* objections. However, I know that
>> >>>> >>>>> Eugene and George B have been working on some Fastpath code
>> >>>> >>>>> changes, and we should make sure neither project obliterates
>> >>>> >>>>> the other.
>> >>>> >>>>>
>> >>>> >>>>> --td
>> >>>> >>>>>
>> >>>> >>>>> Richard Graham wrote:
>> >>>> >>>>>> Now that 1.3 will be released, we would like to go ahead with
>> >>>> >>>>>> the plan to move the btl’s out of the MPI layer. Greg Koenig,
>> >>>> >>>>>> who is doing most of the work, has started a wiki page with
>> >>>> >>>>>> details on the plans. Right now details are sketchy, as Greg
>> >>>> >>>>>> is digging through the code and has only handwritten notes on
>> >>>> >>>>>> data structures that need to be moved, include files that are
>> >>>> >>>>>> not needed, etc. The page is at:
>> >>>> >>>>>> https://svn.open-mpi.org/trac/ompi/wiki/BTLExtraction
>> >>>> >>>>>>
>> >>>> >>>>>> The first three steps basically only involve code motion:
>> >>>> >>>>>> moving items such as ompi_list and renaming them, moving where
>> >>>> >>>>>> the code is actually located in the repository, and the like.
>> >>>> >>>>>> For these we do not plan to put out a formal RFC, but comments
>> >>>> >>>>>> are very welcome, and any hands that are willing to help with
>> >>>> >>>>>> this are even more welcome.
>> >>>> >>>>>>
>> >>>> >>>>>> The last phase, where the btl’s are made dependent on OPAL and
>> >>>> >>>>>> supporting libraries such as mpools, I expect will be
>> >>>> >>>>>> disruptive; it will definitely require an RFC and will also be
>> >>>> >>>>>> a longer process.
>> >>>> >>>>>>
>> >>>> >>>>>> Please send comments,
>> >>>> >>>>>> Rich
>> >>>> >>>>>>
>> >>>> >>>>>> ------------------------------------------------------------------------
>> >>>> >>>>>>
>> >>>> >>>>>> _______________________________________________
>> >>>> >>>>>> devel mailing list
>> >>>> >>>>>> devel_at_[hidden]
>> >>>> >>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> >>>> >>>>>>
>> >>>> >>
>> >>>> >> --
>> >>>> >> Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
>> >>>> >> tmattox_at_[hidden] || timattox_at_[hidden]
>> >>>> >> I'm a bright... http://www.the-brights.net/
>> >>>> >>
>> >>>> >
>> >>>> > --
>> >>>> > Jeff Squyres
>> >>>> > Cisco Systems
>> >>>> >
>> >
>> >
>> > --
>> > Jeff Squyres
>> > Cisco Systems
>> >