Open MPI Development Mailing List Archives

From: Ralph Castain (rhc_at_[hidden])
Date: 2007-07-17 22:21:38

Yo all

This was discussed at some length in the weekly core developers telecon -
with the discussion continued in a dedicated follow-on telecon later this
afternoon. The consensus of those participating in the dedicated telecon was
that this plan should be followed, and that the proposed cellid commit in
step 1 go forward.

We also agreed that work done on these steps would be done in tmp branches
so the results could be evaluated and tested prior to commit into the OMPI
trunk. I would like to emphasize that this (IMHO) is a practice that should
be employed for any non-trivial change.

As we step through the proposed changes, we will inform the group of our
progress and details of planned implementations as they become clearer. As I
hopefully made clear during the follow-on telecon, we don't have detailed
answers to how each of these will be done, particularly as we look deeper
into the progression. However, the first few steps are pretty clear and,
once implemented, should yield insight into the following steps.

FYI: I sent a note to the devel and core-devel lists pointing to the Trac
ticket describing the planned hostfile changes to be implemented in step 2.
Please provide comments on that work as desired.

Our expected timetable remains to complete these proposed changes by the
Oct-Nov timeframe, barring any unforeseen obstacles.

Thanks again for everyone's comments and participation. Please feel free to
request updates/info, and/or to make further suggestions.


------ Forwarded Message
From: Brian Barrett <bbarrett_at_[hidden]>
Date: Tue, 17 Jul 2007 14:06:44 -0600
To: Ralph Castain <rhc_at_[hidden]>
Subject: Fwd: [OMPI devel] Major reduction in ORTE

Begin forwarded message:

> From: Ralph H Castain <rhc_at_[hidden]>
> Date: July 12, 2007 3:04:01 PM MDT
> To: Open MPI Core Developers <devel-core_at_[hidden]>, Open MPI
> Developers <devel_at_[hidden]>
> Subject: [OMPI devel] Major reduction in ORTE
> Reply-To: Open MPI Developers <devel_at_[hidden]>
>
> Yo all
>
> As we are discussing functional requirements for the upcoming 1.3 release, I
> was asked to provide a little info about what is going to be happening to
> the ORTE part of the code base over the remainder of this year.
>
> Short answer: there will be a major code revision to reduce ORTE to the
> minimum required to support Open MPI. This includes (a) a major design
> change away from event-driven programming that will result in the
> consolidation of several frameworks and removal of at least two others; and
> (b) general cleanup to reduce memory footprint, startup message size, and
> other areas.
>
> Longer explanation:
>
> At the beginning of the Open MPI project, it was quickly determined that
> nobody (myself perhaps excepted) really wanted to build/maintain the RTE
> underpinning Open MPI. We were, after all, primarily interested in MPI.
> Hence, we thought it would be a good thing if we could define an RTE that
> would be of adequate general interest to attract partners whose primary
> focus would be extension and support of the RTE itself.
>
> Well, after several years, it is clear that the original idea isn't going to
> work (for a variety of reasons that aren't worth recounting here). We
> therefore decided recently that it is time to accept the inevitable, quit
> trying to support a more general RTE, and instead spend some effort reducing
> the ORTE layer down to its most basic requirements. In particular, we want
> to make the code easier to maintain and debug, faster and more scalable for
> startup, and less vulnerable to race conditions.
>
> In its essence, the plan consists of the following:
>
> 1. Remove the cellid from the process name, as the code will solely be a
> single-cluster system. Other interested parties have offered to provide an
> overlayer that will cross-connect Open MPI instances across clusters - we
> will work with them to help facilitate the necessary hooks, but won't
> duplicate that connectivity internally.
>
> 2. Remove the RDS framework. All discovery and allocation will be done in a
> single step in the RAS. We will revise the RAS to allow better co-existence
> of resource manager specified allocations and hostfiles (more on that
> later).
>
> 3. Eliminate the GPR framework or, at the very least, remove the
> subscribe/trigger functionality from it. We will be moving away from the
> current event-driven architecture to reduce our exposure to race conditions
> and eliminate the complexity caused by recursive callbacks due to trigger
> events. We will explore globalized data storage in simplified arrays as an
> alternative to the GPR database - initial tests support the idea, but
> further work needs to be done. We know that people like the Eclipse PTP team
> need access to certain data - we will work with them to figure out the best
> way to do so given the changes to/departure of the GPR.
>
> 4. Consolidate the NS, PLS, RMGR, and SMR framework functionality into a
> single process lifecycle management (PLM) framework. PLM components will
> still call the ERRMGR to deal with response to process failures, and will
> assume responsibility for storing their own data. The SCHEMA framework will
> be eliminated as part of this change. We will move some functions (e.g.,
> orte_abort) that are currently in the runtime and util areas into the PLM
> components as appropriate.
>
> 5. Each framework will have logic in its "open" function that specifically
> prevents it from performing component_open unless we are on the HNP. If we
> are not on the HNP, an #if ORTE_WANT_NO_SUPPORT will force the use of a
> "no_op" module that does nothing, but whose return codes will indicate that
> an error did not occur. If that is not set, then a proxy module will be
> utilized that provides appropriate communications to the HNP to support
> remote applications. This will reduce memory footprints (since no
> components will be opened) and allow us to simply pass through MCA params to
> all processes while ensuring proper functionality is available. Note that
> environments like CNOS may still require special components in some of the
> frameworks, as the "no_op" may not be suitable for all API functions.
>
> 6. The SDS framework will not only support name discovery, but will hold all
> backend operations required during startup. For example, the contents of the
> message now sent back to the new PLM by each process will be dependent upon
> environment. Hence, a one-to-one correspondence will be established between
> PLM and SDS components.
>
> 7. Consolidate the data in the MPI startup message (currently delivered at
> STG1 stagegate). For example, any data in the MPI startup message that needs
> to be indexed will be sent in an array sorted by vpid (no need to send the
> entire list of process name structs). Whereas before we couldn't take
> advantage of our knowledge of the message contents since it was generated by
> the GPR (which by design had no insight into the data), we will now exploit
> our knowledge to ensure the message is only that required by the specific
> environment. We will look at, for example, the direct one-to-one
> correspondence of PLM to SDS to see how this can best be implemented.
>
> Other things (e.g., routing of RML messages) are either already under
> development or under discussion - we will provide more info on these as they
> move along.
>
> As always, any thoughts/suggestions are welcomed.
>
> Ralph
> _______________________________________________
> devel mailing list
> devel_at_[hidden]

------ End of Forwarded Message
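
Step 5 of the forwarded plan describes how each framework's "open" function
would pick a module depending on where the process runs. A minimal C sketch
of that selection logic follows, under loud assumptions: every type and
function name in it (sketch_module_t, sketch_framework_open, and so on) is
hypothetical and is not taken from the ORTE code base. Only the
ORTE_WANT_NO_SUPPORT switch and the HNP / "no_op" / proxy distinction come
from the message, and defaulting the switch to 0 is an assumption as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical success code; not the real ORTE constant. */
#define SKETCH_SUCCESS 0

/* Build-time switch named in step 5. Defaulting to 0 is an assumption. */
#ifndef ORTE_WANT_NO_SUPPORT
#define ORTE_WANT_NO_SUPPORT 0
#endif

/* Hypothetical module: a name plus one API function. */
typedef struct {
    const char *name;
    int (*launch)(int num_procs);
} sketch_module_t;

/* Full module: stands in for a real component opened on the HNP. */
static int full_launch(int num_procs)
{
    printf("full: launching %d procs locally\n", num_procs);
    return SKETCH_SUCCESS;
}

/* "no_op" module: does nothing, but reports that no error occurred. */
static int noop_launch(int num_procs)
{
    (void)num_procs;
    return SKETCH_SUCCESS;
}

/* Proxy module: stands in for forwarding the request to the HNP. */
static int proxy_launch(int num_procs)
{
    printf("proxy: asking HNP to launch %d procs\n", num_procs);
    return SKETCH_SUCCESS;
}

const sketch_module_t sketch_full_module  = { "full",  full_launch  };
const sketch_module_t sketch_noop_module  = { "no_op", noop_launch  };
const sketch_module_t sketch_proxy_module = { "proxy", proxy_launch };

/* Framework "open": only the HNP gets a real component. Everyone else
 * gets the no_op or the proxy module, chosen at compile time. */
const sketch_module_t *sketch_framework_open(bool on_hnp)
{
    if (on_hnp) {
        return &sketch_full_module;
    }
#if ORTE_WANT_NO_SUPPORT
    return &sketch_noop_module;
#else
    return &sketch_proxy_module;
#endif
}

/* Callers use the selected module without knowing which one it is. */
int sketch_launch(const sketch_module_t *module, int num_procs)
{
    assert(module != NULL && module->launch != NULL);
    return module->launch(num_procs);
}
```

A caller would write something like
sketch_launch(sketch_framework_open(false), 4) and get a success code back
in every configuration. That is the property the message relies on: code
running away from the HNP never has to open components, yet the API still
behaves.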