Open MPI Development Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-07-12 17:46:41

Thanks for the summary Ralph.

On Jul 12, 2007, at 5:04 PM, Ralph H Castain wrote:

> Yo all
>
> As we are discussing functional requirements for the upcoming 1.3
> release, I was asked to provide a little info about what will be
> happening to the ORTE part of the code base over the remainder of
> this year.
>
> Short answer: there will be a major code revision to reduce ORTE to
> the minimum required to support Open MPI. This includes (a) a major
> design change away from event-driven programming that will result in
> the consolidation of several frameworks and the removal of at least
> two others; and (b) general cleanup to reduce memory footprint,
> startup message size, and overhead in other areas.
>
> Longer explanation:
>
> At the beginning of the Open MPI project, it was quickly determined
> that nobody (myself perhaps excepted) really wanted to build/maintain
> the RTE underpinning Open MPI. We were, after all, primarily
> interested in MPI. Hence, we thought it would be a good thing if we
> could define an RTE of sufficiently general interest to attract
> partners whose primary focus would be extending and supporting the
> RTE itself.
>
> Well, after several years, it is clear that the original idea isn't
> going to work (for a variety of reasons that aren't worth recounting
> here). We therefore decided recently that it is time to accept the
> inevitable, quit trying to support a more general RTE, and instead
> spend some effort reducing the ORTE layer to its most basic
> requirements. In particular, we want to make the code easier to
> maintain and debug, faster and more scalable at startup, and less
> vulnerable to race conditions.
>
> In its essence, the plan consists of the following:
> 1. Remove the cellid from the process name, as the code will support
> only single-cluster systems. Other interested parties have offered
> to provide an overlayer that will cross-connect Open MPI instances
> across clusters - we will work with them to help provide the
> necessary hooks, but won't duplicate that connectivity internally.
>
> 2. Remove the RDS framework. All discovery and allocation will be
> done in a single step in the RAS. We will revise the RAS to allow
> better co-existence of resource-manager-specified allocations and
> hostfiles (more on that later).
>
> 3. Eliminate the GPR framework or, at the very least, remove the
> subscribe/trigger functionality from it. We will be moving away from
> the current event-driven architecture to reduce our exposure to race
> conditions and to eliminate the complexity caused by recursive
> callbacks from trigger events. We will explore globalized data
> storage in simplified arrays as an alternative to the GPR database -
> initial tests support the idea, but further work needs to be done.
> We know that people like the Eclipse PTP team need access to certain
> data - we will work with them to figure out the best way to provide
> it given the changes to/departure of the GPR.
>
> 4. Consolidate the NS, PLS, RMGR, and SMR framework functionality
> into a single process lifecycle management (PLM) framework. PLM
> components will still call the ERRMGR to respond to process
> failures, and will assume responsibility for storing their own data.
> The SCHEMA framework will be eliminated as part of this change. We
> will move some functions (e.g., orte_abort) that currently live in
> the runtime and util areas into the PLM components as appropriate.
>
> 5. Each framework will have logic in its "open" function that
> specifically prevents it from performing component_open unless we
> are on the HNP. If we are not on the HNP, an #if ORTE_WANT_NO_SUPPORT
> will force the use of a "no_op" module that does nothing, but whose
> return codes indicate that no error occurred. If that is not set,
> then a proxy module will be used that provides the appropriate
> communications with the HNP to support remote applications. This
> will reduce memory footprints (since no components will be opened)
> and allow us to simply pass MCA params through to all processes
> while ensuring proper functionality is available. Note that
> environments like CNOS may still require special components in some
> of the frameworks, as the "no_op" module may not be suitable for all
> API functions.
>
> 6. The SDS framework will not only support name discovery, but will
> also hold all backend operations required during startup. For
> example, the contents of the message now sent back to the new PLM by
> each process will depend upon the environment. Hence, a one-to-one
> correspondence will be established between PLM and SDS components.
>
> 7. Consolidate the data in the MPI startup message (currently
> delivered at the STG1 stagegate). For example, any data in the MPI
> startup message that needs to be indexed will be sent in an array
> sorted by vpid (no need to send the entire list of process name
> structs). Whereas before we couldn't take advantage of our knowledge
> of the message contents, since the message was generated by the GPR
> (which by design had no insight into the data), we will now exploit
> that knowledge to ensure the message contains only what is required
> by the specific environment. We will look at, for example, the
> direct one-to-one correspondence of PLM to SDS to see how this can
> best be implemented.
>
> Other things (e.g., routing of RML messages) are either already under
> development or under discussion - we will provide more info on these
> as they move along.
>
> As always, any thoughts/suggestions are welcome.
> Ralph
> _______________________________________________
> devel-core mailing list
> devel-core_at_[hidden]

Jeff Squyres
Cisco Systems
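Item 5 of the quoted plan describes how each framework's "open" function would pick a module: only the HNP opens components, while every other process gets either a "no_op" module (when ORTE_WANT_NO_SUPPORT is set) or a proxy module that talks to the HNP. The C sketch below illustrates that selection pattern under stated assumptions. ORTE_WANT_NO_SUPPORT comes from the message; everything else (sketch_module_t, sketch_framework_open, the put_data function, SKETCH_SUCCESS and the counters) is hypothetical and is not ORTE's actual API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: the build defines ORTE_WANT_NO_SUPPORT to 0 or 1;
   default to 0 (use the proxy module) when it is absent. */
#ifndef ORTE_WANT_NO_SUPPORT
#define ORTE_WANT_NO_SUPPORT 0
#endif

#define SKETCH_SUCCESS 0 /* hypothetical return code */

/* Hypothetical framework module: a table of function pointers. */
typedef struct {
    const char *name;
    int (*put_data)(const char *key, int value);
} sketch_module_t;

/* Counters that make each module's behavior observable in this sketch. */
static int components_opened = 0;
static int hnp_stored = 0;
static int proxy_forwarded = 0;

/* HNP-side module: stores the data locally. */
static int hnp_put_data(const char *key, int value)
{
    assert(key != NULL);
    (void)value;
    hnp_stored++;
    return SKETCH_SUCCESS;
}

/* Proxy module: the counter stands in for sending the request to the HNP. */
static int proxy_put_data(const char *key, int value)
{
    assert(key != NULL);
    (void)value;
    proxy_forwarded++;
    return SKETCH_SUCCESS;
}

/* no_op module: does nothing, but reports success. */
static int noop_put_data(const char *key, int value)
{
    (void)key;
    (void)value;
    return SKETCH_SUCCESS;
}

const sketch_module_t hnp_module   = { "hnp",   hnp_put_data };
const sketch_module_t proxy_module = { "proxy", proxy_put_data };
const sketch_module_t noop_module  = { "no_op", noop_put_data };

/* Hypothetical framework "open": only the HNP opens components. */
const sketch_module_t *sketch_framework_open(bool is_hnp)
{
    if (is_hnp) {
        components_opened++; /* stand-in for component_open + selection */
        return &hnp_module;
    }
#if ORTE_WANT_NO_SUPPORT
    return &noop_module;
#else
    return &proxy_module;
#endif
}
```

In this sketch a non-HNP process never increments components_opened, which mirrors the footprint argument in item 5: no components are opened off the HNP, and the compile-time flag alone decides between the no_op and proxy modules.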
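Item 7 of the quoted plan proposes sending indexed startup data as an array sorted by vpid instead of a list of process name structs. The C sketch below shows why that saves space: once the entries for one job are placed so that element v holds the datum for vpid v, the vpid is implied by position and the names no longer need to travel. The struct layouts, the node_index datum, and pack_by_vpid are hypothetical illustrations, not ORTE's real data structures; the sketch also assumes a single job whose vpids run from 0 to n-1 (the cellid-free name follows item 1).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical process name with the cellid already removed (item 1). */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} sketch_name_t;

/* Old-style entry: every datum travels with a full process name. */
typedef struct {
    sketch_name_t name;
    int32_t node_index; /* hypothetical per-process datum */
} sketch_entry_t;

/* Pack n entries from one job (vpids 0..n-1, in any order) into out[],
   so that out[v] holds the datum for vpid v. The vpid is implied by the
   array position, so no name structs need to be sent. Returns 0 on
   success, -1 if a vpid is out of range or duplicated. */
static int pack_by_vpid(const sketch_entry_t *entries, size_t n, int32_t *out)
{
    if (n == 0) {
        return 0;
    }
    unsigned char *seen = calloc(n, 1);
    if (seen == NULL) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        uint32_t v = entries[i].name.vpid;
        if (v >= n || seen[v]) {
            free(seen);
            return -1;
        }
        seen[v] = 1;
        out[v] = entries[i].node_index;
    }
    free(seen);
    return 0;
}
```

Because every vpid in 0..n-1 must appear exactly once, every slot of out[] is filled, and the payload shrinks from n * sizeof(sketch_entry_t) to n * sizeof(int32_t). Under these assumptions the jobid, which is shared by all entries, would only need to be sent once.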