Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: move BTLs out of ompi into separate layer
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-03-14 09:00:32

Brian --

Thanks for such a detailed answer! This helps clarify many things.

On Mar 11, 2009, at 1:31 PM, Brian W. Barrett wrote:

> On Wed, 11 Mar 2009, Richard Graham wrote:
> > Brian,
> > Going back over the e-mail trail it seems like you have raised two
> > concerns:
> > - BTL performance after the change, which I would take to be
> > - btl latency
> > - btl bandwidth
> > - Code maintainability
> > - repeated code changes that impact a large number of files
> > - A demonstration that the changes actually achieve their goal. As we
> > discussed after you got off the call, there are two separate goals here
> > - being able to use the btl's outside the context of mpi, but
> > within the ompi code base
> > - ability to use the btl's in the context of a run-time other than
> > orte
> > Another concern I have heard raised by others is
> > - mpi startup time
> >
> > Has anything else been missed here? I would like to make sure that we
> > address all the issues raised in the next version of the RFC.
> I think the umbrella concerns for the final success of the change are
> btl performance (in particular, latency and message rates for
> cache-unfriendly applications/benchmarks) and code maintainability. In
> addition, there are some intermediate change issues I have, in that
> this project is working differently than other large changes. In
> particular, there is/was the appearance of being asked to accept
> changes which only make sense if the btl move is going to move forward,
> without any way to judge the performance or code impact because
> critical technical issues still remain.
>
> The latency/message rate issues are fairly straightforward from an end
> measure point-of-view. My concerns on latency/message rate come not
> from the movement of the BTL to another library (for most operating
> systems / shared library systems that should be negligible), but from
> the code changes which surround moving the BTLs. The BTLs are tightly
> intertwined with a number of pieces of the OMPI layer, in particular
> the BML and MPool frameworks and the ompi proc structure. I had a
> productive conversation with Rainer this morning explaining why I'm so
> concerned about the bml and ompi proc structures. The ompi proc
> structure currently acts not only as the identifier for a remote
> endpoint, but stores endpoint specific data for both the PML and BML.
> The BML structure actually contains each BTL's per process endpoint
> information, in the form of the base_endpoint_t* structures returned
> from add_procs(). Moving these structures around must be done with
> care, as some of the proposals Jeff, Rainer, and I came up with this
> morning either induced spaghetti code or greatly increased the spread
> of information needed for the critical send path through the memory
> space (thereby likely increasing cache misses on send for real
> applications).
>
> The code maintainability issue comes from three separate and
> independent issues. First, there is the issue of how the pieces of the
> OMPI layer will interact after the move. The BML/BTL/MPool/Rcache dance
> is already complicated, and care should be taken to minimize that
> change. Start-up is also already quite complex, and moving the BTLs to
> make them independent of starting other pieces of Open MPI can be done
> well or can be done poorly. We need to ensure it's done well,
> obviously. Second, there is the issue of wire-up. My impression from
> conversations with everyone at ORNL was that this move of BTLs would
> include changes to allow BTLs to wire-up without the RML. I understand
> that Rich said this was not the case during the part of the admin
> meeting I missed yesterday, so that may no longer be a concern.
> Finally, there has been some discussion, mainly second-hand in my case,
> about the mechanisms by which the trunk would be modified to allow for
> using OMPI without ORTE. I have concerns that we'd add complexity to
> the BTLs to achieve that, and again that can be done poorly if we're
> not careful. Talking with Jeff and Rainer this morning helped reduce my
> concern in this area, but I think it also added to the technical issues
> which must be solved to consider this project ready for movement to the
> trunk.
>
> There are a couple of technical issues which I believe prevent a
> reasonable discussion of the performance and maintainability issues
> based on the current branch. I talked about some of them in the
> previous two paragraphs, but so that we have a short bullet list, they
> are:
> - How will the ompi_proc_t be handled? In particular,
>   where will PML/BML data be stored, and how will we
>   avoid adding new cache misses?
> - How will the BML and MPool be handled? The BML holds
>   the BTL endpoint data, so changes have to be made if
>   it continues to live in OMPI.
> - How will the modex and the intricate dance with adding
>   new procs from dynamic processes be handled?
> - How will we handle the progress mechanisms in cases where
>   the MTLs are used and the BTLs aren't needed by the RTE?
> - If there are users outside of OMPI, but who want to also use
>   OMPI, how will the library versioning / conflict problem be
>   solved?
> > As was mentioned before, our time frame for this is measured in weeks,
> > and not in months. I believe the date of May 1st was mentioned to
> > coincide with the next feature release.
> While I understand your deadline, we have in the past been very
> conservative with such large changes. The C/R work was delayed for over
> a year because people were concerned with the impact to performance and
> maintainability. ORTE work is consistently delayed in the name of code
> stability. I believe that changing our desire for high quality code in
> the trunk because of an organization's deadline (particularly when
> other organizations are successfully using branches to meet their
> deadlines) sets a poor precedent and goes against previous precedents.
>
> Similarly, my concern with the intermediate changes which have been
> proposed or occurred comes from the slippery-slope argument. Changes
> which are really only necessary for the btl move (even general code
> cleanups) should only occur once we're all sure the btl move will work.
> Otherwise, we're impacting other developers (many of whom are working
> on temp branches attempting to get a feature to completion, as our
> normal process dictates) in order to reach an end point which may not
> be achievable. In talking to Rainer this morning with Jeff, I think we
> came up with a number of ideas on how to mitigate this impact and find
> a better balance which allows ORNL to answer the critical technical
> questions (which are not just mine, but are shared by others and are
> critical to the "make it work" part of the process) and allows the rest
> of the community some belief that we can avoid any permanent harm if
> the move doesn't work out.
> > One thing that should help when the naming changes are applied is that
> > this is scripted, and the script can be made available for others that
> > are working on temp branches - which includes us, also.
> That unfortunately doesn't help other developers, if they're trying to
> strictly follow the version control changes to the trunk. The problem
> is that we're going to get all those moves (hopefully the script now
> svn moves instead of svn rm / svn add) through the version control
> system. The script would then cause all the changes to occur a second
> time, and that could be very problematic. The problem with the version
> control changes filtering down is that it is not all-encompassing. For
> example, svn will have problems if the btl directory moves but I have
> my own private special BTL. Yes, I might be able to use your scripts to
> handle that, but if they aren't written with that scenario in mind,
> they won't help. It also won't help if I've added a particular file to
> an existing BTL and the BTL then moves.
>
> I think these cases are worth the pain to non-ORNL developers *IF* all
> the other issues are addressed. Otherwise, we're unfairly asking them
> to deal with a radically changing code base for an incomplete project,
> a situation we've worked to avoid in the past.
>
> Hopefully this explains my thoughts on the btl move. I'm not opposed to
> the move itself (although I reserve the right to become opposed, based
> on performance and maintainability issues). I have a problem with the
> change in process from previous large, invasive changes.
> Hope this helps,
> Brian

Jeff Squyres
Cisco Systems