Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] BTL move - the notion
From: Ralph Castain (rhc_at_[hidden])
Date: 2008-12-05 08:43:59

I'll answer this outside of Terry's reply so we can stay under
George's page limit. :-))

I don't have any philosophical opposition to the idea. Indeed, there
are places where I would potentially have some use for the BTLs,
perhaps as an alternative comm channel in the OOB. I will point out,
though, that there are several things we thought when we started this
project that have proven unworkable over time. For example, the idea
that the RTE could be a general purpose one without impacting OMPI
proved incorrect and has been abandoned. It may well be that the
notion of using the BTLs for non-OMPI projects will fall into that
category as well - not saying it does, but I think it is still TBD.

That said, I do have some significant concerns about -how- this is
done that fall into two categories:

1. Procedural
Keeping the common code in the OMPI repository can raise quite a bit
of trouble with synchronizing release cycles. We are just about to
exit a period of requested "quiet" time on the trunk to stabilize it
for the 1.3 release. Had STCI been in an active development phase,
this could have caused a major problem, as we would have demanded they
not commit to our code repository. It is easy to foresee the reverse
situation. Indeed, from working on several other similar projects, I
have found this problem to be not just possible but frequent. How do
we intend to work this out?

I am also concerned about slowing down OMPI's development efforts due
to the need to coordinate proposed changes with an even broader
community, and one that will have conflicting requirements/schedules.
We already have problems getting people to stay adequately involved as
changes are proposed and made, especially as the community's members
have become involved in other efforts over time. It would become
unworkable if we take months to touch base with everyone who might be
impacted and get general consensus on changes required by OMPI. As
Terry said, we have to maintain OMPI's agility.

We all need to keep something in mind here. While this discussion is
about the BTLs and coordinating with STCI, we are talking about a
general method of operation that will have to be extended to anyone
with a similar request. There already are other groups out there, some
competing with STCI, that have issued similar requests for sharing
various pieces of the code base (the ones coming to me mostly pertain
to the RTE). So whatever we do should be generalizable - it can't just
be a point solution for STCI.

I am disturbed by the immediate rejection of methods developed and
used by other large code projects that address this very problem. Both
Hg and Git were developed specifically with this code-sharing
synchronization issue in mind, have enjoyed rapid adoption, and get
rave reviews for their solutions. This approach provides maximum
flexibility, but requires a bit of a learning curve and admittedly
more attention to maintenance details. However, other projects in
similar circumstances have found it highly beneficial. I think we
should at least consider what is becoming the state-of-the-art method
for code sharing before simply rejecting it as too much maintenance.

2. Technical
I think we all agree that STCI and OMPI have different objectives and
requirements. OMPI is facing the need to launch and operate at extreme
scales by next summer, and has received a lot of interest in having
it report errors into various systems, etc. We don't have all the
answers as to what will be necessary to meet these requirements, but
indications so far are that tighter integration, not deeper
abstraction, between the various layers will be needed. By that, I
don't mean we will violate abstraction layers, but rather that the
various layers need to work more as a tightly tuned instrument, with
each layer operating based on a clear knowledge of how the other
layers are functioning.

For example, for modex-less operations, the MPI/BTLs have to know that
the RTE/OS will be providing certain information. This means that they
don't have to go out and discover it themselves every time. Yes, we
will leave that as the default behavior so that small and/or unmanaged
clusters can operate, but we have to also introduce logic that can
detect when we are utilizing this alternative capability and exploit
it. While we are trying our best to avoid introducing RTE-like calls
into the code, the fact is that we may well have to do so (we have
already identified one btl that will definitely need to). It is simply
too early to make the decision to cut that off now - we don't know
what the long-term impacts of such a decision will be.
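
To make the fallback logic above concrete, here is a minimal sketch in
C. It is an illustration only, not Open MPI code: every name in it
(peer_info_t, rte_table_t, discover_peer, lookup_peer) is hypothetical,
and the "discovery" step is a stand-in for whatever the default
modex-style exchange would do. It assumes the RTE/OS may hand the BTL a
pre-populated table of peer addresses, and shows the BTL using that
table when present and falling back to discovery otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* All names below are hypothetical; none are Open MPI APIs. */

typedef struct {
    char addr[32];   /* endpoint address of the peer */
    bool from_rte;   /* true if the RTE/OS supplied the address */
} peer_info_t;

/* Table the RTE/OS may have filled in at launch time. */
typedef struct {
    const char **addrs;  /* one entry per peer, or NULL if unknown */
    size_t count;        /* number of entries in addrs */
} rte_table_t;

/* Stand-in for the default path: discover the peer ourselves. */
static int discover_peer(size_t peer, peer_info_t *out)
{
    snprintf(out->addr, sizeof(out->addr), "discovered-%zu", peer);
    out->from_rte = false;
    return 0;
}

/* Use RTE-provided data when available, else fall back to discovery. */
static int lookup_peer(const rte_table_t *rte, size_t peer, peer_info_t *out)
{
    assert(out != NULL);

    if (rte != NULL && rte->addrs != NULL &&
        peer < rte->count && rte->addrs[peer] != NULL) {
        /* Copy with explicit termination so addr is always a C string. */
        strncpy(out->addr, rte->addrs[peer], sizeof(out->addr) - 1);
        out->addr[sizeof(out->addr) - 1] = '\0';
        out->from_rte = true;
        return 0;
    }

    /* Default behavior for small and/or unmanaged clusters. */
    return discover_peer(peer, out);
}
```

Calling lookup_peer with a NULL table always takes the discovery path,
which corresponds to keeping today's behavior as the default. Note that
the check on the table is exactly the kind of RTE-aware logic inside a
BTL that a strict separation would rule out.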

Finally, although I don't do much on the MPI layer, I am concerned
about performance. I would tend to oppose any additional abstraction
until we can measure the performance impact. Thus, I would like to see
the BTL move done on a tmp branch (the choice of branching technology
is up to the implementer; I don't care) so we can verify that it isn't
hurting us in some unforeseen manner.

So I guess my concerns really boil down to dealing with conflicting
schedules and requirements, how to support multiple possibly competing
groups that want to share one or more parts of our code base, and
retaining an OMPI-first philosophy when it comes to what changes get
made. My proposed solution is:

1. shift our repository to a technical solution that supports broader
code sharing

2. have the non-OMPI groups access our code base via that technology.
They can "pull" changes at will, subject to the licensing agreement.
It is true that they may have to do some local editing if the change
hits a spot where they have local mods to support their system, but
both Hg and Git are very good at handling this - much better than svn
ever has been.

3. if there are minor mods required to make the BTL code area easier
to share via the above methods, then we should explore and implement
them. Certainly, renaming #define values would seem a no-brainer. I
suspect there are other similar things that could be done. Removing
orte/opal dependencies is more controversial and would need to be
examined thoroughly.

4. OMPI decides what changes get made to its code base. We are polite
about it and talk to the other groups to try and minimize impact, but
ultimately we do what is best for OMPI, and send out notifications
(perhaps a new mailing list specifically for that purpose) when
changes occur. Note that this would have helped the Eclipse group
enormously as otherwise they drown in the devel list trying to spot
the changes.

My $0.0002 - hope it helps

On Dec 4, 2008, at 6:00 PM, Richard Graham wrote:

> Let me start the e-mail conversation, and see how far we get.
> Goal: The goal several of us have is to be able to use the btl’s
> outside of the MPI layer in Open MPI. The layer itself is generic,
> w/o specific knowledge of Upper Level Protocols, so is well suited
> for this sort of use.
> Technical Approach: What we have suggested is to start the process
> with the Open MPI code base, and make it independent of the mpi-
> layer (which it is now), and the run-time layer.
> Before we get into any specific technical details,
> the first question I have is are people totally opposed to the
> notion of making the btl’s independent of MPI and the run-time ?
> This does not mean that it can’t be used by it, but that there are
> well defined abstraction layers, i.e., are people against the goal
> in the first place ?
> What are alternative suggestions to the technical approach ?
> One suggestion has been to branch and patch. To me this is a long-
> term maintenance nightmare.
> What are people's thoughts here ?
> Rich