Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [RFC] Non-blocking collectives (LibNBC) merge to trunk
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-02-07 11:34:43


All these comments are good. I confess that although I should have, I
really did not previously consider the complexity of adding in N
contrib packages to OMPI.

The goal of the contrib packages is to easily allow additional
functionality that is nicely integrated with Open MPI. An obvious way
to do this is to include the code in the Open MPI tarball, but that
leads to the logistics and other issues that have been identified.

Ralph proposes a good way around this. But what about going farther
than that: what if we offer a standardized set of hooks for
including contrib functionality *after* core OMPI has been installed?
Yes, it's one more step after OMPI has been installed -- but if we can
keep it as *one* step, perhaps the user onus is not that bad. Let me
explain.

Consider a new standalone executable: ompi_contrib. You would run
ompi_contrib to install and uninstall contrib functionality into your
existing OMPI:

     ompi_contrib --install http://www.example.com/nbc/nbc-ompi-contrib.tar.gz

or

     ompi_contrib --install file:///home/htor/nbc-ompi-contrib.tar.gz

This will download NBC (if http), build it, and install it into the
current OMPI. It is likely that the nbc-ompi-contrib.tar.gz file will
contain the real NBC tarball (or maybe just a reference to it?) plus a
small number of hook/glue scripts for OMPI integration (perhaps quite
similar to what is in the contrib/ tree [on the branch] today for
NBC?). Likewise, after NBC is installed into the local OMPI
installation, ompi_info should be able to show "nbc" as installed
contrib functionality. It then follows that we might be able to do:

     ompi_contrib --uninstall nbc

to uninstall contrib NBC from the local OMPI installation.
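
As a rough illustration only (nothing like this exists yet), the
install/uninstall bookkeeping could look like the Python sketch below.
The --prefix option, its default, the contrib/registry.json location,
and the rule that the package name is the tarball name minus
"-ompi-contrib.tar.gz" are all assumptions made for the example. The
actual build and hook/glue step is left as a comment.

```python
import argparse
import json
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# Assumed naming convention, taken from the example URLs in this mail.
SUFFIX = "-ompi-contrib.tar.gz"


def contrib_name(url):
    """Derive a package name such as "nbc" from the tarball file name."""
    filename = Path(urlparse(url).path).name
    if not filename.endswith(SUFFIX):
        raise ValueError(f"unexpected contrib tarball name: {filename}")
    return filename[: -len(SUFFIX)]


def registry_path(prefix):
    # Hypothetical location for the list of installed contrib packages.
    return Path(prefix) / "contrib" / "registry.json"


def load_registry(prefix):
    path = registry_path(prefix)
    return json.loads(path.read_text()) if path.exists() else {}


def save_registry(prefix, registry):
    path = registry_path(prefix)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(registry, indent=2))


def install(prefix, url):
    """Fetch the contrib tarball and record it as installed."""
    name = contrib_name(url)
    dest = Path(prefix) / "contrib" / name
    dest.mkdir(parents=True, exist_ok=True)
    tarball = dest / Path(urlparse(url).path).name
    # urlopen handles both http:// and file:// URLs.
    with urllib.request.urlopen(url) as src, open(tarball, "wb") as out:
        shutil.copyfileobj(src, out)
    # A real tool would unpack the tarball and run its hook/glue scripts here.
    registry = load_registry(prefix)
    registry[name] = {"source": url}
    save_registry(prefix, registry)
    return name


def uninstall(prefix, name):
    """Remove a contrib package's files and its registry entry."""
    registry = load_registry(prefix)
    if name not in registry:
        raise KeyError(f"contrib package not installed: {name}")
    shutil.rmtree(Path(prefix) / "contrib" / name, ignore_errors=True)
    del registry[name]
    save_registry(prefix, registry)


def main(argv=None):
    parser = argparse.ArgumentParser(prog="ompi_contrib")
    parser.add_argument("--prefix", default="/opt/openmpi")  # hypothetical default
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--install", metavar="URL")
    group.add_argument("--uninstall", metavar="NAME")
    args = parser.parse_args(argv)
    if args.install:
        print("installed", install(args.prefix, args.install))
    else:
        uninstall(args.prefix, args.uninstall)
        print("uninstalled", args.uninstall)
```

Calling main(["--install", "file:///home/htor/nbc-ompi-contrib.tar.gz"])
would then mirror the command line shown above, and something like
ompi_info could read the same registry file to list "nbc".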

This kind of approach would seem to have several benefits:

- Keep a clear[er] distinction between core OMPI and contributed
packages.

- Allow simple integration of MPI libraries, tools, and even
applications (!) (think: numerical libraries, boost C++ libraries,
etc. -- how many of your users install additional tools on top of MPI
incorrectly?). Anything

- Allow 3rd parties to have "contrib" code to Open MPI without needing
to get into our code tree (and sign the 3rd party agreements, etc.),
keeping our distribution size down, avoiding release schedule
logistical issues, keeping our "core" build time down, etc.

- Allow integration of contrib functionality on both a per-user and a
system-wide basis.

What I'm really proposing here is that OMPI becomes a system that can
have additional functionality installed / uninstalled. Based on the
infrastructure that we already have, this is not as much of a stretch
as one would think.

Comments?

("who's going to write this" is a question that will also have to be
answered, but perhaps we can discuss the code concept/idea first...)

On Feb 7, 2008, at 10:11 AM, Ralph H Castain wrote:

> I believe Brian and Terry raise good points. May I offer a possible
> alternative? What if we only include in Open MPI an include file that
> contains the "hooks" to libNBC, and have the build system only "see"
> those if someone specifies --with-NBC (or whatever option name you
> like). If you like, you can make the inclusion automatic if libNBC is
> detected on the system. It would make sense to also add -libNBC to the
> mpicc et al wrappers as well when the build system includes the
> function definitions.
>
> This would allow those users who want to (or can) use that library to
> link against it, without adding a bunch of source code to our release.
> I suspect there are complications that will have to be dealt with, but
> offer it as something to consider.
>
>
> Also, remember that there is an added burden when we add source code
> to Open MPI that we haven't discussed - we are now adding coordination
> issues to our own release cycle. If libNBC changes, are we now going to
> be pressed to issue another OMPI release so that the new NBC version is
> included? Do we now need to coordinate our releases with theirs so that
> things align?
>
> And if we have an increasing number of such "included" packages, how
> complex is -that- release discussion going to get?!?
>
>
> On 2/7/08 4:48 AM, "Terry Dontje" <Terry.Dontje_at_[hidden]> wrote:
>
>> Torsten Hoefler wrote:
>>> Hi Brian,
>>>
>>>> Let me start by reminding everyone that I have no vote, so this
>>>> should probably be sent to /dev/null.
>>>>
>>> thanks for your comment and this will not go to /dev/null!
>>>
>>>
>>>> I think Ralph raised some good points. I'd like to raise another.
>>>>
>>> yes [will reply to this in a separate thread]
>>>
>>>
>>>> Does it make sense to bring LibNBC into the release at this point,
>>>> given the current standardization process of non-blocking
>>>> collectives?
>>>>
>>>> My feeling is no, based on the long term support costs. We had this
>>>> problem with a function in LAM/MPI -- MPIL_SPAWN, I believe it was --
>>>> that was almost but not quite MPI_COMM_SPAWN. It was added to allow
>>>> spawn before the standard was finished for dynamics. The problem is,
>>>> it wasn't quite MPI_COMM_SPAWN, so we were now stuck with yet another
>>>> function to support (in a touchy piece of code) for infinity and
>>>> beyond.
>>>>
>>>> I worry that we'll have the same with LibNBC -- a piece of code that
>>>> solves an immediate problem (no non-blocking collectives in MPI) but
>>>> will become a long-term support anchor. Since this is something we'll
>>>> be encouraging users to write code to, it's not like support for
>>>> mvapi, where we can just deprecate it and users won't really notice.
>>>> It's one thing to tell them to update their cluster software stack --
>>>> it's another to tell them to rewrite their applications.
>>>>
>>> I think this is a very good and valid point. However, I would like to
>>> deprecate the NBC_* things as soon as non-blocking collectives are a
>>> part of the standard. Of course, this would probably need two minor
>>> versions to "clean" the code-base, but this is (will be) our normal
>>> procedure (just what happened to MVAPI).
>>>
>>>
>> Though it doesn't seem to me that NBC is a slam dunk to get into the
>> MPI spec and I could imagine it changing significantly due to someone
>> else's opinion/needs.
>>> And rewriting the user's application will not be that hard, it'll
>>> mainly be vim:%s/NBC_/MPI_/g. Even if we change the interface (e.g.
>>> add tags or decide to use the more limited split collective
>>> approach), this task is rather easy and can be automated easily. It's
>>> not a functionality change, just an interface.
>>>
>>>
>> Though if NBC is built by default for release builds I think that
>> raises the bar saying that we OMPI believe this should be used by all
>> of our users without any concerns that the API may change or it might
>> have significant issues.
>>
>> On a similar track do you have any tests that validate the
>> functionality/correctness of NBC that can be run as a part of the MTT
>> nightly tests?
>>
>> My opinion is I have no problem with NBC being merged in just that I
>> don't think it should be built by default.
>>
>> --td
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
Cisco Systems