Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [RFC] Non-blocking collectives (LibNBC) merge to trunk
From: Ralph H Castain (rhc_at_[hidden])
Date: 2008-02-07 11:55:06


I think your proposed approach is an excellent one! I know it will take work
to implement, which raises its own issues, but I do believe that it is the
only real long-term solution.

Just my $0.002. I would be willing to help with implementation, if that
would be of use. Not sure I understand the build system well enough to just
do it, I fear.

On 2/7/08 9:34 AM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:

> All these comments are good. I confess that although I should have, I
> really did not previously consider the complexity of adding in N
> contrib packages to OMPI.
>
> The goal of the contrib packages is to easily allow additional
> functionality that is nicely integrated with Open MPI. An obvious way
> to do this is to include the code in the Open MPI tarball, but that
> leads to the logistics and other issues that have been identified.
>
> Ralph proposes a good way around this. But what about going farther
> than that: what if we offer a standardized set of hooks for
> including contrib functionality *after* core OMPI has been installed?
> Yes, it's one more step after OMPI has been installed -- but if we can
> keep it as *one* step, perhaps the user onus is not that bad. Let me
> explain.
>
> Consider a new standalone executable: ompi_contrib. You would run
> ompi_contrib to install and uninstall contrib functionality into your
> existing OMPI:
>
> ompi_contrib --install http://www.example.com/nbc/nbc-ompi-contrib.tar.gz
> or ompi_contrib --install file:///home/htor/nbc-ompi-contrib.tar.gz
>
> This will download NBC (if http), build it, and install it into the
> current OMPI. It is likely that the nbc-ompi-contrib.tar.gz file will
> contain the real NBC tarball (or maybe just a reference to it?) plus a
> small number of hook/glue scripts for OMPI integration (perhaps quite
> similar to what is in the contrib/ tree [on the branch] today for
> NBC?). Likewise, after NBC is installed into the local OMPI
> installation, ompi_info should be able to show "nbc" as installed
> contrib functionality. It then follows that we might be able to do:
>
> ompi_contrib --uninstall nbc
>
> to uninstall contrib NBC from the local OMPI installation.
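
The install/uninstall flow described above could be sketched roughly as follows. This is only an illustration in Python, assuming a tool like the proposed ompi_contrib existed. The function and class names, the "-ompi-contrib.tar.gz" naming convention, and the in-memory registry are all hypothetical, and the actual download, build, and install steps are left out.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical naming convention, taken from the example tarball names above.
CONTRIB_SUFFIX = "-ompi-contrib.tar.gz"


def classify_source(source):
    """Decide whether an --install argument must be downloaded or is local."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        return ("download", source)
    if parsed.scheme == "file":
        return ("local", parsed.path)
    raise ValueError(f"unsupported source: {source!r}")


def package_name(source):
    """Derive the contrib name (e.g. 'nbc') from the tarball file name."""
    filename = PurePosixPath(urlparse(source).path).name
    if not filename.endswith(CONTRIB_SUFFIX):
        raise ValueError(f"not a contrib tarball: {filename!r}")
    return filename[: -len(CONTRIB_SUFFIX)]


class ContribRegistry:
    """In-memory stand-in for the list of contribs that ompi_info might show."""

    def __init__(self):
        self._installed = {}

    def install(self, source):
        kind, location = classify_source(source)
        name = package_name(source)
        # A real tool would fetch (if kind == "download"), build, and install
        # here; this sketch only records where the package came from.
        self._installed[name] = (kind, location)
        return name

    def uninstall(self, name):
        if name not in self._installed:
            raise KeyError(f"contrib package not installed: {name}")
        del self._installed[name]

    def installed(self):
        return sorted(self._installed)
```

With this sketch, installing either of the two example sources registers a package named "nbc", and uninstall("nbc") removes it again.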
>
> This kind of approach would seem to have several benefits:
>
> - Keep a clear[er] distinction between core OMPI and contributed
> packages.
>
> - Allow simple integration of MPI libraries, tools, and even
> applications (!) (think: numerical libraries, boost C++ libraries,
> etc. -- how many of your users install additional tools on top of MPI
> incorrectly?). Anything
>
> - Allow 3rd parties to have "contrib" code to Open MPI without needing
> to get into our code tree (and sign the 3rd party agreements, etc.),
> keeping our distribution size down, avoiding release schedule
> logistical issues, keeping our "core" build time down, etc.
>
> - Allow integration of contrib functionality on both a per-user and
> a system-wide basis.
>
> What I'm really proposing here is that OMPI becomes a system that can
> have additional functionality installed / uninstalled. Based on the
> infrastructure that we already have, this is not as much of a stretch
> as one would think.
>
> Comments?
>
> ("who's going to write this" is a question that will also have to be
> answered, but perhaps we can discuss the code concept/idea first...)
>
>
>
> On Feb 7, 2008, at 10:11 AM, Ralph H Castain wrote:
>
>> I believe Brian and Terry raise good points. May I offer a possible
>> alternative? What if we only include in Open MPI an include file that
>> contains the "hooks" to libNBC, and have the build system only "see"
>> those if someone specifies --with-NBC (or whatever option name you like).
>> If you like, you can make the inclusion automatic if libNBC is detected
>> on the system. It would make sense to also add -libNBC to the mpicc et al
>> wrappers as well when the build system includes the function definitions.
>>
>> This would allow those users who want to (or can) use that library to
>> link against it, without adding a bunch of source code to our release. I
>> suspect there are complications that will have to be dealt with, but offer
>> it as something to consider.
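
The decision logic in this alternative might look something like the sketch below. It is written in Python purely for illustration and is not how the Open MPI build system is implemented. The function names, the auto_detect switch, and the "-lnbc" flag spelling are assumptions.

```python
def nbc_hooks_enabled(with_nbc_requested, nbc_detected, auto_detect=False):
    """Expose the hooks if explicitly requested, or if auto-detection is
    switched on and the library was found on the system."""
    return with_nbc_requested or (auto_detect and nbc_detected)


def wrapper_link_flags(base_flags, hooks_enabled, nbc_flag="-lnbc"):
    """Return the link flags a compiler wrapper would pass along.

    The NBC library flag (a hypothetical spelling) is appended only when the
    hooks are part of the build, so other builds are left unchanged.
    """
    flags = list(base_flags)
    if hooks_enabled:
        flags.append(nbc_flag)
    return flags
```

Keeping the two decisions separate means the wrappers only ever gain the extra library when the function definitions are actually visible to users.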
>>
>>
>> Also, remember that there is an added burden when we add source code to
>> Open MPI that we haven't discussed - we are now adding coordination
>> issues to our own release cycle. If libNBC changes, are we now going to be
>> pressed to issue another OMPI release so that the new NBC version is
>> included? Do we now need to coordinate our releases with theirs so that
>> things align?
>>
>> And if we have an increasing number of such "included" packages, how
>> complex is -that- release discussion going to get?!?
>>
>>
>> On 2/7/08 4:48 AM, "Terry Dontje" <Terry.Dontje_at_[hidden]> wrote:
>>
>>> Torsten Hoefler wrote:
>>>> Hi Brian,
>>>>
>>>>> Let me start by reminding everyone that I have no vote, so this should
>>>>> probably be sent to /dev/null.
>>>>>
>>>> thanks for your comment and this will not go to /dev/null!
>>>>
>>>>
>>>>> I think Ralph raised some good points. I'd like to raise another.
>>>>>
>>>> yes [will reply to this in a separate thread]
>>>>
>>>>
>>>>> Does it make sense to bring LibNBC into the release at this point,
>>>>> given the current standardization process of non-blocking
>>>>> collectives?
>>>>>
>>>>> My feeling is no, based on the long term support costs. We had this
>>>>> problem with a function in LAM/MPI -- MPIL_SPAWN, I believe it was --
>>>>> that was almost but not quite MPI_COMM_SPAWN. It was added to allow
>>>>> spawn before the standard was finished for dynamics. The problem is,
>>>>> it wasn't quite MPI_COMM_SPAWN, so we were now stuck with yet another
>>>>> function to support (in a touchy piece of code) for infinity and
>>>>> beyond.
>>>>>
>>>>> I worry that we'll have the same with LibNBC -- a piece of code that
>>>>> solves an immediate problem (no non-blocking collectives in MPI) but
>>>>> will become a long-term support anchor. Since this is something we'll
>>>>> be encouraging users to write code to, it's not like support for
>>>>> mvapi, where we can just deprecate it and users won't really notice.
>>>>> It's one thing to tell them to update their cluster software stack --
>>>>> it's another to tell them to rewrite their applications.
>>>>>
>>>> I think this is a very good and valid point. However, I would like to
>>>> deprecate the NBC_* things as soon as non-blocking collectives are a
>>>> part of the standard. Of course, this would probably need two minor
>>>> versions to "clean" the code-base, but this is (will be) our normal
>>>> procedure (just what happened to MVAPI).
>>>>
>>>>
>>> Though it doesn't seem to me that NBC is a slam dunk to get into the MPI
>>> spec and I could imagine it changing significantly due to someone else's
>>> opinion/needs.
>>>> And rewriting the user's application will not be that hard, it'll mainly
>>>> be vim:%s/NBC_/MPI_/g. Even if we change the interface (e.g. add tags or
>>>> decide to use the more limited split collective approach), this task is
>>>> rather easy and can be automated easily. It's not a functionality
>>>> change, just an interface.
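
The automated rename mentioned above could be scripted along these lines. This is a minimal Python sketch that assumes the eventual standard names differ from the LibNBC names only in the NBC_ prefix, which the thread itself treats as uncertain. The function name and the sample source line are illustrative.

```python
import re

# Match NBC_ only at the start of an identifier, so that names which merely
# contain "NBC_" (for example MY_NBC_FLAG) are left untouched.
_NBC_PREFIX = re.compile(r"\bNBC_")


def rename_nbc_calls(source_text):
    """Rewrite identifiers starting with NBC_ to start with MPI_ instead."""
    return _NBC_PREFIX.sub("MPI_", source_text)


if __name__ == "__main__":
    line = "NBC_Ibcast(buf, n, MPI_INT, 0, comm, &handle);"
    print(rename_nbc_calls(line))
```

Compared with the plain vim substitution, the word-boundary anchor avoids rewriting unrelated identifiers. Any change in argument lists or handle types would still need manual attention.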
>>>>
>>>>
>>> Though if NBC is built by default for release builds I think that raises
>>> the bar, saying that we (OMPI) believe this should be used by all of our
>>> users without any concerns that the API may change or it might have
>>> significant issues.
>>>
>>> On a similar track, do you have any tests that validate the
>>> functionality/correctness of NBC that can be run as a part of the MTT
>>> nightly tests?
>>>
>>> My opinion is I have no problem with NBC being merged in, just that I
>>> don't think it should be built by default.
>>>
>>> --td
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel