Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [RFC] Non-blocking collectives (LibNBC) merge to trunk
From: Shipman, Galen M. (gshipman_at_[hidden])
Date: 2008-02-14 13:21:43


I am in favor of bringing this in.

- Galen

On Feb 14, 2008, at 1:15 PM, Jeff Squyres wrote:

> So I don't think that we ever concluded this discussion/RFC.
>
> I am in favor of bringing in libnbc, given the qualifications below.
>
> Others?
>
>
> On Feb 8, 2008, at 12:16 PM, Jeff Squyres wrote:
>
> >> Terry -- I reluctantly agree. :-) What I envision is not difficult
> >> (a first cut/feature-lean version is probably only several hundred
> >> lines of perl?), but I don't have the cycles (at present) to
> >> implement it -- my priorities are elsewhere at the moment.
>>
> >> If anyone is interested in this, I would gladly talk them through
> >> what [I think] needs to be done.
>>
>> That being said, for NBC, per Terry's points:
>>
>> - if it's not compiled/installed by default
>> - if we can make a big enough red flag for users that it's an R&D
>> effort that is subject to change (perhaps 3'x5'?)
>>
>> Then I think it would not be a bad thing to include NBC. But then I
>> think we need to disallow any other contrib/ projects until someone
>> can find the cycles to implement a better solution (such as an
>> ompi_contrib executable/system).
>>
>>
>>
>> On Feb 7, 2008, at 1:18 PM, Terry Dontje wrote:
>>
> >>> Jeff, the below sounds good if we really believe there is going to
> >>> be a whole bunch of addons. I am not sure NBC really constitutes an
> >>> addon so much as some research work that might become an official
> >>> API. So I look at the NBC stuff more like a BTL or PM that is in the
> >>> process of being developed/refined for prime time. So would a new PM
> >>> or BTL be added via ompi_contrib? I wouldn't think they would.
>>>
> >>> The ompi_contrib sounds like a nice utility, but I have a feeling
> >>> there are bigger fish to fry unless we really believe there will be
> >>> a lot of addons that we will need to support.
>>>
>>> --td
>>>
>>> Jeff Squyres wrote:
> >>>> All these comments are good. I confess that although I should
> >>>> have, I really did not previously consider the complexity of
> >>>> adding in N contrib packages to OMPI.
>>>>
> >>>> The goal of the contrib packages is to easily allow additional
> >>>> functionality that is nicely integrated with Open MPI. An obvious
> >>>> way to do this is to include the code in the Open MPI tarball, but
> >>>> that leads to the logistics and other issues that have been
> >>>> identified.
>>>>
> >>>> Ralph proposes a good way around this. But what about going
> >>>> farther than that: what if we offer a standardized set of hooks for
> >>>> including contrib functionality *after* core OMPI has been
> >>>> installed? Yes, it's one more step after OMPI has been installed --
> >>>> but if we can keep it as *one* step, perhaps the user onus is not
> >>>> that bad. Let me explain.
>>>>
> >>>> Consider a new standalone executable: ompi_contrib. You would run
> >>>> ompi_contrib to install and uninstall contrib functionality into
> >>>> your existing OMPI:
> >>>>
> >>>> ompi_contrib --install http://www.example.com/nbc/nbc-ompi-contrib.tar.gz
> >>>> or ompi_contrib --install file:///home/htor/nbc-ompi-contrib.tar.gz
>>>>
> >>>> This will download NBC (if http), build it, and install it into the
> >>>> current OMPI. It is likely that the nbc-ompi-contrib.tar.gz file
> >>>> will contain the real NBC tarball (or maybe just a reference to
> >>>> it?) plus a small number of hook/glue scripts for OMPI integration
> >>>> (perhaps quite similar to what is in the contrib/ tree [on the
> >>>> branch] today for NBC?). Likewise, after NBC is installed into the
> >>>> local OMPI installation, ompi_info should be able to show "nbc" as
> >>>> installed contrib functionality. It then follows that we might be
> >>>> able to do:
>>>>
>>>> ompi_contrib --uninstall nbc
>>>>
>>>> to uninstall contrib NBC from the local OMPI installation.
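
A minimal sketch of how the command line proposed above might be parsed. Everything here is hypothetical: ompi_contrib exists only as a proposal in this thread, and the function names (classify_source, parse_args, plan) and the rule for deriving the package name from the tarball name are assumptions made for illustration. The sketch only reports what would be done; it does not download, build, or install anything.

```python
import argparse
import posixpath
from urllib.parse import urlparse


def classify_source(url):
    """Return (kind, package_name) for an --install argument."""
    parsed = urlparse(url)
    if parsed.scheme in ("http", "https"):
        kind = "http"   # would need to be downloaded before building
    elif parsed.scheme == "file":
        kind = "file"   # already present on the local filesystem
    else:
        raise ValueError(f"unsupported source: {url!r}")
    # Assumption: the package name is the tarball name up to the first '-',
    # e.g. nbc-ompi-contrib.tar.gz -> "nbc".
    name = posixpath.basename(parsed.path).split("-", 1)[0]
    return kind, name


def parse_args(argv):
    """Accept exactly one of --install URL or --uninstall NAME."""
    parser = argparse.ArgumentParser(prog="ompi_contrib")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--install", metavar="URL")
    group.add_argument("--uninstall", metavar="NAME")
    return parser.parse_args(argv)


def plan(argv):
    """Describe the requested action without touching the system."""
    args = parse_args(argv)
    if args.install:
        kind, name = classify_source(args.install)
        return ("install", kind, name)
    return ("uninstall", None, args.uninstall)
```

Deriving the name at install time is one way the later "--uninstall nbc" could refer back to the same package; a real tool might instead read the name from metadata inside the tarball.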
>>>>
>>>> This kind of approach would seem to have several benefits:
>>>>
>>>> - Keep a clear[er] distinction between core OMPI and contributed
>>>> packages.
>>>>
> >>>> - Allow simple integration of MPI libraries, tools, and even
> >>>> applications (!) (think: numerical libraries, boost C++ libraries,
> >>>> etc. -- how many of your users install additional tools on top of
> >>>> MPI incorrectly?). Anything
>>>>
> >>>> - Allow 3rd parties to have "contrib" code for Open MPI without
> >>>> needing to get into our code tree (and sign the 3rd party
> >>>> agreements, etc.), keeping our distribution size down, avoiding
> >>>> release schedule logistical issues, keeping our "core" build time
> >>>> down, etc.
>>>>
> >>>> - Allow integration of contrib functionality on both a per-user and
> >>>> a system-wide basis.
>>>>
> >>>> What I'm really proposing here is that OMPI becomes a system that
> >>>> can have additional functionality installed / uninstalled. Based on
> >>>> the infrastructure that we already have, this is not as much of a
> >>>> stretch as one would think.
>>>>
>>>> Comments?
>>>>
>>>> ("who's going to write this" is a question that will also have
>>>> to be
>>>> answered, but perhaps we can discuss the code concept/idea
>>>> first...)
>>>>
>>>>
>>>>
>>>> On Feb 7, 2008, at 10:11 AM, Ralph H Castain wrote:
>>>>
>>>>
> >>>>> I believe Brian and Terry raise good points. May I offer a
> >>>>> possible alternative? What if we only include in Open MPI an
> >>>>> include file that contains the "hooks" to libNBC, and have the
> >>>>> build system only "see" those if someone specifies --with-NBC (or
> >>>>> whatever option name you like). If you like, you can make the
> >>>>> inclusion automatic if libNBC is detected on the system. It would
> >>>>> make sense to also add -libNBC to the mpicc et al wrappers when
> >>>>> the build system includes the function definitions.
>>>>>
> >>>>> This would allow those users who want to (or can) use that library
> >>>>> to link against it, without adding a bunch of source code to our
> >>>>> release. I suspect there are complications that will have to be
> >>>>> dealt with, but offer it as something to consider.
>>>>>
>>>>>
> >>>>> Also, remember that there is an added burden when we add source
> >>>>> code to Open MPI that we haven't discussed - we are now adding
> >>>>> coordination issues to our own release cycle. If libNBC changes,
> >>>>> are we now going to be pressed to issue another OMPI release so
> >>>>> that the new NBC version is included? Do we now need to coordinate
> >>>>> our releases with theirs so that things align?
>>>>>
> >>>>> And if we have an increasing number of such "included" packages,
> >>>>> how complex is -that- release discussion going to get?!?
>>>>>
>>>>>
>>>>> On 2/7/08 4:48 AM, "Terry Dontje" <Terry.Dontje_at_[hidden]> wrote:
>>>>>
>>>>>
>>>>>> Torsten Hoefler wrote:
>>>>>>
>>>>>>> Hi Brian,
>>>>>>>
>>>>>>>
> >>>>>>>> Let me start by reminding everyone that I have no vote, so this
> >>>>>>>> should probably be sent to /dev/null.
>>>>>>>>
>>>>>>>>
>>>>>>> thanks for your comment and this will not go to /dev/null!
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> I think Ralph raised some good points. I'd like to raise
>>>>>>>> another.
>>>>>>>>
>>>>>>>>
>>>>>>> yes [will reply to this in a separate thread]
>>>>>>>
>>>>>>>
>>>>>>>
> >>>>>>>> Does it make sense to bring LibNBC into the release at this
> >>>>>>>> point, given the current standardization process of non-blocking
> >>>>>>>> collectives?
>>>>>>>>
> >>>>>>>> My feeling is no, based on the long term support costs. We had
> >>>>>>>> this problem with a function in LAM/MPI -- MPIL_SPAWN, I believe
> >>>>>>>> it was -- that was almost but not quite MPI_COMM_SPAWN. It was
> >>>>>>>> added to allow spawn before the standard was finished for
> >>>>>>>> dynamics. The problem is, it wasn't quite MPI_COMM_SPAWN, so we
> >>>>>>>> were now stuck with yet another function to support (in a touchy
> >>>>>>>> piece of code) for infinity and beyond.
>>>>>>>>
> >>>>>>>> I worry that we'll have the same with LibNBC -- a piece of code
> >>>>>>>> that solves an immediate problem (no non-blocking collectives in
> >>>>>>>> MPI) but will become a long-term support anchor. Since this is
> >>>>>>>> something we'll be encouraging users to write code to, it's not
> >>>>>>>> like support for mvapi, where we can just deprecate it and users
> >>>>>>>> won't really notice. It's one thing to tell them to update their
> >>>>>>>> cluster software stack -- it's another to tell them to rewrite
> >>>>>>>> their applications.
>>>>>>>>
>>>>>>>>
> >>>>>>> I think this is a very good and valid point. However, I would
> >>>>>>> like to deprecate the NBC_* things as soon as non-blocking
> >>>>>>> collectives are a part of the standard. Of course, this would
> >>>>>>> probably need two minor versions to "clean" the code-base, but
> >>>>>>> this is (will be) our normal procedure (just what happened to
> >>>>>>> MVAPI).
>>>>>>>
>>>>>>>
>>>>>>>
> >>>>>> Though it doesn't seem to me that NBC is a slam dunk to get into
> >>>>>> the MPI spec, and I could imagine it changing significantly due to
> >>>>>> someone else's opinion/needs.
>>>>>>
> >>>>>>> And rewriting the user's application will not be that hard,
> >>>>>>> it'll mainly be vim:%s/NBC_/MPI_/g. Even if we change the
> >>>>>>> interface (e.g. add tags or decide to use the more limited split
> >>>>>>> collective approach), this task is rather easy and can be
> >>>>>>> automated easily. It's not a functionality change, just an
> >>>>>>> interface.
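
The substitution mentioned above can be sketched as a small script. This is an illustration only, and it assumes the eventual standard names map one-to-one onto the NBC_ names, which the thread itself leaves open. The function name rename_nbc_to_mpi is hypothetical. Unlike the plain vim substitution, this version matches NBC_ only at the start of an identifier, so unrelated names that merely contain the string are left alone.

```python
import re

# \b requires a word boundary before "NBC_", so an identifier such as
# MY_NBC_COUNT is not touched (the underscore before NBC is a word character).
_NBC_PREFIX = re.compile(r"\bNBC_")


def rename_nbc_to_mpi(source):
    """Return source text with every identifier prefix NBC_ replaced by MPI_."""
    return _NBC_PREFIX.sub("MPI_", source)
```

If the interface did change (extra tag arguments, for example), a prefix rename alone would no longer be enough and each call site would need its argument list adjusted as well.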
>>>>>>>
>>>>>>>
>>>>>>>
> >>>>>> Though if NBC is built by default for release builds, I think that
> >>>>>> raises the bar, saying that we (OMPI) believe this should be used
> >>>>>> by all of our users without any concerns that the API may change
> >>>>>> or that it might have significant issues.
>>>>>>
> >>>>>> On a similar track, do you have any tests that validate the
> >>>>>> functionality/correctness of NBC that can be run as a part of the
> >>>>>> MTT nightly tests?
>>>>>>
> >>>>>> My opinion is that I have no problem with NBC being merged in; I
> >>>>>> just don't think it should be built by default.
>>>>>>
>>>>>> --td
>>>>>>
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> devel_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>
>
> --
> Jeff Squyres
> Cisco Systems
>