Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [RFC] Non-blocking collectives (LibNBC) merge to trunk
From: George Bosilca (bosilca_at_[hidden])
Date: 2008-02-14 16:31:49


I am NOT in favor of bringing LibNBC into the trunk. Some of my concerns
were already stated by Brian and Ralph, and the answers didn't clearly
address all my reservations. I don't want to bring the MPI 3 Forum
discussion onto this mailing list, but I think that as long as there is
not even the beginning of a consensus in the MPI Forum, we should keep
LibNBC outside the main distribution.

Second: who really needs this? Please ask them to request it on the
public mailing lists, and to share their needs and concerns with us.
This would demonstrate the need for such a feature, not only for Open
MPI but for MPI 3 as a whole. I have the chance to work surrounded by
math people: not just users of math, but the people who design,
maintain, analyze, improve and work daily on some of the most widely
used mathematical libraries out there. They have never needed
non-blocking collectives. Moreover, they state that most well-designed
algorithms have very regular patterns that fit well with the blocking
design of today's collectives. And in the few cases where they might
use non-blocking approaches, given the current trend toward multi-core,
what the MPI standard allows today is enough.

More on the practical side, I doubt that we want to validate, maintain
and distribute something that will be useful to only a very limited
number of people.

Third: I wonder how the life cycle of this addition will be different
from that of the libnbc we already have in mca/coll. I guess IU is
maintaining the current coll/libnbc. However, is anybody testing it on
a regular basis? MTT doesn't contain anything related to libnbc. Is
anybody using it?

Fourth: It was claimed that the integration of LibNBC will not require
any modification of the Open MPI source, and that NBC_ will be the
prefix of these functions. Then it makes perfect sense to distribute
them as a separate library, doesn't it?

   Thanks,
     george.

On Feb 14, 2008, at 1:15 PM, Jeff Squyres wrote:

> So I don't think that we ever concluded this discussion/RFC.
>
> I am in favor of bringing in libnbc, given the qualifications below.
>
> Others?
>
>
> On Feb 8, 2008, at 12:16 PM, Jeff Squyres wrote:
>
>> Terry -- I reluctantly agree. :-) What I envision is not difficult
>> (a first cut/feature-lean version is probably only several hundred
>> lines of perl?), but I don't have the cycles (at present) to implement
>> it -- my priorities are elsewhere at the moment.
>>
>> If anyone is interested in this, I would gladly talk them through what
>> [I think] needs to be done.
>>
>> That being said, for NBC, per Terry's points:
>>
>> - if it's not compiled/installed by default
>> - if we can make a big enough red flag for users that it's an R&D
>> effort that is subject to change (perhaps 3'x5'?)
>>
>> Then I think it would not be a bad thing to include NBC. But then I
>> think we need to disallow any other contrib/ projects until someone
>> can find the cycles to implement a better solution (such as an
>> ompi_contrib executable/system).
>>
>>
>>
>> On Feb 7, 2008, at 1:18 PM, Terry Dontje wrote:
>>
>>> Jeff, the below sounds good if we really believe there is going to be
>>> a whole bunch of addons. I am not sure NBC really constitutes an addon
>>> so much as research work that might become an official API. So I look
>>> at the NBC stuff more like a BTL or PM that is in the process of being
>>> developed/refined for prime time. Would a new PM or BTL be added via
>>> ompi_contrib? I wouldn't think so.
>>>
>>> The ompi_contrib idea sounds like a nice utility, but I have a feeling
>>> there are bigger fish to fry unless we really believe there will be a
>>> lot of addons that we will need to support.
>>>
>>> --td
>>>
>>> Jeff Squyres wrote:
>>>> All these comments are good. I confess that, although I should have,
>>>> I really did not previously consider the complexity of adding N
>>>> contrib packages into OMPI.
>>>>
>>>> The goal of the contrib packages is to easily allow additional
>>>> functionality that is nicely integrated with Open MPI. An obvious way
>>>> to do this is to include the code in the Open MPI tarball, but that
>>>> leads to the logistics and other issues that have been identified.
>>>>
>>>> Ralph proposes a good way around this. But what about going farther
>>>> than that: what if we offer a standardized set of hooks for including
>>>> contrib functionality *after* core OMPI has been installed? Yes, it's
>>>> one more step after OMPI has been installed -- but if we can keep it
>>>> as *one* step, perhaps the onus on the user is not that bad. Let me
>>>> explain.
>>>>
>>>> Consider a new standalone executable: ompi_contrib. You would run
>>>> ompi_contrib to install and uninstall contrib functionality into your
>>>> existing OMPI:
>>>>
>>>> ompi_contrib --install http://www.example.com/nbc/nbc-ompi-contrib.tar.gz
>>>> or ompi_contrib --install file:///home/htor/nbc-ompi-contrib.tar.gz
>>>>
>>>> This will download NBC (if http), build it, and install it into the
>>>> current OMPI. It is likely that the nbc-ompi-contrib.tar.gz file will
>>>> contain the real NBC tarball (or maybe just a reference to it?) plus a
>>>> small number of hook/glue scripts for OMPI integration (perhaps quite
>>>> similar to what is in the contrib/ tree [on the branch] today for
>>>> NBC?). Likewise, after NBC is installed into the local OMPI
>>>> installation, ompi_info should be able to show "nbc" as installed
>>>> contrib functionality. It then follows that we might be able to do:
>>>>
>>>> ompi_contrib --uninstall nbc
>>>>
>>>> to uninstall contrib NBC from the local OMPI installation.
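The install / list / uninstall cycle proposed above can be sketched as follows. This is a hypothetical illustration only: `ompi_contrib` is a proposal in this thread, not an existing tool, and the names `classify_source`, `ContribRegistry`, and the `-ompi-contrib.tar.gz` naming convention (inferred from the example URLs) are assumptions. The sketch only records what is installed; downloading, building, and running glue scripts are left as comments.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical naming convention, inferred from the example URLs in the thread.
CONTRIB_SUFFIX = "-ompi-contrib.tar.gz"


def classify_source(source):
    """Return (fetch_method, package_name) for an --install argument."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        method = "download"
    elif parsed.scheme == "file":
        method = "local"
    else:
        raise ValueError(f"unsupported source scheme: {parsed.scheme!r}")
    filename = PurePosixPath(parsed.path).name  # e.g. nbc-ompi-contrib.tar.gz
    if not filename.endswith(CONTRIB_SUFFIX):
        raise ValueError(f"not an ompi-contrib tarball: {filename!r}")
    return method, filename[: -len(CONTRIB_SUFFIX)]


class ContribRegistry:
    """Tracks the contrib packages installed into one OMPI installation."""

    def __init__(self):
        self._installed = {}  # package name -> (fetch method, source)

    def install(self, source):
        method, name = classify_source(source)
        # A real tool would download (if method == "download"), unpack,
        # build, and run the package's glue scripts here; this only records it.
        self._installed[name] = (method, source)
        return name

    def uninstall(self, name):
        if name not in self._installed:
            raise KeyError(f"contrib package {name!r} is not installed")
        del self._installed[name]

    def installed(self):
        """Names an ompi_info-style report could list as installed contrib."""
        return sorted(self._installed)


if __name__ == "__main__":
    reg = ContribRegistry()
    reg.install("http://www.example.com/nbc/nbc-ompi-contrib.tar.gz")
    print(reg.installed())  # ['nbc']
    reg.uninstall("nbc")
    print(reg.installed())  # []
```

Keeping the registry separate from the fetch/build steps mirrors the "one step" goal: the user-visible state is just the set of installed package names.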
>>>>
>>>> This kind of approach would seem to have several benefits:
>>>>
>>>> - Keep a clear[er] distinction between core OMPI and contributed
>>>> packages.
>>>>
>>>> - Allow simple integration of MPI libraries, tools, and even
>>>> applications (!) (think: numerical libraries, boost C++ libraries,
>>>> etc. -- how many of your users install additional tools on top of MPI
>>>> incorrectly?). Anything
>>>>
>>>> - Allow 3rd parties to have "contrib" code for Open MPI without needing
>>>> to get into our code tree (and sign the 3rd party agreements, etc.),
>>>> keeping our distribution size down, avoiding release schedule
>>>> logistical issues, keeping our "core" build time down, etc.
>>>>
>>>> - Allow integration of contrib functionality on both a per-user and a
>>>> system-wide basis.
>>>>
>>>> What I'm really proposing here is that OMPI becomes a system that can
>>>> have additional functionality installed / uninstalled. Based on the
>>>> infrastructure that we already have, this is not as much of a stretch
>>>> as one would think.
>>>>
>>>> Comments?
>>>>
>>>> ("who's going to write this" is a question that will also have to be
>>>> answered, but perhaps we can discuss the code concept/idea first...)
>>>>
>>>>
>>>>
>>>> On Feb 7, 2008, at 10:11 AM, Ralph H Castain wrote:
>>>>
>>>>
>>>>> I believe Brian and Terry raise good points. May I offer a possible
>>>>> alternative? What if we only include in Open MPI an include file that
>>>>> contains the "hooks" to libNBC, and have the build system only "see"
>>>>> those if someone specifies --with-NBC (or whatever option name you
>>>>> like)? If you like, you can make the inclusion automatic if libNBC is
>>>>> detected on the system. It would also make sense to add -libNBC to
>>>>> the mpicc et al. wrappers when the build system includes the function
>>>>> definitions.
>>>>>
>>>>> This would let those users who want to (or can) use that library link
>>>>> against it, without adding a bunch of source code to our release. I
>>>>> suspect there are complications that will have to be dealt with, but
>>>>> I offer it as something to consider.
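A minimal sketch of the build decision described above, assuming the common tri-state convention for a `--with-X` option (explicitly on, explicitly off via `--without-X`, or unspecified meaning auto-detect). The function names and the `-lnbc` link flag are hypothetical; failing when libNBC is explicitly requested but not found is an assumed design choice, not something stated in the thread.

```python
from typing import List, Optional


def resolve_nbc(with_nbc: Optional[bool], nbc_detected: bool) -> bool:
    """Decide whether the libNBC hook header should be visible to the build.

    with_nbc is True for --with-NBC, False for --without-NBC, and None when
    the user gave neither (auto-detect).
    """
    if with_nbc is None:
        return nbc_detected  # automatic inclusion when libNBC is found
    if with_nbc and not nbc_detected:
        # Assumed design choice: an explicit request that cannot be met is an error.
        raise RuntimeError("--with-NBC requested but libNBC was not found")
    return with_nbc


def wrapper_link_flags(base_flags: List[str], nbc_enabled: bool) -> List[str]:
    """Link flags mpicc et al. would add; "-lnbc" is an assumed library name."""
    return base_flags + (["-lnbc"] if nbc_enabled else [])


if __name__ == "__main__":
    enabled = resolve_nbc(None, nbc_detected=True)
    print(wrapper_link_flags(["-lmpi"], enabled))  # ['-lmpi', '-lnbc']
```

The point of the tri-state is that users who never mention NBC get it only if it is already on the system, so no NBC source has to ship in the Open MPI tarball.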
>>>>>
>>>>>
>>>>> Also, remember that there is an added burden when we add source code
>>>>> to Open MPI that we haven't discussed - we are now adding coordination
>>>>> issues to our own release cycle. If libNBC changes, are we now going
>>>>> to be pressed to issue another OMPI release so that the new NBC
>>>>> version is included? Do we now need to coordinate our releases with
>>>>> theirs so that things align?
>>>>>
>>>>> And if we have an increasing number of such "included" packages, how
>>>>> complex is -that- release discussion going to get?!?
>>>>>
>>>>>
>>>>> On 2/7/08 4:48 AM, "Terry Dontje" <Terry.Dontje_at_[hidden]> wrote:
>>>>>
>>>>>
>>>>>> Torsten Hoefler wrote:
>>>>>>
>>>>>>> Hi Brian,
>>>>>>>
>>>>>>>
>>>>>>>> Let me start by reminding everyone that I have no vote, so this
>>>>>>>> should probably be sent to /dev/null.
>>>>>>>>
>>>>>>>>
>>>>>>> Thanks for your comment, and this will not go to /dev/null!
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> I think Ralph raised some good points. I'd like to raise another.
>>>>>>>>
>>>>>>>>
>>>>>>> yes [will reply to this in a separate thread]
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Does it make sense to bring LibNBC into the release at this point,
>>>>>>>> given the current standardization process for non-blocking
>>>>>>>> collectives?
>>>>>>>>
>>>>>>>> My feeling is no, based on the long-term support costs. We had this
>>>>>>>> problem with a function in LAM/MPI -- MPIL_SPAWN, I believe it was
>>>>>>>> -- that was almost but not quite MPI_COMM_SPAWN. It was added to
>>>>>>>> allow spawning before the dynamics part of the standard was
>>>>>>>> finished. The problem is, it wasn't quite MPI_COMM_SPAWN, so we were
>>>>>>>> then stuck with yet another function to support (in a touchy piece
>>>>>>>> of code) for infinity and beyond.
>>>>>>>>
>>>>>>>> I worry that we'll have the same with LibNBC -- a piece of code that
>>>>>>>> solves an immediate problem (no non-blocking collectives in MPI) but
>>>>>>>> will become a long-term support anchor. Since this is something
>>>>>>>> we'll be encouraging users to write code to, it's not like support
>>>>>>>> for mvapi, where we can just deprecate it and users won't really
>>>>>>>> notice. It's one thing to tell them to update their cluster software
>>>>>>>> stack -- it's another to tell them to rewrite their applications.
>>>>>>>>
>>>>>>>>
>>>>>>> I think this is a very good and valid point. However, I would like to
>>>>>>> deprecate the NBC_* interfaces as soon as non-blocking collectives are
>>>>>>> part of the standard. Of course, this would probably take two minor
>>>>>>> versions to "clean" the code base, but this is (will be) our normal
>>>>>>> procedure (just as happened with MVAPI).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> Though it doesn't seem to me that NBC is a slam dunk to get into the
>>>>>> MPI spec, and I could imagine it changing significantly due to
>>>>>> someone else's opinion/needs.
>>>>>>
>>>>>>> And rewriting the user's application will not be that hard; it'll
>>>>>>> mainly be vim:%s/NBC_/MPI_/g. Even if we change the interface (e.g.,
>>>>>>> add tags or decide to use the more limited split-collective
>>>>>>> approach), this task is rather easy and can be automated. It's not a
>>>>>>> functionality change, just an interface change.
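The mechanical rename mentioned above can be sketched in Python. Unlike a bare `:%s/NBC_/MPI_/g`, this version only rewrites identifiers that *start* with `NBC_`, so a name such as `MYNBC_x` is left alone. The override table (mapping `NBC_Handle` to `MPI_Request`) and the sample C line are illustrative assumptions, not taken from LibNBC's documentation. Like the vim command, it still rewrites matches inside comments and string literals.

```python
import re

# Hypothetical override table: names whose MPI counterpart is not a plain
# prefix swap. NBC_Handle -> MPI_Request is an assumption for illustration.
OVERRIDES = {"NBC_Handle": "MPI_Request"}

# \b anchors the match at the start of an identifier, so MYNBC_x is not
# touched (a bare :%s/NBC_/MPI_/g would rewrite it to MYMPI_x).
_NBC_IDENT = re.compile(r"\bNBC_(\w+)")


def nbc_to_mpi(source, overrides=OVERRIDES):
    """Rewrite NBC_* identifiers in C source text to MPI_* names."""
    def repl(match):
        name = match.group(0)  # the full identifier, e.g. "NBC_Ibcast"
        return overrides.get(name, "MPI_" + match.group(1))
    return _NBC_IDENT.sub(repl, source)


if __name__ == "__main__":
    before = ("NBC_Handle h; NBC_Ibcast(buf, n, MPI_INT, 0, comm, &h); "
              "NBC_Wait(&h, &st); MYNBC_x = 1;")
    print(nbc_to_mpi(before))
    # MPI_Request h; MPI_Ibcast(buf, n, MPI_INT, 0, comm, &h); MPI_Wait(&h, &st); MYNBC_x = 1;
```

The override table is where an interface change beyond a simple prefix swap (for example, a different handle type) would be recorded, which keeps the rewrite automatable.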
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> Though if NBC is built by default for release builds, I think that
>>>>>> raises the bar: it says that we, OMPI, believe this should be used by
>>>>>> all of our users without any concern that the API may change or that
>>>>>> it might have significant issues.
>>>>>>
>>>>>> On a similar track, do you have any tests that validate the
>>>>>> functionality/correctness of NBC that can be run as part of the MTT
>>>>>> nightly tests?
>>>>>>
>>>>>> My opinion is that I have no problem with NBC being merged in; I just
>>>>>> don't think it should be built by default.
>>>>>>
>>>>>> --td
>>>>>>
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> devel_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>
>
> --
> Jeff Squyres
> Cisco Systems
>