Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: Move of ompi_bitmap_t
From: Richard Graham (rlgraham_at_[hidden])
Date: 2009-02-01 16:57:58


Brian,
  Can you be a bit more specific about the work you did at LANL?

Thanks,
Rich

On 2/1/09 2:20 PM, "Brian Barrett" <brbarret_at_[hidden]> wrote:

> While I would love to be involved in this change, as I believe it's
> critical that it get done right and I have some reservations based on the
> work we did while a bunch of us were still at LANL, I just don't have
> time for yet another weekly telecon (particularly since 2:00 MST is
> the same as an existing weekly telecon).
>
> I still think my objections stand, however. A weekly telecon to
> discuss the issues is no replacement for a detailed explanation of how
> things are going to work, as well as some proof of concept code. We
> should hold this change up to the same standard we hold all major
> changes to -- which means a working temp branch with negligible
> performance impact.
>
> Brian
>
> On Feb 1, 2009, at 12:14 PM, Graham, Richard L. wrote:
>
>> > Brian,
>> > Just FYI, there is a weekly call, Thursdays at 4 EST, where we have
>> > been discussing these issues.
>> > Let's touch base at the forum.
>> >
>> > Rich
>> >
>> > ----- Original Message -----
>> > From: devel-bounces_at_[hidden] <devel-bounces_at_[hidden]>
>> > To: Open MPI Developers <devel_at_[hidden]>
>> > Sent: Sun Feb 01 10:36:33 2009
>> > Subject: Re: [OMPI devel] RFC: Move of ompi_bitmap_t
>> >
>> > In that case, I remove my objection to this particular RFC. My
>> > objection remains for all other RFCs that move any of the BTL move code
>> > to the trunk before the critical issues with the BTL move have been
>> > sorted out in a temporary branch. This includes renaming functions
>> > and such. Perhaps we should have a discussion about those issues
>> > during the Forum in a couple weeks?
>> >
>> > Brian
>> >
>> > On Feb 1, 2009, at 5:37 AM, Jeff Squyres wrote:
>> >
>>> >> I just looked through both opal_bitmap_t and ompi_bitmap_t and I
>>> >> think that the only real difference is that in the ompi version, we
>>> >> check (in various places) that the size of the bitmap never grows
>>> >> beyond OMPI_FORTRAN_HANDLE_MAX; the opal version doesn't do these
>>> >> kinds of size checks.
>>> >>
>>> >> I think it would be fairly straightforward to:
>>> >>
>>> >> - add generic checks into the opal version, perhaps by adding a new
>>> >> API call (opal_bitmap_set_max_size())
>>> >> - if the max size has been set, then ensure that the bitmap never
>>> >> grows beyond that size; otherwise, let it have the same behavior as
>>> >> today (grow without bound -- presumably until malloc() fails)
>>> >>
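The proposal above (an opal bitmap that grows without bound unless a maximum size has been set) can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the real opal_bitmap_t code: the type `bitmap_t` and the functions `bitmap_init()`, `bitmap_set_max_size()`, `bitmap_set_bit()`, `bitmap_is_set()` and `bitmap_fini()` are hypothetical names, and a maximum of 0 is assumed to mean "no limit", i.e. today's opal behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical generic bitmap illustrating an optional size cap.
 * Not the real opal_bitmap_t API; all names are illustrative only. */
typedef struct {
    unsigned char *bits;   /* backing storage, one bit per index */
    size_t nbytes;         /* number of bytes currently allocated */
    size_t max_bits;       /* cap on valid indices; 0 means no limit */
} bitmap_t;

/* Allocate room for at least initial_bits bits, all cleared, with no cap. */
int bitmap_init(bitmap_t *bm, size_t initial_bits)
{
    bm->nbytes = (initial_bits + 7) / 8;
    if (bm->nbytes == 0) {
        bm->nbytes = 1;
    }
    bm->bits = calloc(bm->nbytes, 1);
    bm->max_bits = 0;
    return (bm->bits != NULL) ? 0 : -1;
}

/* Hypothetical analogue of the proposed opal_bitmap_set_max_size(). */
void bitmap_set_max_size(bitmap_t *bm, size_t max_bits)
{
    bm->max_bits = max_bits;
}

/* Set bit idx, growing the storage if needed. Returns -1 if a cap is set
 * and idx is at or beyond it, or if realloc() fails. */
int bitmap_set_bit(bitmap_t *bm, size_t idx)
{
    assert(bm->bits != NULL);
    if (bm->max_bits != 0 && idx >= bm->max_bits) {
        return -1;                        /* refuse to grow past the cap */
    }
    size_t byte = idx / 8;
    if (byte >= bm->nbytes) {
        size_t new_nbytes = byte + 1;
        if (new_nbytes < 2 * bm->nbytes) {
            new_nbytes = 2 * bm->nbytes;  /* grow geometrically */
        }
        unsigned char *p = realloc(bm->bits, new_nbytes);
        if (p == NULL) {
            return -1;                    /* old storage is still valid */
        }
        memset(p + bm->nbytes, 0, new_nbytes - bm->nbytes);
        bm->bits = p;
        bm->nbytes = new_nbytes;
    }
    bm->bits[byte] |= (unsigned char)(1u << (idx % 8));
    return 0;
}

/* Return 1 if bit idx is set, 0 otherwise (including out-of-range idx). */
int bitmap_is_set(const bitmap_t *bm, size_t idx)
{
    size_t byte = idx / 8;
    if (byte >= bm->nbytes) {
        return 0;
    }
    return (bm->bits[byte] >> (idx % 8)) & 1u;
}

void bitmap_fini(bitmap_t *bm)
{
    free(bm->bits);
    bm->bits = NULL;
    bm->nbytes = 0;
}
```

In this sketch, a caller in the OMPI layer would call `bitmap_set_max_size()` once after initialization, passing its Fortran handle limit, while non-OMPI users simply never set a cap. The cap check adds only one comparison to the set path, and the class itself stays free of any notion of Fortran.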
>>> >> It'll take a little care to merge the functionality correctly, but
>>> >> it is possible. Once that is done, you can:
>>> >>
>>> >> - remove the ompi_bitmap_t class
>>> >> - s/ompi_bitmap/opal_bitmap/g in the OMPI layer
>>> >> - add new calls to opal_bitmap_set_max_size(&bitmap,
>>> >> OMPI_FORTRAN_HANDLE_MAX) in the OMPI layer (should only be in a few
>>> >> places -- probably one for each MPI handle type...? It's been so
>>> >> long since I've looked at that code that I don't remember offhand)
>>> >>
>>> >> I'd generally be in favor of this because, although this is not a
>>> >> lot of repeated code, it *is* repeated code -- so cleaning it up and
>>> >> consolidating the non-Fortran stuff down in opal is not a Bad Thing.
>>> >>
>>> >>
>>> >> On Jan 30, 2009, at 4:59 PM, Ralph Castain wrote:
>>> >>
>>>> >>> The history is simple. Originally, there was one bitmap_t in orte
>>>> >>> that was also used in ompi. Then the folks working on Fortran found
>>>> >>> that they had to put a limit in the bitmap code to avoid getting
>>>> >>> values outside of Fortran's range. However, this introduced a
>>>> >>> problem - if we had the limit in the orte version, then we limited
>>>> >>> ourselves unnecessarily, and introduced some abstraction questions
>>>> >>> since orte knows nothing about Fortran.
>>>> >>>
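To make the Fortran constraint concrete: Fortran code sees an MPI handle as an INTEGER, so if an index is handed out directly as a handle value, it must fit in that type. The sketch below assumes a 32-bit signed default INTEGER and uses the hypothetical names `fortran_handle_t` (a stand-in for MPI_Fint), `SKETCH_FORTRAN_HANDLE_MAX` and `index_to_fortran_handle()`; it is not Open MPI's definition of OMPI_FORTRAN_HANDLE_MAX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: Fortran default INTEGER is 32-bit signed. */
typedef int32_t fortran_handle_t;   /* hypothetical stand-in for MPI_Fint */
#define SKETCH_FORTRAN_HANDLE_MAX ((size_t)INT32_MAX)

/* Convert a C-side index into a Fortran handle value. Returns 0 on
 * success and -1 if the index cannot be represented as a Fortran
 * INTEGER, which is the situation a capped bitmap is meant to prevent. */
int index_to_fortran_handle(size_t idx, fortran_handle_t *out)
{
    assert(out != NULL);
    if (idx > SKETCH_FORTRAN_HANDLE_MAX) {
        return -1;
    }
    *out = (fortran_handle_t)idx;
    return 0;
}
```

The bound here comes from the Fortran type, not from anything the bitmap itself needs, which is the abstraction problem with hard-coding it in a lower layer.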
>>>> >>> So two were created. Then the orte_bitmap_t was blown away at a
>>>> >>> later time when we removed the GPR as George felt it wasn't
>>>> >>> necessary (which was true). It was later reborn when we needed it
>>>> >>> in the routed system, but this time it was done in opal, as others
>>>> >>> indicated a potentially more general use for that capability.
>>>> >>>
>>>> >>> The problem with uniting the two is that you either have to
>>>> >>> introduce Fortran-based limits into opal (which messes up the non-
>>>> >>> ompi uses), or deal with the Fortran limits in some other fashion.
>>>> >>> Neither is particularly pleasant, though it could be done.
>>>> >>>
>>>> >>> I think it primarily is a question for the Fortran folks to address
>>>> >>> - can they deal with Fortran limits in some other manner without
>>>> >>> making the code unmanageable and/or taking a performance hit?
>>>> >>>
>>>> >>> Ralph
>>>> >>>
>>>> >>>
>>>> >>> On Jan 30, 2009, at 2:40 PM, Richard Graham wrote:
>>>> >>>
>>>>> >>>> This should really be viewed as a code maintenance RFC. The
>>>>> >>>> reason this came up in the first place is that we are
>>>>> >>>> investigating the BTL move, but these are really two very
>>>>> >>>> distinct issues. There are two bits of code that have virtually
>>>>> >>>> the same functionality; they have the same interface, I am told.
>>>>> >>>> The question is: is there a good reason to keep two different
>>>>> >>>> versions in the repository? Since I don't know the history of
>>>>> >>>> why a second version was created, this is simply an inquiry. Is
>>>>> >>>> there some performance advantage, or some other advantage, to
>>>>> >>>> having these two versions?
>>>>> >>>>
>>>>> >>>> Rich
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> On 1/30/09 3:23 PM, "Terry D. Dontje" <Terry.Dontje_at_[hidden]> wrote:
>>>>> >>>>
>>>>>> >>>>> I second Brian's concern. So unless this is just an announcement
>>>>>> >>>>> that this is being done on a tmp branch only until everything is
>>>>>> >>>>> in order, I think we need further discussions.
>>>>>> >>>>>
>>>>>> >>>>> --td
>>>>>> >>>>>
>>>>>> >>>>> Brian Barrett wrote:
>>>>>>> >>>>>> So once again, I bring up my objection to this entire line of
>>>>>>> >>>>>> moving until such time as the entire process is properly mapped
>>>>>>> >>>>>> out. I believe it's premature to begin moving around code in
>>>>>>> >>>>>> preparation for a move that hasn't been proven viable yet. Until
>>>>>>> >>>>>> there is concrete evidence that such a move is possible, won't
>>>>>>> >>>>>> degrade application performance, and does not make the code
>>>>>>> >>>>>> totally unmaintainable, I believe that any related code changes
>>>>>>> >>>>>> should not be brought into the trunk.
>>>>>>> >>>>>>
>>>>>>> >>>>>> Brian
>>>>>>> >>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>> On Jan 30, 2009, at 12:30 PM, Rainer Keller wrote:
>>>>>>> >>>>>>
>>>>>>>> >>>>>>> On behalf of Laurent Broto
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> RFC: Move of ompi_bitmap_t
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> WHAT: Move ompi_bitmap_t into opal or onet-layer
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> WHY: Remove dependency on ompi-layer.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> WHERE: ompi/class
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> WHEN: Open MPI-1.4
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> TIMEOUT: February 3, 2009.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> -------------------------------------
>>>>>>>> >>>>>>> Details:
>>>>>>>> >>>>>>> WHY:
>>>>>>>> >>>>>>> The ompi_bitmap_t is being used in various places within
>>>>>>>> >>>>>>> opal/orte/ompi. With the proposed splitting of BTLs into a
>>>>>>>> >>>>>>> separate library, we are currently investigating several of
>>>>>>>> >>>>>>> the differences between ompi/class/* and opal/class/*.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> One of the items is the ompi_bitmap_t, which is quite similar
>>>>>>>> >>>>>>> to the opal_bitmap_t. The question is whether we can remove it
>>>>>>>> >>>>>>> in favor of a single solution in opal.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> WHAT:
>>>>>>>> >>>>>>> The data structures in the opal version are the same, as is
>>>>>>>> >>>>>>> the interface; the implementation is *almost* the same....
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> The difference is the Fortran handles ;-]!
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> Maybe we're missing something, but could we have a discussion
>>>>>>>> >>>>>>> on why Fortran sizes are playing a role here and, if this is a
>>>>>>>> >>>>>>> hard requirement, how we could fit that into the current
>>>>>>>> >>>>>>> interface (possibly without a notion of Fortran, but rather by
>>>>>>>> >>>>>>> setting some upper limit that the bitmap may grow to)?
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> With best regards,
>>>>>>>> >>>>>>> Laurent and Rainer
>>>>>>>> >>>>>>> --
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> ------------------------------------------------------------------------
>>>>>>>> >>>>>>> Rainer Keller, PhD Tel: (865) 241-6293
>>>>>>>> >>>>>>> Oak Ridge National Lab Fax: (865) 241-4811
>>>>>>>> >>>>>>> PO Box 2008 MS 6164 Email: keller_at_[hidden]
>>>>>>>> >>>>>>> Oak Ridge, TN 37831-2008 AIM/Skype: rusraink
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> _______________________________________________
>>>>>>>> >>>>>>> devel mailing list
>>>>>>>> >>>>>>> devel_at_[hidden]
>>>>>>>> >>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>> >>>>>>>
>>>>>>> >>>>>>
>>>>>> >>>>>
>>>>> >>>>
>>>> >>>
>>> >>
>>> >>
>>> >> --
>>> >> Jeff Squyres
>>> >> Cisco Systems
>>> >>
>>> >>
>> >
>> >
>
> --
> Brian Barrett
> Open MPI developer
> http://www.open-mpi.org/
>
>
>