Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

Subject: Re: [OMPI devel] RFC: Move of ompi_bitmap_t
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-02-01 07:37:33


I just looked through both opal_bitmap_t and ompi_bitmap_t and I think
that the only real difference is that in the ompi version, we check
(in various places) that the size of the bitmap never grows beyond
OMPI_FORTRAN_HANDLE_MAX; the opal version doesn't do these kind of
size checks.

I think it would be fairly straightforward to:

- add generic checks into the opal version, perhaps by adding a new
API call (opal_bitmap_set_max_size())
- if the max size has been set, then ensure that the bitmap never
grows beyond that size, otherwise let it have the same behavior as
today (grow without bound -- assumedly until malloc() fails)

It'll take a little care to ensure to merge the functionality
correctly, but it is possible. Once that is done, you can:

- remove the ompi_bitmap_t class
- s/ompi_bitmap/opal_bitmap/g in the OMPI layer
- add new calls to opal_bitmap_set_max_size(&bitmap,
OMPI_FORTRAN_HANDLE_MAX) in the OMPI layer (should only be in a few
places -- probably one for each MPI handle type...? It's been so long
since I've looked at that code that I don't remember offhand)

I'd generally be in favor of this because, although this is not a lot
of repeated code, it *is* repeated code -- so cleaning it up and
consolidating the non-Fortran stuff down in opal is not a Bad Thing.

On Jan 30, 2009, at 4:59 PM, Ralph Castain wrote:

> The history is simple. Originally, there was one bitmap_t in orte
> that was also used in ompi. Then the folks working on Fortran found
> that they had to put a limit in the bitmap code to avoid getting
> values outside of Fortran's range. However, this introduced a
> problem - if we had the limit in the orte version, then we limited
> ourselves unnecessarily, and introduced some abstraction questions
> since orte knows nothing about Fortran.
>
> So two were created. Then the orte_bitmap_t was blown away at a
> later time when we removed the GPR as George felt it wasn't
> necessary (which was true). It was later reborn when we needed it in
> the routed system, but this time it was done in opal as others
> indicated a potential more general use for that capability.
>
> The problem with uniting the two is that you either have to
> introduce Fortran-based limits into opal (which messes up the non-
> ompi uses), or deal with the Fortran limits in some other fashion.
> Neither is particularly pleasant, though it could be done.
>
> I think it primarily is a question for the Fortran folks to address
> - can they deal with Fortran limits in some other manner without
> making the code unmanageable and/or taking a performance hit?
>
> Ralph
>
>
> On Jan 30, 2009, at 2:40 PM, Richard Graham wrote:
>
>> This should really be viewed as a code maintenance RFC. The reason
>> this
>> came up in the first place is because we are investigating the btl
>> move, but
>> these are really two very distinct issues. There are two bits of
>> code that
>> have virtually the same functionality - they do have the same
>> interface I am
>> told. The question is, is there a good reason to keep two different
>> versions in the repository ? Not knowing the history of why a second
>> version was created this is an inquiry. Is there some performance
>> advantage, or some other advantage to having these two versions ?
>>
>> Rich
>>
>>
>> On 1/30/09 3:23 PM, "Terry D. Dontje" <Terry.Dontje_at_[hidden]> wrote:
>>
>>> I second Brian's concern. So unless this is just an announcement
>>> that
>>> this is being done on a tmp branch only until everything is in
>>> order I
>>> think we need further discussions.
>>>
>>> --td
>>>
>>> Brian Barrett wrote:
>>>> So once again, I bring up my objection of this entire line of
>>>> moving
>>>> until such time as the entire process is properly mapped out. I
>>>> believe it's premature to being moving around code in preparation
>>>> for
>>>> a move that hasn't been proven viable yet. Until there is concrete
>>>> evidence that such a move is possible, won't degrade application
>>>> performance, and does not make the code totally unmaintainable, I
>>>> believe that any related code changes should not be brought into
>>>> the
>>>> trunk.
>>>>
>>>> Brian
>>>>
>>>>
>>>> On Jan 30, 2009, at 12:30 PM, Rainer Keller wrote:
>>>>
>>>>> On behalf of Laurent Broto
>>>>>
>>>>> RFC: Move of ompi_bitmap_t
>>>>>
>>>>> WHAT: Move ompi_bitmap_t into opal or onet-layer
>>>>>
>>>>> WHY: Remove dependency on ompi-layer.
>>>>>
>>>>> WHERE: ompi/class
>>>>>
>>>>> WHEN: Open MPI-1.4
>>>>>
>>>>> TIMEOUT: February 3, 2009.
>>>>>
>>>>> -------------------------------------
>>>>> Details:
>>>>> WHY:
>>>>> The ompi_bitmap_t is being used in various places within
>>>>> opal/orte/ompi. With
>>>>> the proposed splitting of BTLs into a separate library, we are
>>>>> currently
>>>>> investigating several of the differences between ompi/class/* and
>>>>> opal/class/*
>>>>>
>>>>> One of the items is the ompi_bitmap_t which is quite similar to
>>>>> the
>>>>> opal_bitmap_t.
>>>>> The question is, whether we can remove favoring a solution just
>>>>> in opal.
>>>>>
>>>>> WHAT:
>>>>> The data structures in the opal-version are the same,
>>>>> so is the interface,
>>>>> the implementation is *almost* the same....
>>>>>
>>>>> The difference is the Fortran handles ;-]!
>>>>>
>>>>> Maybe we're missing something but could we have a discussion, on
>>>>> why
>>>>> Fortran
>>>>> sizes are playing a role here, and if this is a hard
>>>>> requirement, how
>>>>> we could
>>>>> settle that into that current interface (possibly without a
>>>>> notion of
>>>>> Fortran,
>>>>> but rather, set some upper limit that the bitmap may grow to?)
>>>>>
>>>>> With best regards,
>>>>> Laurent and Rainer
>>>>> --
>>>>> ------------------------------------------------------------------------
>>>>> Rainer Keller, PhD Tel: (865) 241-6293
>>>>> Oak Ridge National Lab Fax: (865) 241-4811
>>>>> PO Box 2008 MS 6164 Email: keller_at_[hidden]
>>>>> Oak Ridge, TN 37831-2008 AIM/Skype: rusraink
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> devel_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>
>>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
Cisco Systems