Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

Subject: Re: [OMPI devel] pointer_array
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-12-21 15:20:57


Fixed in https://svn.open-mpi.org/trac/ompi/changeset/17021 -- a
simple off-by-one error caused by some logic moving around.

On Dec 21, 2007, at 12:19 PM, Jeff Squyres wrote:

> I'm unfortunately getting some test failures after r17007 when
> converting between C and C++/F77 handles, such as the following (I
> checked; trunk/r17006 works fine):
>
> -----
> [9:17] svbu-mpi:~/svn/ompi-tests/cxx-test-suite % mpirun -np 4 src/
> mpi2c++_test
>
> Since we made it this far, we will assume that
> MPI::Init() worked properly.
> ----------------------------------------------
> ...snipped....
> * MPI::misc...
> - MPI::Query_thread... PASS
> - Commname... PASS
> - Commgethandler... PASS
> - Handle conversion... mpi2c++_test: class/
> opal_pointer_array.c:131: opal_pointer_array_add: Assertion `table-
>> addr[index] == ((void *)0)' failed.
> -----
>
> And other failures in the simple/basic/f90/attr_f test. I'll start
> poking around shortly to see if I can figure out the problem.
>
>
> On Dec 21, 2007, at 1:31 AM, George Bosilca wrote:
>
>> The opal_pointer_array is now committed. I try to update all BTL to
>> use the opal_pointer_array instead of the orte_pointer_array. Now,
>> the OMPI layer only use opal_pointer_array. Unfortunately, I cannot
>> test most of the BTLs so I hope I didn't miss anything.
>>
>> Thanks,
>> george.
>>
>> On Dec 17, 2007, at 2:22 PM, George Bosilca wrote:
>>
>>> Sound good. I will replace all references to ompi_pointer_array as
>>> well as orte_pointer_array in the ompi layer (some BTL use the
>>> orte_pointer_array) and replace them with the opal_pointer_array.
>>> I'll avoid any modification of the orte layer.
>>>
>>> I'll commit tomorrow morning.
>>>
>>> Thanks,
>>> george.
>>>
>>> On Dec 17, 2007, at 12:04 PM, Ralph H Castain wrote:
>>>
>>>> That would be fine with me - I can grab that out of the trunk and
>>>> adjust
>>>> ORTE in my branch instead.
>>>>
>>>> Thanks
>>>> Ralph
>>>>
>>>>
>>>> On 12/17/07 9:54 AM, "Tim Mattox" <timattox_at_[hidden]> wrote:
>>>>
>>>>> How about this as a suggested compromise.
>>>>> George, could you just do half the patch... where you leave orte
>>>>> alone,
>>>>> and just move the ompi pointer array implementation down into
>>>>> opal.
>>>>> That way, any new code can make use of it from opal, and only orte
>>>>> would need to be adjusted later, after Ralph is done with his
>>>>> changes.
>>>>>
>>>>> On Dec 17, 2007 9:18 AM, Ralph H Castain <rhc_at_[hidden]> wrote:
>>>>>> It would require extensive modification as use of the pointer
>>>>>> array has
>>>>>> spread over a wide range of the code base. I would really
>>>>>> appreciate it if
>>>>>> we didn't do this right now.
>>>>>>
>>>>>> The differences are historic in nature - several years ago, the
>>>>>> folks
>>>>>> working on the OMPI layer needed to insert some Fortran-specific
>>>>>> limits and
>>>>>> type definitions into the opal_pointer_array code.
>>>>>> Unfortunately, that
>>>>>> caused type conflicts across a swath of the ORTE code. After a
>>>>>> ton of
>>>>>> discussion and debate, there was no way the OMPI folks could
>>>>>> guarantee that
>>>>>> they wouldn't need to change those definitions again at some
>>>>>> time into the
>>>>>> future - which would again force the ORTE layer to make major
>>>>>> changes to
>>>>>> their code.
>>>>>>
>>>>>> In addition, the use of an int as the array index in the
>>>>>> opal_pointer_array
>>>>>> raised concerns in the ORTE world as we really didn't want to
>>>>>> pass generic
>>>>>> variable types between processes. At the time, we weren't sure
>>>>>> if the index
>>>>>> in a pointer array was going to need to be passed somewhere in
>>>>>> the future -
>>>>>> in fact, the code did pass it at the time in several cases.
>>>>>>
>>>>>> So we agreed to simply create separate code that, even though it
>>>>>> duplicated
>>>>>> the functionality, ensured that the two could operate semi-
>>>>>> independently.
>>>>>>
>>>>>> In the intervening time, the OMPI folks seem to have been able
>>>>>> to leave the
>>>>>> opal_pointer_array definitions pretty much alone. There have
>>>>>> been a few
>>>>>> changes along the way, but nothing overwhelming. In addition, we
>>>>>> have found
>>>>>> that the ORTE code no longer needs to pass the array index when
>>>>>> sending an
>>>>>> object's data to a remote process - at least, this is true at
>>>>>> the moment.
>>>>>>
>>>>>> So making the change might be reasonable. If we are going to do
>>>>>> that,
>>>>>> though, we need to ensure that all the functionality is
>>>>>> replicated (there
>>>>>> are, I believe, a couple of extensions in the orte_pointer_array
>>>>>> class), and
>>>>>> we should similarly review the other orte/opal class overlaps.
>>>>>>
>>>>>> However, doing all this right now would be a disaster on the tmp
>>>>>> branch
>>>>>> where we are revising ORTE. It would be much better to do it
>>>>>> after that
>>>>>> branch merges to the trunk, or just make the change in the tmp
>>>>>> branch first.
>>>>>> That branch makes much more extensive use of the
>>>>>> orte_pointer_array object
>>>>>> than is in the trunk, and it would be a royal pain of conflicts
>>>>>> to resolve
>>>>>> it - all for little, if any, gain.
>>>>>>
>>>>>> Thanks
>>>>>> Ralph
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 12/17/07 6:35 AM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
>>>>>>
>>>>>>> Adding RHC to the thread...
>>>>>>>
>>>>>>> I'm guessing that the patch will have to be modified for the
>>>>>>> ORTE tmp
>>>>>>> branch.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Dec 16, 2007, at 6:18 PM, George Bosilca wrote:
>>>>>>>
>>>>>>>> Right, I wonder why it didn't show in the patch file. Anyway,
>>>>>>>> it
>>>>>>>> completely remove the orte_pointer_array.[ch] as well as the
>>>>>>>> ompi_pointer_array.[ch] file.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> george.
>>>>>>>>
>>>>>>>> On Dec 16, 2007, at 12:01 AM, Tim Mattox wrote:
>>>>>>>>
>>>>>>>>> The patch looks good to my eyeballs, though I've not done any
>>>>>>>>> testing with it.
>>>>>>>>> I presume a follow on patch would remove the
>>>>>>>>> orte_pointer_array.
>>>>>>>>> [ch] files?
>>>>>>>>>
>>>>>>>>> On Dec 15, 2007 4:01 PM, George Bosilca
>>>>>>>>> <bosilca_at_[hidden]> wrote:
>>>>>>>>>> I have a patch that unify the pointer array implementations
>>>>>>>>>> into
>>>>>>>>>> just
>>>>>>>>>> one. Right now, we have 2 pointer array implementations: one
>>>>>>>>>> for
>>>>>>>>>> ORTE
>>>>>>>>>> and one for OMPI. The differences are small and mostly
>>>>>>>>>> insignificant
>>>>>>>>>> (there is no way to add more than 2^31 elements in the
>>>>>>>>>> pointer array
>>>>>>>>>> anyway). The following patch propose to merge these two
>>>>>>>>>> pointer
>>>>>>>>>> array
>>>>>>>>>> into one, implemented in OPAL (and called
>>>>>>>>>> opal_pointer_array).
>>>>>>>>>>
>>>>>>>>>> If nobody has complained before Wednesday noon I'll commit
>>>>>>>>>> the
>>>>>>>>>> patch.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> george.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> devel mailing list
>>>>>>>>>> devel_at_[hidden]
>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
>>>>>>>>> tmattox_at_[hidden] || timattox_at_[hidden]
>>>>>>>>> I'm a bright... http://www.the-brights.net/
>>>>>>>>> _______________________________________________
>>>>>>>>> devel mailing list
>>>>>>>>> devel_at_[hidden]
>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> devel mailing list
>>>>>>>> devel_at_[hidden]
>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> devel_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> --
> Jeff Squyres
> Cisco Systems
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
Cisco Systems