Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

Subject: Re: [OMPI devel] pointer_array
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-12-21 12:19:37


I'm unfortunately getting some test failures after r17007 when
converting between C and C++/F77 handles, such as the following (I
checked; trunk/r17006 works fine):

-----
[9:17] svbu-mpi:~/svn/ompi-tests/cxx-test-suite % mpirun -np 4 src/
mpi2c++_test

Since we made it this far, we will assume that
MPI::Init() worked properly.
----------------------------------------------
...snipped....
* MPI::misc...
   - MPI::Query_thread... PASS
   - Commname... PASS
   - Commgethandler... PASS
   - Handle conversion... mpi2c++_test: class/
opal_pointer_array.c:131: opal_pointer_array_add: Assertion `table-
>addr[index] == ((void *)0)' failed.
-----

And other failures in the simple/basic/f90/attr_f test. I'll start
poking around shortly to see if I can figure out the problem.

On Dec 21, 2007, at 1:31 AM, George Bosilca wrote:

> The opal_pointer_array is now committed. I try to update all BTL to
> use the opal_pointer_array instead of the orte_pointer_array. Now,
> the OMPI layer only use opal_pointer_array. Unfortunately, I cannot
> test most of the BTLs so I hope I didn't miss anything.
>
> Thanks,
> george.
>
> On Dec 17, 2007, at 2:22 PM, George Bosilca wrote:
>
>> Sound good. I will replace all references to ompi_pointer_array as
>> well as orte_pointer_array in the ompi layer (some BTL use the
>> orte_pointer_array) and replace them with the opal_pointer_array.
>> I'll avoid any modification of the orte layer.
>>
>> I'll commit tomorrow morning.
>>
>> Thanks,
>> george.
>>
>> On Dec 17, 2007, at 12:04 PM, Ralph H Castain wrote:
>>
>>> That would be fine with me - I can grab that out of the trunk and
>>> adjust
>>> ORTE in my branch instead.
>>>
>>> Thanks
>>> Ralph
>>>
>>>
>>> On 12/17/07 9:54 AM, "Tim Mattox" <timattox_at_[hidden]> wrote:
>>>
>>>> How about this as a suggested compromise.
>>>> George, could you just do half the patch... where you leave orte
>>>> alone,
>>>> and just move the ompi pointer array implementation down into opal.
>>>> That way, any new code can make use of it from opal, and only orte
>>>> would need to be adjusted later, after Ralph is done with his
>>>> changes.
>>>>
>>>> On Dec 17, 2007 9:18 AM, Ralph H Castain <rhc_at_[hidden]> wrote:
>>>>> It would require extensive modification as use of the pointer
>>>>> array has
>>>>> spread over a wide range of the code base. I would really
>>>>> appreciate it if
>>>>> we didn't do this right now.
>>>>>
>>>>> The differences are historic in nature - several years ago, the
>>>>> folks
>>>>> working on the OMPI layer needed to insert some Fortran-specific
>>>>> limits and
>>>>> type definitions into the opal_pointer_array code.
>>>>> Unfortunately, that
>>>>> caused type conflicts across a swath of the ORTE code. After a
>>>>> ton of
>>>>> discussion and debate, there was no way the OMPI folks could
>>>>> guarantee that
>>>>> they wouldn't need to change those definitions again at some
>>>>> time into the
>>>>> future - which would again force the ORTE layer to make major
>>>>> changes to
>>>>> their code.
>>>>>
>>>>> In addition, the use of an int as the array index in the
>>>>> opal_pointer_array
>>>>> raised concerns in the ORTE world as we really didn't want to
>>>>> pass generic
>>>>> variable types between processes. At the time, we weren't sure
>>>>> if the index
>>>>> in a pointer array was going to need to be passed somewhere in
>>>>> the future -
>>>>> in fact, the code did pass it at the time in several cases.
>>>>>
>>>>> So we agreed to simply create separate code that, even though it
>>>>> duplicated
>>>>> the functionality, ensured that the two could operate semi-
>>>>> independently.
>>>>>
>>>>> In the intervening time, the OMPI folks seem to have been able
>>>>> to leave the
>>>>> opal_pointer_array definitions pretty much alone. There have
>>>>> been a few
>>>>> changes along the way, but nothing overwhelming. In addition, we
>>>>> have found
>>>>> that the ORTE code no longer needs to pass the array index when
>>>>> sending an
>>>>> object's data to a remote process - at least, this is true at
>>>>> the moment.
>>>>>
>>>>> So making the change might be reasonable. If we are going to do
>>>>> that,
>>>>> though, we need to ensure that all the functionality is
>>>>> replicated (there
>>>>> are, I believe, a couple of extensions in the orte_pointer_array
>>>>> class), and
>>>>> we should similarly review the other orte/opal class overlaps.
>>>>>
>>>>> However, doing all this right now would be a disaster on the tmp
>>>>> branch
>>>>> where we are revising ORTE. It would be much better to do it
>>>>> after that
>>>>> branch merges to the trunk, or just make the change in the tmp
>>>>> branch first.
>>>>> That branch makes much more extensive use of the
>>>>> orte_pointer_array object
>>>>> than is in the trunk, and it would be a royal pain of conflicts
>>>>> to resolve
>>>>> it - all for little, if any, gain.
>>>>>
>>>>> Thanks
>>>>> Ralph
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 12/17/07 6:35 AM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
>>>>>
>>>>>> Adding RHC to the thread...
>>>>>>
>>>>>> I'm guessing that the patch will have to be modified for the
>>>>>> ORTE tmp
>>>>>> branch.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Dec 16, 2007, at 6:18 PM, George Bosilca wrote:
>>>>>>
>>>>>>> Right, I wonder why it didn't show in the patch file. Anyway, it
>>>>>>> completely remove the orte_pointer_array.[ch] as well as the
>>>>>>> ompi_pointer_array.[ch] file.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> george.
>>>>>>>
>>>>>>> On Dec 16, 2007, at 12:01 AM, Tim Mattox wrote:
>>>>>>>
>>>>>>>> The patch looks good to my eyeballs, though I've not done any
>>>>>>>> testing with it.
>>>>>>>> I presume a follow on patch would remove the
>>>>>>>> orte_pointer_array.
>>>>>>>> [ch] files?
>>>>>>>>
>>>>>>>> On Dec 15, 2007 4:01 PM, George Bosilca
>>>>>>>> <bosilca_at_[hidden]> wrote:
>>>>>>>>> I have a patch that unify the pointer array implementations
>>>>>>>>> into
>>>>>>>>> just
>>>>>>>>> one. Right now, we have 2 pointer array implementations: one
>>>>>>>>> for
>>>>>>>>> ORTE
>>>>>>>>> and one for OMPI. The differences are small and mostly
>>>>>>>>> insignificant
>>>>>>>>> (there is no way to add more than 2^31 elements in the
>>>>>>>>> pointer array
>>>>>>>>> anyway). The following patch propose to merge these two
>>>>>>>>> pointer
>>>>>>>>> array
>>>>>>>>> into one, implemented in OPAL (and called opal_pointer_array).
>>>>>>>>>
>>>>>>>>> If nobody has complained before Wednesday noon I'll commit the
>>>>>>>>> patch.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> george.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> devel mailing list
>>>>>>>>> devel_at_[hidden]
>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
>>>>>>>> tmattox_at_[hidden] || timattox_at_[hidden]
>>>>>>>> I'm a bright... http://www.the-brights.net/
>>>>>>>> _______________________________________________
>>>>>>>> devel mailing list
>>>>>>>> devel_at_[hidden]
>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> devel mailing list
>>>>>>> devel_at_[hidden]
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> devel_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>
>>>>
>>>>
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
Cisco Systems