Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Fwd: [OMPI users] Onesided + derived datatypes
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-12-13 16:11:06


No problem-o.

George -- can you please file a bug?

On Dec 13, 2008, at 3:11 PM, Brian Barrett wrote:

> Sorry, I really won't have time to look until after Christmas. I'll
> put it on the to-do list, but that's as soon as it has a prayer of
> reaching the top.
>
> Brian
>
> On Dec 13, 2008, at 1:02 PM, George Bosilca wrote:
>
>> Brian,
>>
>> I found a second problem with rebuilding the datatype on the
>> remote. Originally, the displacements were wrongly computed. This is
>> now fixed. However, the data at the end of the fence is still not
>> correct on the remote.
>>
>> I can confirm that the packed message contains only zeros instead
>> of the real values, but I couldn't figure out how these zeros got
>> there. The pack function works correctly for MPI_Send, and I don't
>> see any reason it should not do the same for MPI_Put. As you're
>> the one-sided guy in ompi, can you take a look at the MPI_Put to
>> see why the data is incorrect?
>>
>> george.
>>
>> On Dec 11, 2008, at 19:14 , Brian Barrett wrote:
>>
>>> I think that's a reasonable solution. However, the words "not it"
>>> come to mind. Sorry, but I have way too much on my plate this
>>> month. By the way, in case no one noticed, I had e-mailed my
>>> findings to devel. Someone might want to reply to Dorian's e-mail
>>> on users.
>>>
>>>
>>> Brian
>>>
>>> On Dec 11, 2008, at 2:31 PM, George Bosilca wrote:
>>>
>>>> Brian,
>>>>
>>>> You're right, the datatype is being too cautious with the
>>>> boundaries when detecting the overlap. There is no good solution
>>>> to detect the overlap except parsing the whole memory layout to
>>>> check the status of every predefined type. As one can imagine,
>>>> this is a very expensive operation. This is the reason I preferred
>>>> to use the true extent and the size of the data to try to detect
>>>> the overlap. This approach is a lot faster, but has poor accuracy.
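[Editor's sketch] To make the trade-off above concrete, here is a minimal C++ illustration of a bound based only on size and true extent. It is not Open MPI's actual check: `LayoutBounds`, `layout_bounds`, and `certainly_overlaps` are hypothetical names, and the layout is assumed to be an indexed-block type with displacements and block length counted in elements of the base type. The only fact relied on is that a layout touching more elements than its true extent can hold must overlap somewhere; the converse does not hold, which is where the poor accuracy comes from.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical summary of an indexed-block layout (units: base-type elements).
struct LayoutBounds {
    long size;         // total number of elements described by the layout
    long true_extent;  // span from the lowest to the highest element touched
};

// Computes size and true extent in one pass over the displacements.
// Assumes block_length > 0.
LayoutBounds layout_bounds(const std::vector<long>& displacements,
                           long block_length) {
    if (displacements.empty()) {
        return {0, 0};
    }
    const auto range =
        std::minmax_element(displacements.begin(), displacements.end());
    const long lowest = *range.first;
    const long highest = *range.second;
    const long count = static_cast<long>(displacements.size());
    return {count * block_length, highest + block_length - lowest};
}

// Cheap test: more elements than the extent can hold means some element is
// covered twice. A "false" result does NOT prove the layout is overlap-free.
bool certainly_overlaps(const LayoutBounds& bounds) {
    return bounds.size > bounds.true_extent;
}
```

With a block length of 1 (an assumption; the thread does not state it), displacements 3, 2, 1, 0 give a size equal to the true extent, so this bound does not flag them. Displacements 0, 0, 5 contain a real overlap, yet the bound misses it because the extent is larger than the size.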
>>>>
>>>> The best solution I can think of in the short term is to remove
>>>> the overlap check completely. This will have absolutely no impact
>>>> on the way we pack the data, but can lead to unexpected results
>>>> when we unpack and the data overlap. But I guess this can be
>>>> considered a user error, as the MPI standard clearly states
>>>> that the result of such an operation is ... unexpected.
>>>>
>>>> george.
>>>>
>>>> On Dec 10, 2008, at 22:20 , Brian Barrett wrote:
>>>>
>>>>> Hi all -
>>>>>
>>>>> I looked into this, and it appears to be datatype related. If
>>>>> the displacements are set to 3, 2, 1, 0, then the datatype
>>>>> will fail the type checks for one-sided because is_overlapped()
>>>>> returns 1 for the datatype. My reading of the standard seems to
>>>>> indicate this should not be. I haven't looked into the problems
>>>>> with displacement set to 0, 1, 2, 3, but I'm guessing it has
>>>>> something to do with the reverse problem.
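[Editor's sketch] The claim that displacements 3, 2, 1, 0 should not count as overlapping can be checked by hand with an exact test. The C++ helper below is hypothetical (`blocks_overlap` is not an Open MPI function and is unrelated to the real is_overlapped()); it assumes an indexed-block layout with displacements and block length measured in elements of the base type, and a block length of 1 for the case in this thread, which the e-mails do not state.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true when two blocks of an indexed-block
// layout cover at least one common element.
// The displacements are taken by value so they can be sorted locally.
bool blocks_overlap(std::vector<long> displacements, long block_length) {
    std::sort(displacements.begin(), displacements.end());
    for (std::size_t i = 1; i < displacements.size(); ++i) {
        // All blocks have the same length, so after sorting it is enough
        // to compare each block with its immediate predecessor.
        if (displacements[i] - displacements[i - 1] < block_length) {
            return true;
        }
    }
    return false;
}
```

Under those assumptions, `blocks_overlap({3, 2, 1, 0}, 1)` and `blocks_overlap({0, 1, 2, 3}, 1)` both return false: the order in which the blocks are listed does not matter, only whether any two of them share an element. The sort makes this O(n log n) in the number of blocks, which hints at why an exact check on arbitrary nested datatypes is costly.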
>>>>>
>>>>> This looks like a datatype issue, so it's out of my realm of
>>>>> expertise. Can someone else take a look?
>>>>>
>>>>> Brian
>>>>>
>>>>> Begin forwarded message:
>>>>>
>>>>>> From: doriankrause <doriankrause_at_[hidden]>
>>>>>> Date: December 10, 2008 4:07:55 PM MST
>>>>>> To: users_at_[hidden]
>>>>>> Subject: [OMPI users] Onesided + derived datatypes
>>>>>> Reply-To: Open MPI Users <users_at_[hidden]>
>>>>>>
>>>>>> Hi List,
>>>>>>
>>>>>> I have an MPI program which uses one-sided communication with
>>>>>> derived datatypes (MPI_Type_create_indexed_block). I developed
>>>>>> the code with MPICH2 and unfortunately didn't think about trying
>>>>>> it out with OpenMPI. Now that I'm "porting" the application to
>>>>>> OpenMPI I'm facing some problems. On most machines I get a
>>>>>> SIGSEGV in MPI_Win_fence; sometimes an invalid datatype shows up.
>>>>>> I ran the program in Valgrind and didn't get anything valuable.
>>>>>> Since I can't see a reason for this problem (at least if I
>>>>>> understand the standard correctly), I wrote the attached test
>>>>>> program.
>>>>>>
>>>>>> Here are my experiences:
>>>>>>
>>>>>> * If I compile without ONESIDED defined, everything works and
>>>>>>   V1 and V2 give the same results.
>>>>>> * If I compile with ONESIDED and V2 defined
>>>>>>   (MPI_Type_contiguous), it works.
>>>>>> * ONESIDED + V1 + O2: No errors, but obviously nothing is sent?
>>>>>>   (Am I right in assuming that V1+O2 and V2 should be equivalent?)
>>>>>> * ONESIDED + V1 + O1:
>>>>>>   [m02:03115] *** An error occurred in MPI_Put
>>>>>>   [m02:03115] *** on win
>>>>>>   [m02:03115] *** MPI_ERR_TYPE: invalid datatype
>>>>>>   [m02:03115] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>>>>>
>>>>>> I didn't get a segfault as in the "real life example", but if
>>>>>> ompitest.cc is correct, it means that OpenMPI is buggy when it
>>>>>> comes to one-sided communication and (some) derived datatypes,
>>>>>> so the problem is probably not in my code.
>>>>>>
>>>>>> I'm using OpenMPI-1.2.8 with the newest gcc 4.3.2, but the same
>>>>>> behaviour can be seen with gcc-3.3.1 and intel 10.1.
>>>>>>
>>>>>> Please correct me if ompitest.cc contains errors. Otherwise I
>>>>>> would be glad to hear how I should report these problems to the
>>>>>> developers (if they don't read this).
>>>>>>
>>>>>> Thanks + best regards
>>>>>>
>>>>>> Dorian
>>>>>>
>>>>> <ompitest.tar.gz>
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> devel_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
Cisco Systems