Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Fwd: [OMPI users] Onesided + derived datatypes
From: Brian Barrett (brbarret_at_[hidden])
Date: 2008-12-13 15:11:08


Sorry, I really won't have time to look until after Christmas. I'll
put it on the to-do list, but that's as soon as it has a prayer of
reaching the top.

Brian

On Dec 13, 2008, at 1:02 PM, George Bosilca wrote:

> Brian,
>
> I found a second problem with rebuilding the datatype on the remote.
> Originally, the displacements were wrongly computed. This is now
> fixed. However, the data at the end of the fence is still not
> correct on the remote.
>
> I can confirm that the packed message contains only zeros instead of
> the real values, but I couldn't figure out how those zeros got there.
> The pack function works correctly for MPI_Send, and I don't see any
> reason why it shouldn't do the same for MPI_Put. As you're the
> one-sided guy in ompi, can you take a look at MPI_Put to see why the
> data is incorrect?
>
> george.
>
> On Dec 11, 2008, at 19:14 , Brian Barrett wrote:
>
>> I think that's a reasonable solution. However, the words "not it"
>> come to mind. Sorry, but I have way too much on my plate this
>> month. By the way, in case no one noticed, I had e-mailed my
>> findings to devel. Someone might want to reply to Dorian's e-mail
>> on users.
>>
>>
>> Brian
>>
>> On Dec 11, 2008, at 2:31 PM, George Bosilca wrote:
>>
>>> Brian,
>>>
>>> You're right, the datatype is being too cautious with the
>>> boundaries when detecting the overlap. There is no good solution
>>> to detect the overlap except parsing the whole memory layout to
>>> check the status of every predefined type. As one can imagine, this
>>> is a very expensive operation. This is the reason I preferred to use
>>> the true extent and the size of the data to try to detect the
>>> overlap. This approach is a lot faster, but has poor accuracy.
>>>
>>> The best solution I can think of in the short term is to remove the
>>> overlap check completely. This will have absolutely no impact on the
>>> way we pack the data, but can lead to unexpected results when we
>>> unpack and the data overlaps. But I guess this can be considered a
>>> user error, as the MPI standard clearly states that the result of
>>> such an operation is ... unexpected.
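
The trade-off George describes can be made concrete with a small model. The sketch below is an illustration only, not Open MPI's is_overlapped() code, which the thread does not show. It assumes a hypothetical BlockLayout type (element displacements plus a common block length, in the spirit of MPI_Type_create_indexed_block) and compares a cheap size-versus-true-extent test with an exact sorted scan. All type and function names are invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of an indexed-block layout: every block holds
// `blocklen` elements and starts at the given element displacement.
struct BlockLayout {
    std::vector<long> displs;
    long blocklen;
};

// Number of elements the layout describes (its "size" in element units).
long layout_size(const BlockLayout& t) {
    return static_cast<long>(t.displs.size()) * t.blocklen;
}

// Span from the lowest to the highest element touched (its "true extent").
long layout_true_extent(const BlockLayout& t) {
    if (t.displs.empty()) return 0;
    auto mm = std::minmax_element(t.displs.begin(), t.displs.end());
    return *mm.second + t.blocklen - *mm.first;
}

// Cheap test: one linear scan, no sorting. If the layout describes more
// elements than its span can hold, some element must be covered twice.
// A "false" result does NOT prove the layout is free of overlap.
bool overlap_cheap(const BlockLayout& t) {
    return layout_size(t) > layout_true_extent(t);
}

// Exact test: sort the block starts and compare each block with its
// predecessor. Costs O(n log n) and a copy of the displacements.
bool overlap_exact(const BlockLayout& t) {
    assert(t.blocklen > 0);
    std::vector<long> d = t.displs;
    std::sort(d.begin(), d.end());
    for (std::size_t i = 1; i < d.size(); ++i) {
        if (d[i] < d[i - 1] + t.blocklen) return true;
    }
    return false;
}
```

For displacements 3, 2, 1, 0 with block length 1, the exact scan reports no overlap, which agrees with Brian's reading of the standard. The cheap bound in this sketch is reliable in one direction only: it misses a layout such as {0, 0, 5}, whereas the check discussed in the thread errs the other way and rejects valid types.
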
>>>
>>> george.
>>>
>>> On Dec 10, 2008, at 22:20 , Brian Barrett wrote:
>>>
>>>> Hi all -
>>>>
>>>> I looked into this, and it appears to be datatype related. If
>>>> the displacements are set to 3, 2, 1, 0, then the datatype will
>>>> fail the type checks for one-sided because is_overlapped()
>>>> returns 1 for the datatype. My reading of the standard seems to
>>>> indicate this should not be. I haven't looked into the problems
>>>> with displacement set to 0, 1, 2, 3, but I'm guessing it has
>>>> something to do with the reverse problem.
>>>>
>>>> This looks like a datatype issue, so it's out of my realm of
>>>> expertise. Can someone else take a look?
>>>>
>>>> Brian
>>>>
>>>> Begin forwarded message:
>>>>
>>>>> From: doriankrause <doriankrause_at_[hidden]>
>>>>> Date: December 10, 2008 4:07:55 PM MST
>>>>> To: users_at_[hidden]
>>>>> Subject: [OMPI users] Onesided + derived datatypes
>>>>> Reply-To: Open MPI Users <users_at_[hidden]>
>>>>>
>>>>> Hi List,
>>>>>
>>>>> I have an MPI program which uses one-sided communication with
>>>>> derived datatypes (MPI_Type_create_indexed_block). I developed the
>>>>> code with MPICH2 and unfortunately didn't think about trying it out
>>>>> with OpenMPI. Now that I'm "porting" the application to OpenMPI I'm
>>>>> facing some problems. On most machines I get a SIGSEGV in
>>>>> MPI_Win_fence; sometimes an invalid datatype shows up. I ran the
>>>>> program in Valgrind and didn't get anything valuable. Since I can't
>>>>> see a reason for this problem (at least if I understand the standard
>>>>> correctly), I wrote the attached test program.
>>>>>
>>>>> Here are my experiences:
>>>>>
>>>>> * If I compile without ONESIDED defined, everything works and V1
>>>>>   and V2 give the same results.
>>>>> * If I compile with ONESIDED and V2 defined (MPI_Type_contiguous),
>>>>>   it works.
>>>>> * ONESIDED + V1 + O2: No errors, but obviously nothing is sent?
>>>>>   (Am I right in assuming that V1+O2 and V2 should be equivalent?)
>>>>> * ONESIDED + V1 + O1:
>>>>>   [m02:03115] *** An error occurred in MPI_Put
>>>>>   [m02:03115] *** on win
>>>>>   [m02:03115] *** MPI_ERR_TYPE: invalid datatype
>>>>>   [m02:03115] *** MPI_ERRORS_ARE_FATAL (goodbye)
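
Dorian's question about whether the indexed-block and contiguous variants should be equivalent can be checked against a toy model of how a put moves data: gather elements from the origin buffer in type-map order, then scatter them into the target in the same order. The sketch below is not taken from ompitest.cc (the attachment is not reproduced here) and does not call MPI. It assumes block length 1, the same datatype on both sides, and the two displacement orders mentioned in Brian's reply (0, 1, 2, 3 and 3, 2, 1, 0). All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gather ("pack"): element i of the packed stream is src[displs[i]].
std::vector<int> pack_indexed(const std::vector<int>& src,
                              const std::vector<std::size_t>& displs) {
    std::vector<int> packed;
    packed.reserve(displs.size());
    for (std::size_t d : displs) {
        assert(d < src.size());
        packed.push_back(src[d]);
    }
    return packed;
}

// Scatter ("unpack"): element i of the packed stream lands in dst[displs[i]].
void unpack_indexed(const std::vector<int>& packed,
                    const std::vector<std::size_t>& displs,
                    std::vector<int>& dst) {
    assert(packed.size() == displs.size());
    for (std::size_t i = 0; i < displs.size(); ++i) {
        assert(displs[i] < dst.size());
        dst[displs[i]] = packed[i];
    }
}

// Model of a put that uses the same layout on origin and target.
// The target starts zero-filled so untouched elements are easy to spot.
std::vector<int> model_put(const std::vector<int>& origin,
                           const std::vector<std::size_t>& displs,
                           std::size_t target_len) {
    std::vector<int> target(target_len, 0);
    unpack_indexed(pack_indexed(origin, displs), displs, target);
    return target;
}
```

Under these assumptions both displacement orders leave the target equal to the origin, which is the result a contiguous type of four elements would give, and the packed stream is never all zeros for non-zero input.
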
>>>>>
>>>>> I didn't get a segfault as in the "real life example", but if
>>>>> ompitest.cc is correct it means that OpenMPI is buggy when it comes
>>>>> to one-sided communication and (some) derived datatypes, so the
>>>>> problem is probably not in my code.
>>>>>
>>>>> I'm using OpenMPI-1.2.8 with the newest gcc 4.3.2, but the same
>>>>> behaviour can be seen with gcc-3.3.1 and intel 10.1.
>>>>>
>>>>> Please correct me if ompitest.cc contains errors. Otherwise I would
>>>>> be glad to hear how I should report these problems to the developers
>>>>> (if they don't read this).
>>>>>
>>>>> Thanks + best regards
>>>>>
>>>>> Dorian
>>>>>
>>>>>
>>>>>
>>>>>
>>>> <ompitest.tar.gz>
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel