
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Onesided + derived datatypes
From: George Bosilca (bosilca_at_[hidden])
Date: 2008-12-16 04:25:27


This issue should be fixed as of revision r20134. The patch is
waiting to be tested on SPARC64 before being pushed into the next 1.2
and 1.3 releases.

   Thanks for the test application,
     george.

On Dec 13, 2008, at 11:58 , George Bosilca wrote:

> No. It fixes an issue with correctly rebuilding the datatype (i.e.,
> with the real displacements) on the remote side, but it doesn't fix
> the wrong-values problem.
>
> george.
>
> On Dec 13, 2008, at 07:59 , Jeff Squyres wrote:
>
>> George -- you had a commit after this (r20123) -- did that fix the
>> problem?
>>
>>
>> On Dec 12, 2008, at 8:14 PM, George Bosilca wrote:
>>
>>> Dorian,
>>>
>>> I looked into this again. So far I can confirm that the datatype
>>> is correctly created and always contains the correct values
>>> (internally). If you use send/recv instead of one-sided, the
>>> output is exactly what you expect. With one-sided there are
>>> several strange things. What I can say so far is that everything
>>> works fine, except when the block-indexed datatype is used as the
>>> remote datatype in the MPI_Put operation. In that case the remote
>>> memory is not modified.
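
A minimal sketch of the two call shapes being compared here (not taken
from the attached test; double3 is assumed to be a committed
MPI_Type_contiguous(3, MPI_DOUBLE, ...) type, blocktype an indexed-block
type covering four such elements, and the function names are
illustrative only):

    #include <mpi.h>

    /* Block-indexed type used as the REMOTE (target) datatype of MPI_Put:
       the case reported above to leave the target window unmodified. */
    void put_block_as_target(double (*src)[3], MPI_Datatype double3,
                             MPI_Datatype blocktype, int target, MPI_Win win)
    {
        MPI_Put(src, 4, double3, target, 0, 1, blocktype, win);
    }

    /* Same data, but the block-indexed type describes only the ORIGIN
       buffer and the target side is a plain count of double3 elements:
       this variant is reported to work. */
    void put_block_as_origin(double (*src)[3], MPI_Datatype double3,
                             MPI_Datatype blocktype, int target, MPI_Win win)
    {
        MPI_Put(src, 1, blocktype, target, 0, 4, double3, win);
    }
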
>>>
>>> george.
>>>
>>>
>>> On Dec 12, 2008, at 08:20 , Dorian Krause wrote:
>>>
>>>> Hi again.
>>>>
>>>> I adapted my test program by completely overwriting the window
>>>> buffer with 1 beforehand. This lets me see at which places OpenMPI
>>>> writes (a sketch of this sentinel-fill idea follows the output
>>>> below).
>>>> The result is:
>>>>
>>>> *** -DO1=1 -DV1=1 *** (displ 3,2,1,0 ,
>>>> MPI_Type_create_indexed_block)
>>>> mem[0] = { 0.0000, 0.0000, 0.0000}
>>>> mem[1] = { 0.0000, 0.0000, 0.0000}
>>>> mem[2] = { 0.0000, 0.0000, 0.0000}
>>>> mem[3] = { nan, nan, nan}
>>>> mem[4] = { nan, nan, nan}
>>>> mem[5] = { nan, nan, nan}
>>>> mem[6] = { nan, nan, nan}
>>>> mem[7] = { nan, nan, nan}
>>>> mem[8] = { nan, nan, nan}
>>>> mem[9] = { nan, nan, nan}
>>>> *** -DO1=1 -DV2=1 *** MPI_Type_contiguous(4, mpi_double3, &mpit)
>>>> mem[0] = { 0.0000, 1.0000, 2.0000}
>>>> mem[1] = { 3.0000, 4.0000, 5.0000}
>>>> mem[2] = { 6.0000, 7.0000, 8.0000}
>>>> mem[3] = { 9.0000, 10.0000, 11.0000}
>>>> mem[4] = { nan, nan, nan}
>>>> mem[5] = { nan, nan, nan}
>>>> mem[6] = { nan, nan, nan}
>>>> mem[7] = { nan, nan, nan}
>>>> mem[8] = { nan, nan, nan}
>>>> mem[9] = { nan, nan, nan}
>>>> *** -DO2=1 -DV1=1 *** (displ 0,1,2,3 ,
>>>> MPI_Type_create_indexed_block)
>>>> mem[0] = { 0.0000, 0.0000, 0.0000}
>>>> mem[1] = { 0.0000, 0.0000, 0.0000}
>>>> mem[2] = { 0.0000, 0.0000, 0.0000}
>>>> mem[3] = { 0.0000, 0.0000, 0.0000}
>>>> mem[4] = { nan, nan, nan}
>>>> mem[5] = { nan, nan, nan}
>>>> mem[6] = { nan, nan, nan}
>>>> mem[7] = { nan, nan, nan}
>>>> mem[8] = { nan, nan, nan}
>>>> mem[9] = { nan, nan, nan}
>>>> *** -DO2=1 -DV2=1 *** MPI_Type_contiguous(4, mpi_double3, &mpit)
>>>> mem[0] = { 0.0000, 1.0000, 2.0000}
>>>> mem[1] = { 3.0000, 4.0000, 5.0000}
>>>> mem[2] = { 6.0000, 7.0000, 8.0000}
>>>> mem[3] = { 9.0000, 10.0000, 11.0000}
>>>> mem[4] = { nan, nan, nan}
>>>> mem[5] = { nan, nan, nan}
>>>> mem[6] = { nan, nan, nan}
>>>> mem[7] = { nan, nan, nan}
>>>> mem[8] = { nan, nan, nan}
>>>> mem[9] = { nan, nan, nan}
>>>>
>>>> Note that for the reversed ordering (3,2,1,0) only 3 lines are
>>>> written. If I use displacements 3,2,1,8
>>>> I get
>>>>
>>>> *** -DO1=1 -DV1=1 ***
>>>> mem[0] = { 0.0000, 0.0000, 0.0000}
>>>> mem[1] = { 0.0000, 0.0000, 0.0000}
>>>> mem[2] = { 0.0000, 0.0000, 0.0000}
>>>> mem[3] = { nan, nan, nan}
>>>> mem[4] = { nan, nan, nan}
>>>> mem[5] = { nan, nan, nan}
>>>> mem[6] = { nan, nan, nan}
>>>> mem[7] = { nan, nan, nan}
>>>> mem[8] = { 0.0000, 0.0000, 0.0000}
>>>> mem[9] = { nan, nan, nan}
>>>>
>>>> but 3,2,8,1 yields
>>>>
>>>> *** -DO1=1 -DV1=1 ***
>>>> mem[0] = { 0.0000, 0.0000, 0.0000}
>>>> mem[1] = { 0.0000, 0.0000, 0.0000}
>>>> mem[2] = { 0.0000, 0.0000, 0.0000}
>>>> mem[3] = { nan, nan, nan}
>>>> mem[4] = { nan, nan, nan}
>>>> mem[5] = { nan, nan, nan}
>>>> mem[6] = { nan, nan, nan}
>>>> mem[7] = { nan, nan, nan}
>>>> mem[8] = { nan, nan, nan}
>>>> mem[9] = { nan, nan, nan}
>>>>
>>>> Dorian
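
One way to implement the sentinel fill described at the top of this
message, as a sketch only (the exact fill used in the test is not
shown; writing 0xFF into every byte is one choice, and it makes
untouched doubles print as nan):

    #include <string.h>

    /* Overwrite the whole window buffer (nelem double3 elements) with 0xFF
       bytes before the access epoch; any element that still prints as nan
       after the closing MPI_Win_fence was never written by MPI_Put. */
    void fill_window_with_sentinel(double (*win_buf)[3], int nelem)
    {
        memset(win_buf, 0xFF, (size_t)nelem * 3 * sizeof(double));
    }
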
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: "Dorian Krause" <doriankrause_at_[hidden]>
>>>>> Sent: 12.12.08 13:49:25
>>>>> To: Open MPI Users <users_at_[hidden]>
>>>>> Subject: Re: [OMPI users] Onesided + derived datatypes
>>>>
>>>>
>>>>> Thanks George (and Brian :)).
>>>>>
>>>>> The MPI_Put error is gone. Did you take a look at the problem
>>>>> that the PUT doesn't work with the block_indexed type? I'm still
>>>>> getting the following output (V1 corresponds to the datatype
>>>>> created with MPI_Type_create_indexed_block, while the V2 type is
>>>>> created with MPI_Type_contiguous; the ordering doesn't matter
>>>>> anymore after your fix), which confuses me because I remember
>>>>> that (on one machine) MPI_Put with MPI_Type_create_indexed_block
>>>>> worked until the invalid datatype error showed up (after a
>>>>> couple of timesteps).
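
For reference, a sketch of how the V1 and V2 variants are presumably
built (mpi_double3 is assumed to be a committed
MPI_Type_contiguous(3, MPI_DOUBLE, ...) type; with displacements
0,1,2,3 both types describe the same layout of four consecutive double3
elements, which is why V1+O2 and V2 should behave identically):

    #include <mpi.h>

    /* V1: four blocks of one double3 each, placed at the given displacements
       (counted in units of double3).  V2: the equivalent contiguous type. */
    void build_v1_v2(MPI_Datatype mpi_double3, MPI_Datatype *v1, MPI_Datatype *v2)
    {
        int displs[4] = {0, 1, 2, 3};
        MPI_Type_create_indexed_block(4, 1, displs, mpi_double3, v1);  /* V1 */
        MPI_Type_contiguous(4, mpi_double3, v2);                       /* V2 */
        MPI_Type_commit(v1);
        MPI_Type_commit(v2);
    }
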
>>>>>
>>>>> *** -DO1=1 -DV1=1 ***
>>>>> mem[0] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[1] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[2] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[3] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[4] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[5] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[6] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[7] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[8] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[9] = { 0.0000, 0.0000, 0.0000}
>>>>> *** -DO1=1 -DV2=1 ***
>>>>> mem[0] = { 5.0000, 0.0000, 0.0000}
>>>>> mem[1] = { 0.0000, 0.0000, -1.0000}
>>>>> mem[2] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[3] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[4] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[5] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[6] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[7] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[8] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[9] = { 0.0000, 0.0000, 0.0000}
>>>>> *** -DO2=1 -DV1=1 ***
>>>>> mem[0] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[1] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[2] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[3] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[4] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[5] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[6] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[7] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[8] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[9] = { 0.0000, 0.0000, 0.0000}
>>>>> *** -DO2=1 -DV2=1 ***
>>>>> mem[0] = { 5.0000, 0.0000, 0.0000}
>>>>> mem[1] = { 0.0000, 0.0000, -1.0000}
>>>>> mem[2] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[3] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[4] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[5] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[6] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[7] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[8] = { 0.0000, 0.0000, 0.0000}
>>>>> mem[9] = { 0.0000, 0.0000, 0.0000}
>>>>>
>>>>>
>>>>> Thanks for your help.
>>>>>
>>>>> Dorian
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: "George Bosilca" <bosilca_at_[hidden]>
>>>>>> Sent: 12.12.08 01:35:57
>>>>>> To: Open MPI Users <users_at_[hidden]>
>>>>>> Subject: Re: [OMPI users] Onesided + derived datatypes
>>>>>
>>>>>
>>>>>> Dorian,
>>>>>>
>>>>>> You are right, the datatype generated using the block_index
>>>>>> function is a legal datatype. We wrongly detected some
>>>>>> overlapping regions in its description [overlapping regions are
>>>>>> illegal according to the MPI standard]. Since detecting such
>>>>>> overlapping regions without any false positives (such as your
>>>>>> datatype) is a very expensive process, I prefer to remove the
>>>>>> check completely.
>>>>>>
>>>>>> To keep it short: I just committed a patch (r20120) to the
>>>>>> trunk, and I'll take care of moving it into the 1.3 and 1.2.9
>>>>>> releases.
>>>>>>
>>>>>> Thanks for your help,
>>>>>> george.
>>>>>>
>>>>>> On Dec 10, 2008, at 18:07 , doriankrause wrote:
>>>>>>
>>>>>>> Hi List,
>>>>>>>
>>>>>>> I have an MPI program which uses one-sided communication with
>>>>>>> derived datatypes (MPI_Type_create_indexed_block). I developed
>>>>>>> the code with MPICH2 and unfortunately didn't think to try it
>>>>>>> out with OpenMPI. Now that I'm "porting" the application to
>>>>>>> OpenMPI I'm facing some problems. On most machines I get a
>>>>>>> SIGSEGV in MPI_Win_fence; sometimes an invalid datatype error
>>>>>>> shows up. I ran the program under Valgrind and didn't get
>>>>>>> anything valuable. Since I can't see a reason for this problem
>>>>>>> (at least if I understand the standard correctly), I wrote the
>>>>>>> attached test program.
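
The attachment itself is not reproduced in the archive; the following
is a rough, self-contained sketch of a test with the shape described
above. The double3 element, the 10-element window, and the V1/V2 and
O1/O2 compile switches follow the description in this thread, but the
real ompitest.cc may differ in its details (in particular, the
non-ONESIDED send/recv path is omitted here).

    #include <mpi.h>
    #include <stdio.h>

    #define N 10   /* window elements (double3 = 3 doubles) per rank */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        MPI_Datatype double3, mpit;
        MPI_Type_contiguous(3, MPI_DOUBLE, &double3);
        MPI_Type_commit(&double3);

    #if defined(V1)                           /* MPI_Type_create_indexed_block */
    #  if defined(O1)
        int displs[4] = {3, 2, 1, 0};         /* reversed ordering */
    #  else                                   /* O2 */
        int displs[4] = {0, 1, 2, 3};
    #  endif
        MPI_Type_create_indexed_block(4, 1, displs, double3, &mpit);
    #else                                     /* V2: MPI_Type_contiguous */
        MPI_Type_contiguous(4, double3, &mpit);
    #endif
        MPI_Type_commit(&mpit);

        double mem[N][3] = {{0.0}};           /* exposed window memory */
        MPI_Win win;
        MPI_Win_create(mem, sizeof(mem), 3 * sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        double src[4][3];                     /* origin data: 0, 1, ..., 11 */
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 3; ++j)
                src[i][j] = 3.0 * i + j;

        MPI_Win_fence(0, win);
        if (rank == 0 && size > 1)            /* derived type on the target side */
            MPI_Put(src, 4, double3, 1, 0, 1, mpit, win);
        MPI_Win_fence(0, win);

        if (rank == 1)                        /* same shape as the mem[i] output */
            for (int i = 0; i < N; ++i)
                printf("mem[%d] = { %.4f, %.4f, %.4f}\n",
                       i, mem[i][0], mem[i][1], mem[i][2]);

        MPI_Win_free(&win);
        MPI_Type_free(&mpit);
        MPI_Type_free(&double3);
        MPI_Finalize();
        return 0;
    }

Built with, for example, mpicc -std=c99 -DV1 -DO1 and run on two ranks,
this prints the ten mem[i] lines from rank 1 in the same form as the
outputs quoted elsewhere in the thread.
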
>>>>>>>
>>>>>>> Here are my experiences:
>>>>>>>
>>>>>>> * If I compile without ONESIDED defined, everything works and
>>>>>>> V1 and V2 give the same results.
>>>>>>> * If I compile with ONESIDED and V2 defined
>>>>>>> (MPI_Type_contiguous), it works.
>>>>>>> * ONESIDED + V1 + O2: No errors, but obviously nothing is sent?
>>>>>>> (Am I right in assuming that V1+O2 and V2 should be
>>>>>>> equivalent?)
>>>>>>> * ONESIDED + V1 + O1:
>>>>>>> [m02:03115] *** An error occurred in MPI_Put
>>>>>>> [m02:03115] *** on win
>>>>>>> [m02:03115] *** MPI_ERR_TYPE: invalid datatype
>>>>>>> [m02:03115] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>>>>>>
>>>>>>> I didn't get a segfault as in the "real life" example, but if
>>>>>>> ompitest.cc is correct it means that OpenMPI is buggy when it
>>>>>>> comes to one-sided communication and (some) derived datatypes,
>>>>>>> so the problem is probably not in my code.
>>>>>>>
>>>>>>> I'm using OpenMPI-1.2.8 with the newest gcc, 4.3.2, but the
>>>>>>> same behaviour can be seen with gcc-3.3.1 and Intel 10.1.
>>>>>>>
>>>>>>> Please correct me if ompitest.cc contains errors. Otherwise I
>>>>>>> would be glad to hear how I should report these problems to the
>>>>>>> developers (if they don't read this).
>>>>>>>
>>>>>>> Thanks + best regards
>>>>>>>
>>>>>>> Dorian
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> <ompitest.tar.gz>
>>>>>>
>>>
>>
>> --
>> Jeff Squyres
>> Cisco Systems