Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] ROMIO bug reading darrays
From: George Bosilca (bosilca_at_[hidden])
Date: 2014-05-08 18:15:44


I read the MPICH trac ticket you pointed to, and your analysis seems pertinent. My patch for “count = 0” has a similar effect to what you describe: it removed all references to the datatype when the count was zero, without looking for the special markers.

Let me try to come up with a fix.

 Thanks,
   George.

On May 8, 2014, at 17:08, Rob Latham <robl_at_[hidden]> wrote:

>
>
> On 05/07/2014 11:36 AM, Rob Latham wrote:
>>
>>
>> On 05/05/2014 09:20 PM, Richard Shaw wrote:
>>> Hello,
>>>
>>> I think I've come across a bug when using ROMIO to read in a 2D
>>> distributed array. I've attached a test case to this email.
>>
>> Thanks for the bug report and the test case.
>>
>> I've opened an MPICH bug (because this is ROMIO's fault, not OpenMPI's
>> fault... until I can prove otherwise! :>)
>
> This bug appears to be OpenMPI's fault now.
>
> I'm looking at OpenMPI's "pulled it from git an hour ago" version, and ROMIO's flattening code overruns arrays: the OpenMPI datatype processing routines return too few blocks for ranks 1 and 3.
>
> Michael Raymond told me off-list: "I tracked this down to MPT not marking HVECTORs / STRUCTs with 0-sized counts as contiguous. Once I changed this, the memory corruption and the data mismatches both went away." Could something similar be happening in OpenMPI?
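[For illustration, here is a minimal sketch of the kind of zero-count HVECTOR/STRUCT piece being discussed. It is my own construction, not code from MPT, Open MPI, or ROMIO: a zero-count hvector describes no data at all, so a flattening pass that fails to treat it as contributing zero blocks can miscount the blocks of the type that contains it.]

/* Illustrative only: a derived type containing a zero-count piece.
 * Names and layout here are arbitrary, not the darray internals. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* An hvector with count = 0 describes no data at all... */
    MPI_Datatype empty_hvec;
    MPI_Type_create_hvector(0, 1, (MPI_Aint)(8 * sizeof(double)),
                            MPI_DOUBLE, &empty_hvec);

    /* ...but once it is wrapped in a struct next to a real block, the
     * flattening code still has to walk it.  If the zero-count piece is
     * not recognized as contributing zero blocks, the block count
     * reported for the outer type can come out wrong. */
    int          blocklens[2] = {1, 4};
    MPI_Aint     displs[2]    = {0, 0};
    MPI_Datatype types[2]     = {empty_hvec, MPI_DOUBLE};
    MPI_Datatype combined;
    MPI_Type_create_struct(2, blocklens, displs, types, &combined);
    MPI_Type_commit(&combined);

    int sz;
    MPI_Type_size(combined, &sz);
    printf("combined type size = %d bytes (only the 4 doubles)\n", sz);

    MPI_Type_free(&combined);
    MPI_Type_free(&empty_hvec);
    MPI_Finalize();
    return 0;
}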
>
> ==rob
>
>>
>> http://trac.mpich.org/projects/mpich/ticket/2089
>>
>> ==rob
>>
>>>
>>> In the testcase I first initialise an array of 25 doubles (which will be
>>> a 5x5 grid), then I create a datatype representing a 5x5 matrix
>>> distributed in 3x3 blocks over a 2x2 process grid. As a reference I use
>>> MPI_Pack to pull out the block cyclic array elements local to each
>>> process (which I think is correct). Then I write the original array of
>>> 25 doubles to disk, and use MPI-IO to read it back in (performing the
>>> Open, Set_view, and Read_all), and compare to the reference.
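[The attached test case is not reproduced in the archive. The following is a rough reconstruction from the description above, for readers who want to try it. The file name, the use of MPI_DISTRIBUTE_CYCLIC with a distribution argument of 3, and the rank-to-block mapping are my assumptions, so which rank owns which block may differ from the original attachment, although the per-rank element counts (9/6/6/4) come out the same.]

/* Sketch of the described test: pack a block-cyclic darray as a
 * reference, then read the same data back through MPI-IO and compare.
 * Run with 4 processes, e.g.: mpirun -np 4 ./darr_sketch */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* expected to be 4 */

    /* Global 5x5 array of doubles, filled 0.0 .. 24.0. */
    double global[25];
    for (int i = 0; i < 25; i++) global[i] = (double)i;

    /* 5x5 array distributed in 3x3 blocks over a 2x2 process grid. */
    int gsizes[2]   = {5, 5};
    int distribs[2] = {MPI_DISTRIBUTE_CYCLIC, MPI_DISTRIBUTE_CYCLIC};
    int dargs[2]    = {3, 3};
    int psizes[2]   = {2, 2};
    MPI_Datatype darray;
    MPI_Type_create_darray(size, rank, 2, gsizes, distribs, dargs, psizes,
                           MPI_ORDER_C, MPI_DOUBLE, &darray);
    MPI_Type_commit(&darray);

    int locsize;                      /* local doubles on this rank */
    MPI_Type_size(darray, &locsize);
    locsize /= (int)sizeof(double);

    /* Reference: MPI_Pack the local elements out of the global array. */
    double *packed = malloc(locsize * sizeof(double));
    int pos = 0;
    MPI_Pack(global, 1, darray, packed, locsize * (int)sizeof(double),
             &pos, MPI_COMM_WORLD);

    /* Rank 0 writes the raw 25 doubles to disk. */
    char fname[] = "darr_test.dat";   /* assumed file name */
    if (rank == 0) {
        FILE *f = fopen(fname, "wb");
        fwrite(global, sizeof(double), 25, f);
        fclose(f);
    }
    MPI_Barrier(MPI_COMM_WORLD);

    /* Read it back through MPI-IO with the darray as the file view. */
    double *readbuf = malloc(locsize * sizeof(double));
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, fname, MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_DOUBLE, darray, "native", MPI_INFO_NULL);
    MPI_File_read_all(fh, readbuf, locsize, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    /* Compare the MPI-IO result against the packed reference. */
    int ok = (memcmp(packed, readbuf, locsize * sizeof(double)) == 0);
    printf("rank %d: %d elements, %s\n", rank, locsize,
           ok ? "match" : "MISMATCH");

    free(packed);
    free(readbuf);
    MPI_Type_free(&darray);
    MPI_Finalize();
    return 0;
}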
>>>
>>> Running this with OMPI, the two match on all ranks.
>>>
>>> > mpirun -mca io ompio -np 4 ./darr_read.x
>>> === Rank 0 === (9 elements)
>>> Packed: 0.0 1.0 2.0 5.0 6.0 7.0 10.0 11.0 12.0
>>> Read: 0.0 1.0 2.0 5.0 6.0 7.0 10.0 11.0 12.0
>>>
>>> === Rank 1 === (6 elements)
>>> Packed: 15.0 16.0 17.0 20.0 21.0 22.0
>>> Read: 15.0 16.0 17.0 20.0 21.0 22.0
>>>
>>> === Rank 2 === (6 elements)
>>> Packed: 3.0 4.0 8.0 9.0 13.0 14.0
>>> Read: 3.0 4.0 8.0 9.0 13.0 14.0
>>>
>>> === Rank 3 === (4 elements)
>>> Packed: 18.0 19.0 23.0 24.0
>>> Read: 18.0 19.0 23.0 24.0
>>>
>>>
>>>
>>> However, using ROMIO the two differ on two of the ranks:
>>>
>>> > mpirun -mca io romio -np 4 ./darr_read.x
>>> === Rank 0 === (9 elements)
>>> Packed: 0.0 1.0 2.0 5.0 6.0 7.0 10.0 11.0 12.0
>>> Read: 0.0 1.0 2.0 5.0 6.0 7.0 10.0 11.0 12.0
>>>
>>> === Rank 1 === (6 elements)
>>> Packed: 15.0 16.0 17.0 20.0 21.0 22.0
>>> Read: 0.0 1.0 2.0 0.0 1.0 2.0
>>>
>>> === Rank 2 === (6 elements)
>>> Packed: 3.0 4.0 8.0 9.0 13.0 14.0
>>> Read: 3.0 4.0 8.0 9.0 13.0 14.0
>>>
>>> === Rank 3 === (4 elements)
>>> Packed: 18.0 19.0 23.0 24.0
>>> Read: 0.0 1.0 0.0 1.0
>>>
>>>
>>>
>>> My interpretation is that the behaviour with OMPIO is correct.
>>> Interestingly, everything matches up using both ROMIO and OMPIO if I set
>>> the block shape to 2x2.
>>>
>>> This was run on OS X using 1.8.2a1r31632. I have also run this on Linux
>>> with OpenMPI 1.7.4, and OMPIO is still correct, but using ROMIO I just
>>> get segfaults.
>>>
>>> Thanks,
>>> Richard
>>>
>>>
>>
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users