Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] ROMIO bug reading darrays
From: Rob Latham (robl_at_[hidden])
Date: 2014-05-07 17:48:45


On 05/07/2014 03:10 PM, Richard Shaw wrote:
> Thanks Rob. I'll keep track of it over there. How often do updated
> versions of ROMIO get pulled over from MPICH into OpenMPI?
>
> On a slightly related note, I think I heard that you had fixed the
> 32-bit issues in ROMIO that were causing it to break when reading
> more than 2 GB (i.e.
> http://www.open-mpi.org/community/lists/users/2012/07/19762.php).
> Have those been pulled into OpenMPI? I've been staying clear of
> ROMIO for a while (in favour of OMPIO), to avoid those issues.

Looks like I fixed that late last year. A slew of ">31 bit transfers"
fixes went into the MPICH-3.1 release. Slurping those changes, which
are individually small (using some _x versions of type-inquiry routines
here, some MPI_Count promotions there) but pervasive, might give OpenMPI
a bit of a headache.
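
As a rough illustration of the flavor of those changes (a sketch, not
the actual ROMIO diff): the classic type-inquiry routines return an
int, which cannot represent a type spanning 2 GiB or more, while the
MPI-3 _x variants return an MPI_Count.

    /* sketch: int vs MPI_Count type-size inquiry */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* a contiguous type of 2^30 doubles = 8 GiB */
        MPI_Datatype big;
        MPI_Type_contiguous(1 << 30, MPI_DOUBLE, &big);
        MPI_Type_commit(&big);

        int size32;
        MPI_Type_size(big, &size32);   /* 8 GiB does not fit in an int:
                                          MPI-3 sets this to
                                          MPI_UNDEFINED; older code
                                          paths could simply overflow */

        MPI_Count size64;
        MPI_Type_size_x(big, &size64); /* correct: 8589934592 bytes */

        printf("MPI_Type_size: %d, MPI_Type_size_x: %lld\n",
               size32, (long long)size64);

        MPI_Type_free(&big);
        MPI_Finalize();
        return 0;
    }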

==rob

>
> Thanks,
> Richard
>
>
> On 7 May 2014 12:36, Rob Latham <robl_at_[hidden]> wrote:
>
>
>
> On 05/05/2014 09:20 PM, Richard Shaw wrote:
>
> Hello,
>
> I think I've come across a bug when using ROMIO to read in a 2D
> distributed array. I've attached a test case to this email.
>
>
> Thanks for the bug report and the test case.
>
> I've opened an MPICH bug (because this is ROMIO's fault, not
> OpenMPI's fault... until I can prove otherwise! :>)
>
> http://trac.mpich.org/projects/mpich/ticket/2089
>
> ==rob
>
>
> In the testcase I first initialise an array of 25 doubles (which
> will be a 5x5 grid), then I create a datatype representing a 5x5
> matrix distributed in 3x3 blocks over a 2x2 process grid. As a
> reference I use MPI_Pack to pull out the block-cyclic array
> elements local to each process (which I think is correct). Then I
> write the original array of 25 doubles to disk, and use MPI-IO to
> read it back in (performing the Open, Set_view, and Read_all), and
> compare to the reference.
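>
> For reference, here is a minimal sketch of the testcase just
> described (identifiers and the file name darr.dat are
> illustrative; the actual attachment may differ):
>
> /* sketch reconstructing the testcase from the description above */
> #include <mpi.h>
> #include <stdio.h>
>
> int main(int argc, char **argv)
> {
>     int rank, nprocs, i;
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &nprocs); /* run with -np 4 */
>
>     double global[25]; /* the 5x5 grid, row-major */
>     for (i = 0; i < 25; i++) global[i] = (double)i;
>
>     /* 5x5 array, 3x3 blocks, cyclic over a 2x2 process grid */
>     int gsizes[2]   = {5, 5};
>     int distribs[2] = {MPI_DISTRIBUTE_CYCLIC, MPI_DISTRIBUTE_CYCLIC};
>     int dargs[2]    = {3, 3};
>     int psizes[2]   = {2, 2};
>     MPI_Datatype darray;
>     MPI_Type_create_darray(nprocs, rank, 2, gsizes, distribs,
>                            dargs, psizes, MPI_ORDER_C, MPI_DOUBLE,
>                            &darray);
>     MPI_Type_commit(&darray);
>
>     /* reference: MPI_Pack pulls out this rank's local elements */
>     double packed[25];
>     int pos = 0;
>     MPI_Pack(global, 1, darray, packed, (int)sizeof(packed), &pos,
>              MPI_COMM_WORLD);
>     int nelem = pos / (int)sizeof(double);
>
>     /* rank 0 writes the original 25 doubles to disk */
>     if (rank == 0) {
>         FILE *f = fopen("darr.dat", "wb");
>         fwrite(global, sizeof(double), 25, f);
>         fclose(f);
>     }
>     MPI_Barrier(MPI_COMM_WORLD);
>
>     /* read back through the darray file view and compare */
>     double readbuf[25];
>     MPI_File fh;
>     MPI_File_open(MPI_COMM_WORLD, "darr.dat", MPI_MODE_RDONLY,
>                   MPI_INFO_NULL, &fh);
>     MPI_File_set_view(fh, 0, MPI_DOUBLE, darray, "native",
>                       MPI_INFO_NULL);
>     MPI_File_read_all(fh, readbuf, nelem, MPI_DOUBLE,
>                       MPI_STATUS_IGNORE);
>     MPI_File_close(&fh);
>
>     for (i = 0; i < nelem; i++)
>         if (readbuf[i] != packed[i])
>             printf("rank %d: mismatch at %d (%.1f vs %.1f)\n",
>                    rank, i, readbuf[i], packed[i]);
>
>     MPI_Type_free(&darray);
>     MPI_Finalize();
>     return 0;
> }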
>
> Running this with OMPI, the two match on all ranks.
>
> > mpirun -mca io ompio -np 4 ./darr_read.x
> === Rank 0 === (9 elements)
> Packed: 0.0 1.0 2.0 5.0 6.0 7.0 10.0 11.0 12.0
> Read: 0.0 1.0 2.0 5.0 6.0 7.0 10.0 11.0 12.0
>
> === Rank 1 === (6 elements)
> Packed: 15.0 16.0 17.0 20.0 21.0 22.0
> Read: 15.0 16.0 17.0 20.0 21.0 22.0
>
> === Rank 2 === (6 elements)
> Packed: 3.0 4.0 8.0 9.0 13.0 14.0
> Read: 3.0 4.0 8.0 9.0 13.0 14.0
>
> === Rank 3 === (4 elements)
> Packed: 18.0 19.0 23.0 24.0
> Read: 18.0 19.0 23.0 24.0
>
>
>
> However, using ROMIO the two differ on two of the ranks:
>
> > mpirun -mca io romio -np 4 ./darr_read.x
> === Rank 0 === (9 elements)
> Packed: 0.0 1.0 2.0 5.0 6.0 7.0 10.0 11.0 12.0
> Read: 0.0 1.0 2.0 5.0 6.0 7.0 10.0 11.0 12.0
>
> === Rank 1 === (6 elements)
> Packed: 15.0 16.0 17.0 20.0 21.0 22.0
> Read: 0.0 1.0 2.0 0.0 1.0 2.0
>
> === Rank 2 === (6 elements)
> Packed: 3.0 4.0 8.0 9.0 13.0 14.0
> Read: 3.0 4.0 8.0 9.0 13.0 14.0
>
> === Rank 3 === (4 elements)
> Packed: 18.0 19.0 23.0 24.0
> Read: 0.0 1.0 0.0 1.0
>
>
>
> My interpretation is that the behaviour with OMPIO is correct.
> Interestingly, everything matches up using both ROMIO and OMPIO if
> I set the block shape to 2x2.
>
> This was run on OS X using OpenMPI 1.8.2a1r31632. I have also run
> this on Linux with OpenMPI 1.7.4, and OMPIO is still correct, but
> using ROMIO I just get segfaults.
>
> Thanks,
> Richard
>
>
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA