Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Incorrect results with MPI-IO under OpenMPI v1.3.1
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-04-15 07:06:04


Can either of you provide a small example that shows the problem,
perchance?

On Apr 6, 2009, at 4:41 PM, Yvan Fournier wrote:

> Hello to all,
>
> I have also encountered a similar bug with MPI-IO
> with Open MPI 1.3.1, reading a Code_Saturne preprocessed mesh file
> (www.code-saturne.org). The file can be read using either of two
> MPI-IO modes, or a non-MPI-IO mode.
>
> The first MPI-IO mode uses individual file pointers, and involves a
> series of MPI_File_read_all calls with all ranks using the same view
> (for record headers), interleaved with MPI_File_read_all calls with
> ranks using different views (for record data, successive blocks being
> read by successive ranks).
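>
> To illustrate the pattern (this is only a rough sketch, not the actual
> Code_Saturne reader; the file name, header layout, block distribution
> and use of doubles here are made up for illustration):
>
>   #include <mpi.h>
>   #include <stdlib.h>
>
>   int main(int argc, char **argv)
>   {
>     int rank, size, header[2];
>     MPI_File fh;
>     MPI_Status status;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>
>     /* Placeholder file name. */
>     MPI_File_open(MPI_COMM_WORLD, "mesh_file", MPI_MODE_RDONLY,
>                   MPI_INFO_NULL, &fh);
>
>     /* Record header: all ranks use the same view and read the same
>      * values collectively with their individual file pointers. */
>     MPI_Offset disp = 0;
>     MPI_File_set_view(fh, disp, MPI_INT, MPI_INT, "native",
>                       MPI_INFO_NULL);
>     MPI_File_read_all(fh, header, 2, MPI_INT, &status);
>     disp += 2 * (MPI_Offset)sizeof(int);
>
>     /* Record data: successive blocks go to successive ranks, so each
>      * rank's view starts at a rank-dependent displacement. */
>     int n_total = header[0];   /* assumed: element count in header */
>     int n_local = n_total / size + (rank < n_total % size ? 1 : 0);
>     MPI_Offset my_disp = disp
>       + ((MPI_Offset)(n_total / size) * rank
>          + (rank < n_total % size ? rank : n_total % size))
>         * (MPI_Offset)sizeof(double);
>
>     double *buf = malloc((n_local > 0 ? n_local : 1) * sizeof(double));
>     MPI_File_set_view(fh, my_disp, MPI_DOUBLE, MPI_DOUBLE, "native",
>                       MPI_INFO_NULL);
>     MPI_File_read_all(fh, buf, n_local, MPI_DOUBLE, &status);
>
>     /* Advance past the record so a following header read would
>      * start at the right place. */
>     disp += (MPI_Offset)n_total * sizeof(double);
>
>     free(buf);
>     MPI_File_close(&fh);
>     MPI_Finalize();
>     return 0;
>   }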
>
> The second MPI-IO mode uses explicit file offsets, with
> MPI_File_read_at_all instead of MPI_File_read_all.
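>
> In terms of the sketch above (again only an illustration), this mode
> keeps the default file view, so offsets are in bytes, and each
> collective call passes its own explicit offset:
>
>   /* Header: every rank passes the same offset. */
>   MPI_File_read_at_all(fh, disp, header, 2, MPI_INT, &status);
>   disp += 2 * (MPI_Offset)sizeof(int);
>
>   /* Record data: each rank passes its own offset instead of
>    * changing its view. */
>   MPI_File_read_at_all(fh, my_disp, buf, n_local, MPI_DOUBLE,
>                        &status);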
>
> Both MPI-IO modes seem to work fine with OpenMPI 1.2, MPICH 2,
> and variants on IBM Blue Gene/L and P, as well as Bull Novascale,
> but with OpenMPI 1.3.1, the data read seems to be corrupt for at
> least one file when using the individual file pointer approach
> (though it is read correctly using explicit offsets).
>
> The bug does not appear in unit tests, and it only appears after
> several records are read in the case that does fail (run on 2 ranks),
> so to reproduce it with a simple program I would have to extract the
> exact file access patterns from the failing case, which would require
> a few extra hours of work.
>
> If the bug is not reproduced in a simpler manner first, I will try
> to build a simple program reproducing it within a week or two, but
> in the meantime I just want to confirm Scott's observation (hoping
> it is the same bug).
>
> Best regards,
>
> Yvan Fournier
>
> On Mon, 2009-04-06 at 16:03 -0400, users-request_at_[hidden] wrote:
>
> > Date: Mon, 06 Apr 2009 12:16:18 -0600
> > From: Scott Collis <sscollis_at_[hidden]>
> > Subject: [OMPI users] Incorrect results with MPI-IO under OpenMPI
> > v1.3.1
> > To: users_at_[hidden]
> >
> > I have been a user of MPI-IO for 4+ years and have a code that has
> > run correctly with MPICH, MPICH2, and OpenMPI 1.2.*
> >
> > I recently upgraded to OpenMPI 1.3.1 and immediately noticed that
> > my MPI-IO generated output files are corrupted. I have not yet had
> > a chance to debug this in detail, but it appears that
> > MPI_File_write_all() calls are not placing information correctly
> > according to their file view when running with more than 1 processor
> > (everything is okay with -np 1).
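> >
> > To illustrate, the general pattern in question is along these lines
> > (a hypothetical sketch, not the actual code; fh is assumed to be
> > opened for writing, and rank, n_local, local_data and status are
> > assumed to be set up elsewhere):
> >
> >   /* Each rank selects its own contiguous block of the file via a
> >    * view, then all ranks write collectively. */
> >   MPI_Offset disp = (MPI_Offset)rank * n_local * sizeof(double);
> >   MPI_File_set_view(fh, disp, MPI_DOUBLE, MPI_DOUBLE, "native",
> >                     MPI_INFO_NULL);
> >   MPI_File_write_all(fh, local_data, n_local, MPI_DOUBLE, &status);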
> >
> > Note that I have observed the same incorrect behavior on both Linux
> > and OS X. I have also gone back and made sure that the same code
> > works with MPICH, MPICH2, and OpenMPI 1.2.*, so I'm fairly confident
> > that something has been changed or broken as of OpenMPI 1.3.*. Just
> > today, I checked out the SVN repository version of OpenMPI and built
> > and tested my code with that, and the results are incorrect just as
> > for the 1.3.1 tarball.
> >
> > While I plan to continue to debug this and will try to put together
> > a small test that demonstrates the issue, I thought that I would
> > first send out this message to see if it might trigger a thought
> > within the OpenMPI development team as to where the issue might lie.
> >
> > Please let me know if you have any ideas as I would very much
> > appreciate it!
> >
> > Thanks in advance,
> >
> > Scott
> > --
> > Scott Collis
> > sscollis_at_[hidden]
> >
>
>

-- 
Jeff Squyres
Cisco Systems