On Mon, 2011-08-29 at 14:22 -0500, Rob Latham wrote:
> On Mon, Aug 22, 2011 at 08:38:52AM -0700, Tom Rosmond wrote:
> > Yes, we are using collective I/O (mpi_file_write_at_all,
> > mpi_file_read_at_all). Swapping between fortran and mpi-io is just
> > branches in the code at strategic locations. Although the mpi-io files
> > are readable with fortran direct access, we don't do so from within the
> > application because of different data organization in the files.
> > > Do you use MPI datatypes to describe either a file view or the
> > > application data? Noncontiguous-in-memory and/or noncontiguous-in-file
> > > access patterns will also trigger fcntl lock calls. You can
> > > use an MPI-IO hint to disable data sieving, at a potentially
> > > disastrous performance cost.
> > Yes, we use an 'mpi_type_indexed' datatype to describe the data
> > organization.
> > Any thoughts about the XFS vs EXT3 question?
> We have machines at the lab with XFS and machines with EXT3: I can't
> say I have ever seen an MPI-IO problem we could trace to the specific
> file system. The MPI-IO library just makes a bunch of POSIX I/O
> calls under the hood: if write(2), open(2), and friends are broken for
> XFS or EXT3, those kinds of bugs get a lot of attention :>
> At this point the usual course of action is "post a small reproducing
> test case". Your first message said this was a big code, so perhaps
> that will not be so easy...
True, and because it is an intermittent problem it would probably be
extremely difficult or impossible to reproduce in another code or in
another hardware/software environment. Because we have an acceptable
workaround, it just isn't worth the effort. Thanks for the help.