On Thu, Aug 18, 2011 at 08:46:46AM -0700, Tom Rosmond wrote:
> We have a large fortran application designed to run doing IO with either
> mpi_io or fortran direct access. On a linux workstation (16 AMD cores)
> running openmpi 1.5.3 and Intel fortran 12.0 we are having trouble with
> random failures with the mpi_io option which do not occur with
> conventional fortran direct access. We are using ext3 file systems, and
> I have seen some references hinting of similar problems with the
> ext3/mpiio combination. The application with the mpi_io option runs
> flawlessly on Cray architectures with Lustre file systems, so we are
> also suspicious of the ext3/mpiio combination. Does anyone else have
> experience with this combination that could shed some light on the
> problem, and hopefully some suggested solutions?
I'm glad to hear you're having success with mpi-io on Cray/Lustre.
That platform was a bit touchy for a while, but has gotten better over
the last two years.
My first guess would be that your Linux workstation does not implement
a "strict enough" file system lock. ROMIO relies on fcntl locks to
provide exclusive access to files at some points in the code.
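For anyone unfamiliar with these locks, here is a minimal sketch of a POSIX byte-range lock around a write. It is not ROMIO's actual code; the helper names lock_range and locked_pwrite are made up for illustration. The locks are advisory and only work if the file system honors fcntl(F_SETLKW) correctly.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: apply a blocking POSIX advisory lock (or unlock)
 * to the byte range [offset, offset + len) of an open file descriptor.
 * type is F_RDLCK, F_WRLCK, or F_UNLCK. Note that len == 0 means
 * "through end of file". Returns 0 on success, -1 on error. */
int lock_range(int fd, short type, off_t offset, off_t len)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = offset;
    fl.l_len = len;
    /* F_SETLKW waits until the requested range can be locked. */
    return fcntl(fd, F_SETLKW, &fl);
}

/* Hypothetical helper: write count bytes at offset while holding an
 * exclusive lock on exactly that range, then release the lock.
 * Returns the number of bytes written, or -1 on error. */
ssize_t locked_pwrite(int fd, const void *buf, size_t count, off_t offset)
{
    if (lock_range(fd, F_WRLCK, offset, (off_t)count) != 0)
        return -1;
    ssize_t n = pwrite(fd, buf, count, offset);
    if (lock_range(fd, F_UNLCK, offset, (off_t)count) != 0)
        return -1;
    return n;
}
```

If two processes call locked_pwrite on overlapping ranges, the second should block until the first unlocks. If the file system does not enforce that, overlapping read-modify-write sequences can corrupt each other's data.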
Does your application use collective I/O? If you can swap Fortran and
MPI-IO so easily, my guess is that it does not. If there's a way to
make collective MPI-IO calls, that will eliminate many of the fcntl
lock calls.
Do you use MPI datatypes to describe either a file view or the
application data? Access patterns that are noncontiguous in memory
and/or noncontiguous in the file will also trigger fcntl lock calls,
because ROMIO services them with data sieving. You can use an MPI-IO
hint to disable data sieving, at a potentially disastrous performance
cost.
Mathematics and Computer Science Division
Argonne National Lab, IL USA