Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] bug in MPI_File_get_position_shared ?
From: Robert Latham (robl_at_[hidden])
Date: 2008-09-15 17:03:39

On Sat, Aug 16, 2008 at 08:05:14AM -0400, Jeff Squyres wrote:
> On Aug 13, 2008, at 7:06 PM, Yvan Fournier wrote:
>> I seem to have encountered a bug in MPI-IO, in which
>> MPI_File_get_position_shared hangs when called by multiple processes
>> in a communicator. It can be illustrated by the following simple test
>> case, in which a file is simply created with C IO and opened with
>> MPI-IO (defining or undefining MY_MPI_IO_BUG on line 5 enables or
>> disables the bug). From the MPI-2 documentation, it seems that all
>> processes should be able to call MPI_File_get_position_shared, but if
>> more than one process uses it, it fails. Setting the shared pointer
>> helps, but this should not be necessary, and the code still hangs (in
>> more complete code, after writing data).
>> I encounter the same problem with Open MPI 1.2.6 and MPICH2 1.0.7, so
>> I may have misread the documentation, but I suspect a ROMIO bug.
> Bummer. :-(
> It would be best to report this directly to the ROMIO maintainers via
> They lurk on this list, but they may not be
> paying attention to every mail.

Hi, that would be me, and yup, as you can see I don't check in too
often.

Just to wrap this up, I'm glad you found workarounds. Shared file
pointers have a certain seductive quality about them, but they are a
pain in the neck to implement in the library.

You will almost assuredly scale to larger numbers of processors and
achieve higher I/O bandwidth if you do just a little bit of work:
keep track of file offsets on your own and, instead of doing
independent I/O through the shared file pointer, use the collective
explicit-offset routines MPI_File_read_at_all or MPI_File_write_at_all.


Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B