On Sat, Aug 16, 2008 at 08:05:14AM -0400, Jeff Squyres wrote:
> On Aug 13, 2008, at 7:06 PM, Yvan Fournier wrote:
>> I seem to have encountered a bug in MPI-IO, in which
>> MPI_File_get_position_shared hangs when called by multiple processes
>> in a communicator. It can be illustrated by the following simple test
>> in which a file is simply created with C IO, and opened with MPI-IO.
>> (defining or undefining MY_MPI_IO_BUG on line 5 enables/disables the
>> bug). From the MPI2 documentation, it seems that all processes should
>> be able to call MPI_File_get_position_shared, but if more than one
>> process uses it, it fails. Setting the shared pointer helps, but this
>> should not be necessary, and the code still hangs (in more complete
>> code, after
>> writing data).
>> I encounter the same problem with Open MPI 1.2.6 and MPICH2 1.0.7, so
>> I may have misread the documentation, but I suspect a ROMIO bug.
> Bummer. :-(
> It would be best to report this directly to the ROMIO maintainers via
> romio-maint_at_mcs.anl.gov. They lurk on this list, but they may not be
> paying attention to every mail.
Hi, that would be me, and yup, as you can see I don't check in too
often.
Just to wrap this up, I'm glad you found workarounds. Shared file
pointers have a certain seductive quality about them, but they are a
pain in the neck to implement in the library.
You will almost assuredly scale to larger numbers of processors and
achieve higher I/O bandwidth if you do just a little bit of work.
Keep track of file offsets on your own and, instead of doing
independent I/O, use the collective routines MPI_File_read_at_all or
MPI_File_write_at_all.
Mathematics and Computer Science Division A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA B29D F333 664A 4280 315B