I had also posted the bug on the MPICH2 list, and received an
answer from the ROMIO maintainers: the issue seems to be related to
NFS file locking bugs. I had been testing on an NFS file system, and
when I re-tested under a local (ext3) file system, I could not reproduce
the problem.
I had been experimenting with MPI-IO using explicit offsets,
individual pointers, and shared pointers, and I have workarounds,
so I'll just avoid shared pointers on NFS.
On Sat, 2008-08-16 at 08:19 -0400, users-request_at_[hidden] wrote:
> Date: Sat, 16 Aug 2008 08:05:14 -0400
> From: Jeff Squyres <jsquyres_at_[hidden]>
> Subject: Re: [OMPI users] bug in MPI_File_get_position_shared ?
> To: Open MPI Users <users_at_[hidden]>
> Cc: mpich2-maint_at_[hidden]
> On Aug 13, 2008, at 7:06 PM, Yvan Fournier wrote:
> > I seem to have encountered a bug in MPI-IO, in which
> > MPI_File_get_position_shared hangs when called by multiple processes
> > in a communicator. It can be illustrated by the following simple test
> > case, in which a file is simply created with C IO, and opened with MPI-IO
> > (defining or undefining MY_MPI_IO_BUG on line 5 enables/disables the
> > bug). From the MPI-2 documentation, it seems that all processes should
> > be able to call MPI_File_get_position_shared, but if more than one
> > process uses it, it fails. Setting the shared pointer helps, but this
> > should not be necessary, and the code still hangs (in more complete
> > code, after writing data).
> > I encounter the same problem with Open MPI 1.2.6 and MPICH2 1.0.7, so
> > I may have misread the documentation, but I suspect a ROMIO bug.
> Bummer. :-(
> It would be best to report this directly to the ROMIO maintainers via
> romio-maint_at_[hidden]. They lurk on this list, but they may not be
> paying attention to every mail.
> If you wouldn't mind, please CC me on the mail to romio-maint. Thanks!
> Jeff Squyres
> Cisco Systems