I originally thought that it was an issue related to 32-bit
executables, but it seems to affect 64-bit as well...
I found references to this problem -- it was reported back in 2007:
If you look at the code, you will find that MPI_File_read() calls the
special I/O driver implementation if one's available, but if not then
there's also the generic ad_ufs device (POSIX) implementation.
IIRC, SciNet is using IBM GPFS (BTW, a few years ago when Chris gave
me a tour of the machine room at MP, the cluster he was managing was
using Lustre). Since there is no specific implementation for GPFS,
then ROMIO would default back to ad_ufs, and calls
In ADIOI_GEN_ReadContig(), we have code:
len = (ADIO_Offset)datatype_size * (ADIO_Offset)count;
And ADIO_Offset is typedef'ed to MPI_Offset, which is 64-bit on 64-bit platforms.
So far so good.
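To show why that line is fine, here is a minimal sketch of the same widening multiply in plain C. It is my own illustration and not ROMIO code: byte_len is a hypothetical name, and int64_t stands in for ADIO_Offset on the assumption that it is 64-bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not ROMIO code: byte length of `count` elements of
 * `datatype_size` bytes each. Both operands are widened to 64 bits before
 * the multiply, so the product does not wrap at 32 bits. */
static int64_t byte_len(int datatype_size, int count)
{
    assert(datatype_size >= 0 && count >= 0);
    return (int64_t)datatype_size * (int64_t)count;
}
```

For example, byte_len(8, 400000000) gives 3200000000, which is larger than anything a 32-bit signed int can hold.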
However, the way len is used... hmm, can be an issue:
ADIOI_Assert(len == (unsigned int) len); /* read takes an unsigned int parm */
err = read(fd->fd_sys, buf, (unsigned int)len);
So wait... read takes an unsigned int?? From the manpage:
ssize_t read(int fd, void *buf, size_t count);
size_t is not unsigned int... it could have the same width on a 32-bit
system, but not when we are LP64, where size_t is 64-bit.
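To make the concern concrete, here is a small sketch of what the cast does to a 64-bit length. Again this is my own illustration, not ROMIO code: fits_unsigned_int and truncated_request are hypothetical names, and the sketch assumes unsigned int is 32 bits wide, as it is on the usual ILP32 and LP64 platforms.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if `len` survives a round trip through
 * unsigned int, i.e. the same condition the ADIOI_Assert above checks. */
static int fits_unsigned_int(int64_t len)
{
    /* Assumption of this sketch: unsigned int is 32 bits wide. */
    assert(sizeof(unsigned int) * CHAR_BIT == 32);
    return len == (int64_t)(unsigned int)len;
}

/* Hypothetical helper: the byte count read() would actually be asked for
 * after the (unsigned int) cast. The conversion is modulo 2^32. */
static unsigned int truncated_request(int64_t len)
{
    return (unsigned int)len;
}
```

With these, a request of 2^32 + 5 bytes fails the check, and the cast alone would turn it into a request for 5 bytes.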
Other places in ompi/mca/io/romio/romio/mpi-io/read.c also need to be
updated (those are really easy as they are sanity checks). But at
least someone can try to fix the root cause by changing the 2 lines of
code mentioned above, or the ROMIO guys can comment on why an unsigned
int should be passed to read(2)... (Internally, the file offset
(fp_sys_posn) is of type ADIO_Offset, so it should be fine.)
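For what it's worth, one possible shape of such a change is sketched below. This is not the ROMIO fix and has not been tried against ROMIO; read_full and CHUNK_MAX are hypothetical names, int64_t stands in for ADIO_Offset, and the 1 GiB cap per call is just a conservative assumption so that each request fits in size_t and ssize_t on any platform.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed per-call cap (1 GiB), chosen only for this sketch. */
#define CHUNK_MAX ((int64_t)1 << 30)

/* Hypothetical helper, not part of ROMIO: read up to `len` bytes from `fd`
 * into `buf`, passing a size_t count to read(2) and looping on short reads.
 * Returns the number of bytes read (less than `len` only at end of file),
 * or -1 on error with errno set by read(2). */
static int64_t read_full(int fd, void *buf, int64_t len)
{
    assert(buf != NULL && len >= 0);
    char *p = buf;
    int64_t done = 0;

    while (done < len) {
        int64_t remaining = len - done;
        size_t want = (size_t)(remaining > CHUNK_MAX ? CHUNK_MAX : remaining);
        ssize_t n = read(fd, p + done, want);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            return -1;
        }
        if (n == 0)
            break;              /* end of file */
        done += n;
    }
    return done;
}
```

The point is only that no 64-bit length is ever narrowed to unsigned int; the caller still has to compare the return value with the requested length.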
However, I've spent less than 2 hours on this as I found it
interesting -- 12 years ago I was fixing 32-bit file offset issues in
a supercomputer middleware company, and there are still issues with
32-bit vs 64-bit file pointers today! :-O So I guess 30 years from now,
when we run out of 64-bit offset space, we will be fixing 32-bit and 64-bit
offset issues for 128-bit applications (that's when we have quantum
computers!) :-D Also, take the suggestions above at your own risk!
(And I still need to read the "An Abstract-Device Interface for
Implementing Portable Parallel-I/O Interfaces" to understand more
about the internal structures of ROMIO!)
Open Grid Scheduler - The Official Open Source Grid Engine
On Tue, Aug 7, 2012 at 6:02 PM, Richard Shaw <jrs65_at_[hidden]> wrote:
> On Tuesday, 7 August, 2012 at 12:21 PM, Rob Latham wrote:
>> Hi. Known problem in the ROMIO MPI-IO implementation (which OpenMPI
>> uses). Been on my list of "things to fix" for a while.
> Ok, thanks. I'm glad it's not just us.
> Is there a timescale for this being fixed? Because if it's a long-term thing, I would suggest it might be worth putting a FAQ entry on it or something similar, especially as it's quite contradictory to most people's interpretation of the specification. Maybe it's already listed as a known problem somewhere and I just missed it; it took quite a while before I stopped thinking it was an issue with my code.
> Is there a better workaround than just splitting the MPI_File_read up into multiple reads of <2^31 bytes? We're actually trying to read in a distributed array, and the workaround awkwardly requires the creation and reading of multiple darray types, each designed to read in the correct number of blocks less than 2^31 bytes. This seems like it could be a bit fragile.
> Thanks again,