Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Can't read more than 2^31 bytes with MPI_File_read, regardless of type?
From: Rayson Ho (raysonlogin_at_[hidden])
Date: 2012-08-07 22:11:14


I originally thought that it was an issue related to 32-bit
executables, but it seems to affect 64-bit as well...

I found references to this problem -- it was reported back in 2007:

http://lists.mcs.anl.gov/pipermail/mpich-discuss/2007-July/002600.html

If you look at the code, you will find that MPI_File_read() calls a
file-system-specific I/O driver if one is available; if not, it falls
back to the generic ad_ufs device (POSIX) implementation.

IIRC, SciNet is using IBM GPFS (BTW, a few years ago when Chris gave
me a tour of the machine room at MP, the cluster he was managing was
using Lustre). Since there is no specific implementation for GPFS,
ROMIO would fall back to ad_ufs, which calls
ADIOI_GEN_ReadContig().

In ADIOI_GEN_ReadContig(), we have code:

ADIO_Offset len;

len = (ADIO_Offset)datatype_size * (ADIO_Offset)count;

And ADIO_Offset is typedef'ed to MPI_Offset, which is 64-bit on 64-bit
platforms. So far so good.

However, the way len is used... hmm, can be an issue:

    ADIOI_Assert(len == (unsigned int) len); /* read takes an unsigned int parm */

    ...

    err = read(fd->fd_sys, buf, (unsigned int)len);

So wait... read takes an unsigned int?? From the manpage:

       ssize_t read(int fd, void *buf, size_t count);

size_t is not unsigned int... the two can have the same width on a
32-bit build, but under LP64 size_t is 64 bits while unsigned int is
still 32.
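
To make the effect of that cast concrete, here is a small sketch. It is
not ROMIO code: the two helper names are made up for illustration, and
it assumes unsigned int is 32 bits wide (true on the common LP64
platforms, but not guaranteed by the C standard).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: applies the same (unsigned int) cast that the
 * read() call above applies to len. Only the low bits of the 64-bit
 * value survive (the low 32, assuming a 32-bit unsigned int). */
unsigned int cast_like_read_call(int64_t len)
{
    assert(len >= 0);          /* a byte count is never negative */
    return (unsigned int)len;
}

/* Hypothetical helper: same condition as the ADIOI_Assert above.
 * Returns 1 only when the cast loses no information. */
int survives_cast(int64_t len)
{
    return len == (int64_t)(unsigned int)len;
}
```

Under that assumption, survives_cast() is true for a small length but
false for 2^32, and a length of 2^32 + 5 is cast down to 5, so a read
that slipped past the assertion would silently ask for far fewer bytes
than intended.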

Other places in ompi/mca/io/romio/romio/mpi-io/read.c also need to be
updated (those are really easy as they are sanity checks). But at
least someone can try to fix the root cause by changing the 2 lines of
code mentioned above, or the ROMIO guys can comment on why an unsigned
int should be passed to read(2)... (Internally, the file offset
(fp_sys_posn) is of type ADIO_Offset, so it should be fine.)

However, I've spent less than 2 hours on this, as I found it
interesting -- 12 years ago I was fixing 32-bit file offset issues at
a supercomputer middleware company, and there are still issues with
32-bit vs 64-bit file pointers today! :-O So I guess 30 years from now,
when we run out of 64-bit space, we will be fixing 32-bit and 64-bit
offset issues for 128-bit applications (that's when we have quantum
computers!)! :-D Also, take the suggestions above at your own risk!
(And I still need to read "An Abstract-Device Interface for
Implementing Portable Parallel-I/O Interfaces" to understand more
about the internal structures of ROMIO!)

Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/

On Tue, Aug 7, 2012 at 6:02 PM, Richard Shaw <jrs65_at_[hidden]> wrote:
> On Tuesday, 7 August, 2012 at 12:21 PM, Rob Latham wrote:
>> Hi. Known problem in the ROMIO MPI-IO implementation (which OpenMPI
>> uses). Been on my list of "things to fix" for a while.
>
> Ok, thanks. I'm glad it's not just us.
>
> Is there a timescale for this being fixed? Because if it's a long-term thing, I would suggest it might be worth putting a FAQ entry on it or something similar. Especially as it's quite contradictory to most people's interpretation of the specification. Maybe it's already listed as a known problem somewhere, and I just missed it - it took quite a while before I stopped thinking it was an issue with my code.
>
> Is there a better workaround than just splitting the MPI_File_read up into multiple reads of <2^31 bytes? We're actually trying to read in a distributed array, and the workaround awkwardly requires the creation and reading of multiple darray types, each designed to read in the correct number of blocks less than 2^31 bytes. This seems like it could be a bit fragile.
>
> Thanks again,
> Richard
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

http://blogs.scalablelogic.com/