Open MPI logo

Open MPI User's Mailing List Archives

  |   Home   |   Support   |   FAQ   |  

This web mail archive is frozen.

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Click here to be taken to the new web archives of this list; it includes all the mails that are in this frozen archive plus all new mails that have been sent to the list since it was migrated to the new archives.

Subject: Re: [OMPI users] Can't read more than 2^31 bytes with MPI_File_read, regardless of type?
From: Rayson Ho (raysonlogin_at_[hidden])
Date: 2012-08-07 22:11:14


I originally thought that it was an issue related to 32-bit
executables, but it seems to affect 64-bit as well...

I found references to this problem -- it was reported back in 2007:

http://lists.mcs.anl.gov/pipermail/mpich-discuss/2007-July/002600.html

If you look at the code, you will find that MPI_File_read() calls the
special I/O driver implementation if one's available, but if not then
there's also the generic ad_ufs device (POSIX) implementation.

IIRC, SciNet is using IBM GPFS (BTW, a few years ago when Chris gave
me a tour of the machine room at MP, the cluster he was managing was
using Lustre). Since there is no specific implementation for GPFS,
then ROMIO would default back to ad_ufs, and calls
ADIOI_GEN_ReadContig().

In ADIOI_GEN_ReadContig(), we have code:

ADIO_Offset len;

len = (ADIO_Offset)datatype_size * (ADIO_Offset)count;

And ADIO_Offset is typdef'ed to MPI_Offset, which is 64-bit on 64-bit.
So far so good.

However, the way len is used... hmm, can be an issue:

    ADIOI_Assert(len == (unsigned int) len); /* read takes an unsigned
int parm */

    ...

    err = read(fd->fd_sys, buf, (unsigned int)len);

So wait... read takes an unsigned int?? From the manpage:

       ssize_t read(int fd, void *buf, size_t count);

size_t is not unsigned int... it could be if it is 32-bit, but not
when we are LP64.

Other places in ompi/mca/io/romio/romio/mpi-io/read.c also need to be
updated (those are really easy as they are sanity checks). But at
least someone can try to fix the root cause by changing 2 lines of
code mentioned above, or the ROMIO guys can comment on why an unsigned
int should be passed to read(2)... (Internally, the file offset
(fp_sys_posn) is of type ADIO_Offset, so it should be fine.)

However, I've only spent less than 2 hours on this as I found it
interesting -- 12 years ago I was fixing 32-bit file offset issues in
a supercomputer middleware company, and there are still issues with
32-bit vs 64-bit file pointers today! :-O So I guess 30 years from now
when we run out of space of 64-bit, we will be fixing 32-bit, 64-bit
offset issues for 128-bit applications (that's when we have quantum
computers!)! :-D . Also take the suggestions above at your own risk!
(And I still need to read the "An Abstract-Device Interface for
Implementing Portable Parallel-I/O Interfaces" to understand more
about the internal structures of ROMIO!)

Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/

On Tue, Aug 7, 2012 at 6:02 PM, Richard Shaw <jrs65_at_[hidden]> wrote:
> On Tuesday, 7 August, 2012 at 12:21 PM, Rob Latham wrote:
>> Hi. Known problem in the ROMIO MPI-IO implementation (which OpenMPI
>> uses). Been on my list of "things to fix" for a while.
>
> Ok, thanks. I'm glad it's not just us.
>
> Is there a timescale for this being fixed? Because if it's a long term thing, I would suggest it might be worth putting a FAQ entry on it or something similar? Especially as it's quite contradictory to most peoples interpretation of the specification. Maybe it's already listed as a known problem somewhere, and I just missed it - it took quite a while before I stopped thinking it was an issue with my code.
>
> Is there a better workaround than just splitting the MPI_File_read up into multiple reads of <2^31 bytes? We're actually trying to read in a distributed array, and the workaround awkwardly requires the creation and reading of multiple darray types, each designed to read in the correct number of blocks less than 2^31 bytes. This seems like it could be a bit fragile.
>
> Thanks again,
> Richard
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

http://blogs.scalablelogic.com/