Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Datasize confusion in MPI_Write can lead to data loss!
From: Rainer Keller (keller_at_[hidden])
Date: 2008-02-08 12:26:54


Hi George,
Good that you come to the same conclusion regarding ROMIO using
MPI_Type_size internally...

So either take the comment in iscontig.c at its word ,-]
    /* This function needs more work. It should check for contiguity
       in other cases as well.*/
and mail the ROMIO list, or have a specialized version of
ADIOI_Datatype_iscontig for Open MPI ,-]

Either way, the mpi_test_suite in that regard is sane.

Thanks,
Rainer

On Friday 08 February 2008 18:22, George Bosilca wrote:
> MPI_Type_size is supposed to return only the size of the useful data,
> which apparently it does (MPI_SHORT_INT is 6 bytes). What I think
> happens is that MPI_SHORT_INT is a predefined type, but a really
> strange one: it is one of the few predefined types that are not
> contiguous. The problem seems to come from the fact that
> MPI_File_write does a contiguous write for predefined data types,
> on the assumption that they are all contiguous.
>
> I tracked the problem down to the romio/adio/common/iscontig.c file.
> For Open MPI the last #else branch is used. The first case in the
> switch checks for MPI_COMBINER_NAMED (which is what an MPI
> implementation is supposed to return for predefined data types) and
> sets the flag to 1 (which means contiguous). This is obviously wrong
> for MPI_SHORT_INT. It really looks like a ROMIO problem, so I guess
> this email should be redirected to their mailing list.
>
> Thanks,
> george.
>
> On Feb 8, 2008, at 12:50 PM, Christoph Niethammer wrote:
> > Hello!
> >
> > I tested Open MPI at HLRS for some time without detecting new
> > problems in the implementation, but now I have noticed some serious
> > ones with MPI_File_write which can lead to data loss:
> >
> > When creating a struct for a mixed datatype like
> >
> > struct {
> > short a;
> > int b;
> > }
> >
> > the C compiler introduces a gap of 2 bytes in the data representation
> > of this type due to the 4-byte alignment of the int on 32-bit systems.
> >
> > If I now use MPI_File_write to write this data to a file with
> > MPI_SHORT_INT as the MPI datatype, data is lost.
> >
> > I located the problem in the combined use of "write" and
> > MPI_Type_size in MPI_File_write.
> > MPI_Type_size(MPI_SHORT_INT) returns 6 bytes, whereas the struct uses
> > 8 bytes in memory because of the 2-byte gap. The write function in
> > ad_write.c then loses data because the gaps are not included in the
> > calculation of the total data size to be written to the file.
> >
> > This problem also occurs in the other I/O functions.
> > As far as I could find out, the problem does not seem to be present
> > with derived data types.
> >
> > The question now is how to fix this:
> > i) Either the MPI standard is not clear on this point, and the data
> > types MPI_SHORT_INT, MPI_DOUBLE_INT, ... should be forbidden for use
> > with structs of these types,
> > ii) or the implementation of the MPI_Type_size function has to be
> > modified to return the value of e.g. true_ub, which contains the
> > correct value,
> > iii) or the MPI_File_write function must not use the write function
> > in the "contiguous" way on the data and should take care of the gaps.
> >
> > Regards
> >
> > Christoph Niethammer
> > _______________________________________________
> > devel mailing list
> > devel_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
----------------------------------------------------------------
Dipl.-Inf. Rainer Keller   http://www.hlrs.de/people/keller
 HLRS                          Tel: ++49 (0)711-685 6 5858
 Nobelstrasse 19                  Fax: ++49 (0)711-685 6 5832
 70550 Stuttgart                    email: keller_at_[hidden]     
 Germany                             AIM/Skype:rusraink