Subject: Re: [OMPI devel] Datasize confusion in MPI_Write can lead to data loss!
From: George Bosilca (bosilca_at_[hidden])
Date: 2008-02-08 12:22:19

MPI_Type_size is supposed to return only the size of the useful data,
which apparently it does (MPI_SHORT_INT is 6 bytes). What I think
happens is that MPI_SHORT_INT is a predefined type, but a really
strange one: it is one of the few predefined types that are not
contiguous. The problem seems to come from the fact that
MPI_File_write does a contiguous write for predefined data types,
on the assumption that they are all contiguous.
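
To make the size/extent mismatch concrete, here is a minimal sketch in
plain C (no MPI calls), assuming the usual layout in which the compiler
pads the short up to the alignment of the int. The struct name
short_int_t and the helper names are hypothetical; they only illustrate
why copying count * size bytes straight from memory cannot be right for
a type with holes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C counterpart of the (short, int) pair. */
typedef struct {
    short a;
    int   b;
} short_int_t;

/* Bytes of useful data per element: what a size query reports. */
size_t useful_bytes(void)
{
    return sizeof(short) + sizeof(int);
}

/* Distance between consecutive elements in memory (padding included). */
size_t extent_bytes(void)
{
    return sizeof(short_int_t);
}

/* Copy only the useful bytes of each element into a contiguous buffer.
 * out must hold at least count * useful_bytes() bytes.
 * Returns the number of bytes written to out. */
size_t pack_pairs(const short_int_t *in, size_t count, unsigned char *out)
{
    size_t pos = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        memcpy(out + pos, &in[i].a, sizeof(short));
        pos += sizeof(short);
        memcpy(out + pos, &in[i].b, sizeof(int));
        pos += sizeof(int);
    }
    return pos;
}
```

On a typical ABI with a 2-byte short and a 4-byte, 4-byte-aligned int,
useful_bytes() is 6 and extent_bytes() is 8. A raw copy of 2 * 6 = 12
bytes from a two-element array then never reaches the second element's
b, whereas pack_pairs visits every field.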

I tracked the problem down to the romio/adio/common/is_contig.c file.
For Open MPI the last #else branch is used. The first case in the
switch checks for MPI_COMBINER_NAMED (which is what an MPI
implementation is supposed to return for predefined data types) and
sets the flag to 1 (which means contiguous). This is obviously wrong
for MPI_SHORT_INT. It really looks like a ROMIO problem, so I guess
this email should be redirected to their mailing list.
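
One way such a check could avoid the assumption is to compare a type's
size with its extent instead of trusting the combiner alone. The sketch
below is plain C with a hypothetical type_layout struct standing in for
values that an MPI program would obtain from MPI_Type_size and
MPI_Type_get_extent. It is not ROMIO's code, and the size-equals-extent
test is only meant for predefined types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for what MPI reports about a datatype. */
typedef struct {
    int       is_named; /* nonzero if the combiner is MPI_COMBINER_NAMED */
    size_t    size;     /* bytes of useful data */
    ptrdiff_t extent;   /* span of one element in memory, holes included */
} type_layout;

/* Returns 1 if one element can be treated as a single contiguous block,
 * 0 otherwise. A named type counts as contiguous only if it has no holes. */
int named_type_is_contig(const type_layout *t)
{
    assert(t != NULL);
    if (!t->is_named) {
        /* Derived types need a full analysis of their layout, so this
         * sketch conservatively reports them as non-contiguous. */
        return 0;
    }
    return t->extent >= 0 && (size_t)t->extent == t->size;
}
```

With the values from this thread (size 6, extent 8), the check reports
MPI_SHORT_INT as non-contiguous, while a named type whose size equals
its extent is still reported as contiguous.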


On Feb 8, 2008, at 12:50 PM, Christoph Niethammer wrote:

> Hello!
> I have tested Open MPI at HLRS for some time without finding new problems
> in the implementation, but I have now noticed a serious one with MPI_Write
> which can lead to data loss:
> When creating a struct for a mixed datatype like
> struct {
>     short a;
>     int b;
> };
> the C compiler introduces a gap of 2 bytes in the data representation of
> this type because of the 4-byte alignment of the int on 32-bit systems.
> If I now use MPI_File_write to write such data to a file with
> MPI_SHORT_INT as the MPI datatype, data is lost.
> I traced the problem to the combined use of "write" and MPI_Type_size in
> MPI_File_write.
> MPI_Type_size(MPI_SHORT_INT) returns 6 bytes, whereas the struct uses
> 8 bytes in memory because of the 2-byte gap. The write function in
> ad_write.c then loses data because the gaps are not included in the
> calculation of the total data size to be written to the file.
> This problem also occurs in the other I/O functions.
> As far as I could find out, the problem does not seem to be present with
> derived data types.
> The question now is how to fix it:
> i) Either the MPI standard is not clear on this point, and the data types
> MPI_SHORT_INT, MPI_DOUBLE_INT, ... should be forbidden for use with
> structs of these types,
> ii) or the implementation of the MPI_Type_size function has to be modified
> to return the value of e.g. true_ub, which contains the correct value,
> iii) or the MPI_File_write function must not use the write function in
> the "contiguous" way on the data and should take care of the gaps.
> Regards
> Christoph Niethammer