Open MPI Development Mailing List Archives

Subject: [OMPI devel] Datasize confusion in MPI_Write can lead to data loss!
From: Christoph Niethammer (christoph.niethammer_at_[hidden])
Date: 2008-02-08 12:50:37


Hello!

I have been testing Open MPI at HLRS for some time without finding new problems
in the implementation, but I have now noticed a serious one with MPI_File_write
that can lead to data loss:

When creating a struct for a mixed datatype like

struct {
  short a;
  int b;
}

the C compiler introduces a gap of 2 bytes in the data representation of this
type, because the int is aligned to 4 bytes on 32-bit systems.
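
The gap can be made visible with sizeof and offsetof. The following is only a
minimal sketch: the struct tag short_int and the three helper functions are
names chosen for illustration and do not come from Open MPI.

```c
#include <assert.h>
#include <stddef.h>

/* Same member layout as the struct above; the tag name is illustrative. */
struct short_int {
    short a;
    int b;
};

/* Bytes that actually carry data: one short plus one int. */
size_t payload_bytes(void)
{
    return sizeof(short) + sizeof(int);
}

/* Total padding the compiler added to the struct (may be 0 on some ABIs). */
size_t padding_bytes(void)
{
    return sizeof(struct short_int) - payload_bytes();
}

/* Size of the gap between the end of a and the start of b. */
size_t gap_after_a(void)
{
    /* b can never start before a ends, so the subtraction cannot wrap. */
    assert(offsetof(struct short_int, b) >= sizeof(short));
    return offsetof(struct short_int, b) - sizeof(short);
}
```

On a platform with a 2-byte short and a 4-byte int aligned to 4 bytes,
gap_after_a() and padding_bytes() both evaluate to 2, which is the gap
described above.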

If I now use MPI_File_write to write such data to a file, with MPI_SHORT_INT
as the MPI datatype, data is lost.

I traced the problem to the combined use of "write" and MPI_Type_size in
MPI_File_write.
MPI_Type_size(MPI_SHORT_INT) returns 6 bytes, whereas the struct occupies
8 bytes in memory because of the 2-byte gap. The write function in ad_write.c
then loses data, because the gaps are not included in the calculation of the
total data size to be written to the file.
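
The effect of the size mismatch can be reproduced without MPI. The sketch below
is an illustration only and not the code in ad_write.c:
naive_contiguous_write is a hypothetical stand-in for a plain contiguous
write, and the payload size (short plus int) stands in for what MPI_Type_size
is reported to return.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same member layout as the struct in the report; the tag is illustrative. */
struct short_int {
    short a;
    int b;
};

/* Hypothetical stand-in for a contiguous write: copies count * type_size
 * bytes from buf to out and returns the number of bytes copied. */
size_t naive_contiguous_write(unsigned char *out, const void *buf,
                              size_t count, size_t type_size)
{
    assert(out != NULL && buf != NULL);
    size_t nbytes = count * type_size;
    memcpy(out, buf, nbytes);
    return nbytes;
}

/* Bytes at the end of an in-memory array of count elements that such a
 * write never reaches when type_size excludes the padding. */
size_t bytes_not_written(size_t count)
{
    size_t type_size = sizeof(short) + sizeof(int);
    return count * sizeof(struct short_int) - count * type_size;
}
```

With 4 elements on a platform where the struct occupies 8 bytes, only 24 of
the 32 bytes are copied: the copied bytes include the gaps of the first
elements, and the last element is never reached.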

This problem also occurs in the other I/O functions.
As far as I could find out, it does not seem to be present with derived
data types.

The question now is how to fix it:
i) Either the MPI standard is not clear on this point, and the data types
MPI_SHORT_INT, MPI_DOUBLE_INT, ... should be forbidden for use with
structs of these types,
ii) or the implementation of the MPI_Type_size function has to be modified to
return the value of, e.g., true_ub, which contains the correct value,
iii) or the MPI_File_write function must not use the write function in
a contiguous way on the data and should take care of the gaps.
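
To make option iii) more concrete, here is a rough sketch of gap-aware
packing in plain C. It is an assumption about what "taking care of the gaps"
could look like, not Open MPI or ROMIO code; pack_short_int and
unpack_short_int are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same member layout as the struct in the report; the tag is illustrative. */
struct short_int {
    short a;
    int b;
};

/* Copy only the data-carrying members of count elements into out, skipping
 * the padding. Returns the number of bytes produced, which is
 * count * (sizeof(short) + sizeof(int)). */
size_t pack_short_int(unsigned char *out, const struct short_int *in,
                      size_t count)
{
    assert(out != NULL && in != NULL);
    size_t pos = 0;
    for (size_t i = 0; i < count; i++) {
        memcpy(out + pos, &in[i].a, sizeof in[i].a);
        pos += sizeof in[i].a;
        memcpy(out + pos, &in[i].b, sizeof in[i].b);
        pos += sizeof in[i].b;
    }
    return pos;
}

/* Inverse of pack_short_int: rebuild count structs from a packed buffer. */
void unpack_short_int(struct short_int *out, const unsigned char *in,
                      size_t count)
{
    assert(out != NULL && in != NULL);
    size_t pos = 0;
    for (size_t i = 0; i < count; i++) {
        memcpy(&out[i].a, in + pos, sizeof out[i].a);
        pos += sizeof out[i].a;
        memcpy(&out[i].b, in + pos, sizeof out[i].b);
        pos += sizeof out[i].b;
    }
}
```

Packed this way, the number of bytes handed to the file matches the payload
size per element, and every element of the array is written.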

Regards

Christoph Niethammer