
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Datasize confusion in MPI_Write can lead to data loss!
From: George Bosilca (bosilca_at_[hidden])
Date: 2008-02-08 15:34:09


Here is a sketch of a ROMIO patch for Open MPI. I just wrote it and
haven't had time to test it. If you can test it, please let me know
whether it solves the problem.

   Thanks,
     george.

Index: iscontig.c
===================================================================
--- iscontig.c (revision 17399)
+++ iscontig.c (working copy)
@@ -58,6 +58,20 @@
      *flag = MPI_SGI_type_is_contig(datatype) && (displacement == 0);
  }

+#elif defined(OMPI_MPI_H)
+
+#include "ompi/datatype/datatype.h"
+
+void ADIOI_Datatype_iscontig(MPI_Datatype datatype, int *flag)
+{
+ /*
+ * Open MPI's contiguity check returns true for datatypes with
+ * gaps at the beginning and at the end. We have to pass
+ * a count of 2 in order to get these gaps taken into account.
+ */
+ *flag = ompi_ddt_is_contiguous_memory_layout( datatype, 2);
+}
+
  #else

On Feb 8, 2008, at 12:26 PM, Rainer Keller wrote:

> Hi George,
> Good, so you came to the same conclusion about ROMIO using
> MPI_Type_size internally...
>
>
> So, going by the comment in iscontig.c ;-]
> /* This function needs more work. It should check for contiguity
> in other cases as well.*/
> we could either mail the ROMIO list or have a specialized version of
> ADIOI_Datatype_iscontig for OMPI ;-]
>
> Either way, the mpi_test_suite in that regard is sane.
>
>
> Thanks,
> Rainer
>
>
> On Friday 08 February 2008 18:22, George Bosilca wrote:
>> MPI_Type_size is supposed to return only the size of the useful data,
>> which it apparently does (MPI_SHORT_INT is 6 bytes). What I think
>> happens is that MPI_SHORT_INT is a predefined type, but a really
>> strange one: it's one of the few predefined types that are not
>> contiguous. The problem seems to come from the fact that
>> MPI_File_write does a contiguous write for predefined datatypes,
>> assuming that they are all contiguous.
>>
>> I tracked the problem down to the romio/adio/common/iscontig.c file.
>> For Open MPI the last #else branch is used. The first case in the
>> switch checks for MPI_COMBINER_NAMED (which is what an MPI is
>> supposed to return for predefined datatypes) and sets the flag to 1
>> (which means contiguous). This is obviously wrong for MPI_SHORT_INT.
>> It really looks like a ROMIO problem, so I guess this email should be
>> redirected to their mailing list.
>>
>> Thanks,
>> george.
>>
>> On Feb 8, 2008, at 12:50 PM, Christoph Niethammer wrote:
>>> Hello!
>>>
>>> I have been testing Open MPI at HLRS for some time without finding
>>> new problems in the implementation, but now I have noticed some awful
>>> ones with MPI_File_write that can lead to data loss:
>>>
>>> When creating a struct for a mixed datatype like
>>>
>>> struct {
>>> short a;
>>> int b;
>>> }
>>>
>>> the C compiler introduces a gap of 2 bytes in the data representation
>>> of this type due to the 4-byte alignment of the integer on 32-bit
>>> systems.
>>>
>>> If I now use MPI_File_write to write this data to a file, with
>>> MPI_SHORT_INT as the MPI datatype, data is lost.
>>>
>>> I located the problem in the combined use of "write" and
>>> MPI_Type_size in MPI_File_write.
>>> MPI_Type_size(MPI_SHORT_INT) returns 6 bytes, whereas the struct uses
>>> 8 bytes in memory because of the 2-byte gap. The write function in
>>> ad_write.c therefore loses data: the gaps are not included in the
>>> calculation of the total data size to be written to the file.
>>>
>>> This problem also occurs in the other I/O functions.
>>> As far as I can tell, the problem does not occur with derived
>>> datatypes.
>>>
>>> The question now is how to "fix" this:
>>> i) Either the MPI standard is unclear on this point, and the datatypes
>>> MPI_SHORT_INT, MPI_DOUBLE_INT, ... should be forbidden for use with
>>> structs of these types,
>>> ii) or the implementation of MPI_Type_size has to be modified to
>>> return e.g. the value of true_ub, which contains the correct value,
>>> iii) or MPI_File_write must not write the data "contiguously" and
>>> should take care of the gaps.
>>>
>>> Regards
>>>
>>> Christoph Niethammer
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> --
> ----------------------------------------------------------------
> Dipl.-Inf. Rainer Keller http://www.hlrs.de/people/keller
> HLRS Tel: ++49 (0)711-685 6 5858
> Nobelstrasse 19 Fax: ++49 (0)711-685 6 5832
> 70550 Stuttgart email: keller_at_[hidden]
> Germany AIM/Skype:rusraink


