Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Datasize confusion in MPI_Write can lead to data loss!
From: George Bosilca (bosilca_at_[hidden])
Date: 2008-02-08 15:37:47


The patch I sent a few minutes ago only removes the problem for Open
MPI. However, ROMIO's generic test for contiguous data types is still
broken. Checking only for MPI_COMBINER_NAMED is clearly not enough. A
second test checking that the size and the extent of the data type
are equal would make the check a lot more accurate.

   Thanks,
     george.

On Feb 8, 2008, at 12:26 PM, Rainer Keller wrote:

> Hi George,
> Good, so you came to the same conclusion regarding ROMIO's internal
> use of MPI_Type_size...
>
>
> So either we take iscontig.c ,-]
> /* This function needs more work. It should check for contiguity
> in other cases as well.*/
> and mail the ROMIO list, or we have a specialized version of
> ADIOI_Datatype_iscontig for ompi ,-]
>
> Either way, the mpi_test_suite is sane in that regard.
>
>
> Thanks,
> Rainer
>
>
> On Friday 08 February 2008 18:22, George Bosilca wrote:
>> MPI_Type_size is supposed to return only the size of the useful data,
>> which apparently it does (MPI_SHORT_INT is 6 bytes). What I think
>> happens is that MPI_SHORT_INT is a predefined type, but a really
>> strange one: it is one of the few predefined types that are not
>> contiguous. The problem seems to come from the fact that
>> MPI_File_write does a contiguous write for predefined data types,
>> assuming that they are all contiguous.
>>
>> I tracked the problem down to the romio/adio/common/is_contig.c file.
>> For Open MPI the last #else branch is used. The first case in the
>> switch checks for MPI_COMBINER_NAMED (which is what
>> MPI_Type_get_envelope is supposed to return for predefined data
>> types) and sets the flag to 1 (meaning contiguous). This is obviously
>> wrong for MPI_SHORT_INT. It really looks like a ROMIO problem, so I
>> guess this email should be redirected to their mailing list.
>>
>> Thanks,
>> george.
>>
>> On Feb 8, 2008, at 12:50 PM, Christoph Niethammer wrote:
>>> Hello!
>>>
>>> I have been testing Open MPI at HLRS for some time without finding
>>> new problems in the implementation, but now I have found some serious
>>> ones with MPI_File_write which can lead to data loss:
>>>
>>> When creating a struct for a mixed datatype like
>>>
>>> struct {
>>>     short a;
>>>     int b;
>>> };
>>>
>>> the C compiler introduces a 2-byte gap in the data representation of
>>> this type, due to the 4-byte alignment of int on 32-bit systems.
>>>
>>> If I now use MPI_File_write to write this data to a file with
>>> MPI_SHORT_INT as the datatype, data is lost.
>>>
>>> I located the problem in the combined use of "write" and
>>> MPI_Type_size in MPI_File_write: MPI_Type_size(MPI_SHORT_INT) returns
>>> 6 bytes, whereas the struct occupies 8 bytes in memory because of the
>>> 2-byte gap. The write call in ad_write.c therefore loses data,
>>> because the gaps are not included in the calculation of the total
>>> data size to be written to the file.
>>>
>>> This problem also occurs in the other I/O functions.
>>> As far as I can tell, the problem does not occur with derived
>>> data types.
>>>
>>> The question now is how to "fix" this:
>>> i) Either the MPI standard is unclear on this point, and the data
>>> types MPI_SHORT_INT, MPI_DOUBLE_INT, ... should be forbidden for use
>>> with structs of these types,
>>> ii) or the implementation of MPI_Type_size has to be modified to
>>> return e.g. the value of true_ub, which contains the correct value,
>>> iii) or MPI_File_write must not call write on the data in the
>>> "contiguous" way and should take care of the gaps instead.
>>>
>>> Regards
>>>
>>> Christoph Niethammer
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> --
> ----------------------------------------------------------------
> Dipl.-Inf. Rainer Keller http://www.hlrs.de/people/keller
> HLRS Tel: ++49 (0)711-685 6 5858
> Nobelstrasse 19 Fax: ++49 (0)711-685 6 5832
> 70550 Stuttgart email: keller_at_[hidden]
> Germany AIM/Skype:rusraink
