Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [EXTERNAL] [patch] One-sided communication with derived datatype fails on sparc64
From: George Bosilca (bosilca_at_[hidden])
Date: 2012-01-12 14:17:19


The problem is correctly identified and solved. I have already pushed the patch to the trunk. I will create the CMRs for both 1.5 and 1.4.

Kudos to the Fujitsu team; that was a tricky one to find. Thanks for your contributions!

  george.

On Jan 12, 2012, at 10:39 , Barrett, Brian W wrote:

> George -
>
> This looks right to me, but the patches are in the datatype engine, so can
> you weigh in?
>
> Thanks,
>
> Brian
>
> On 1/11/12 10:04 PM, "Kawashima" <t-kawashima_at_[hidden]> wrote:
>
>> Hi Open MPI developers,
>>
>> We at Fujitsu noticed that one-sided communication with certain
>> derived datatypes fails on sparc64 machines.
>>
>> In Open MPI's one-sided communication, the structure of the target
>> buffer's datatype is:
>> (1) encoded in the origin process,
>> (2) transferred to the target process, and
>> (3) decoded in the target process.
>>
>> The encoding and decoding are implemented in ompi_datatype_args.c,
>> which takes alignment into account, but not sufficiently.
>>
>> In the encoding stage, the __ompi_datatype_pack_description function
>> takes alignment into account, as described in its comment.
>> For single-level derived datatypes, that code is OK.
>> But for multi-level derived datatypes (i.e. derived datatypes
>> created from derived datatypes), __ompi_datatype_pack_description
>> is called recursively with an unaligned packed_buffer if
>> args->ci is odd.
>>
>> In the decoding stage, on the other hand, the
>> __ompi_datatype_create_from_packed_description function expects
>> padding when args->ci is odd. For derived datatypes, packed_buffer is
>> always aligned to 64 bits, even when this function is called recursively.
>>
>> This mismatch causes a segmentation fault or a similar failure
>> in the ompi_ddt_create_xxxx function called by the
>> __ompi_ddt_create_from_args function.
>>
>> The decoding stage and the buffer size calculation (ALLOC_ARGS macro)
>> seem to handle alignment adequately, so I think fixing the encoding
>> stage is sufficient for this bug.
>>
>> I've attached patches for the trunk and the v1.4 branch, respectively.
>> A program (requires sparc64) to reproduce this problem is also attached.
>>
>> This bug appears if all of the following conditions are met:
>>
>> - sparc64 or another alignment-sensitive architecture
>>   (configure generates OPAL_ALIGN_WORD_SIZE_INTEGERS == 1)
>> - a derived datatype is used for the target buffer of one-sided communication
>> - that derived datatype is created by multiple levels of MPI_Type_create_xxxx
>> - one of the following functions is used at the second level or later
>>   (args->ci is odd)
>> * MPI_Type_create_hvector
>> * MPI_Type_create_struct
>> * MPI_Type_create_hindexed
>> * MPI_Type_create_indexed_block
>>
>>
>> Regards,
>>
>> Takahiro Kawashima,
>> MPI development team,
>> Fujitsu
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> --
> Brian W. Barrett
> Dept. 1423: Scalable System Software
> Sandia National Laboratories
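
The encode/decode mismatch described in the quoted report can be illustrated with a minimal sketch. This is not the Open MPI code and not the attached patch: desc_t, pack_desc and check_layout are hypothetical names, and the layout (a two-int header, then the 64-bit values, then the 32-bit ints, then the nested description) is a simplified assumption chosen only to show how an odd number of ints leaves the nested description at an offset that is not a multiple of 8 unless the packer adds padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified datatype description (not Open MPI's structure). */
typedef struct desc {
    int32_t ci;                /* number of 32-bit integer arguments */
    int32_t ca;                /* number of 64-bit address arguments */
    const int32_t *i;          /* the ci integer arguments */
    const int64_t *a;          /* the ca address arguments */
    const struct desc *child;  /* nested derived datatype, or NULL */
} desc_t;

/* Pack d and its nested description into buf starting at offset off.
 * Returns the offset just past the packed data. Sets *misaligned to 1
 * if any 64-bit array would start at an offset that is not a multiple
 * of 8 (offsets are relative to the start of buf). */
static size_t pack_desc(const desc_t *d, unsigned char *buf, size_t off,
                        int pad_odd_ci, int *misaligned)
{
    assert(d != NULL && buf != NULL && misaligned != NULL);

    /* Header: two 32-bit counts, 8 bytes in total. */
    memcpy(buf + off, &d->ci, sizeof d->ci);  off += sizeof d->ci;
    memcpy(buf + off, &d->ca, sizeof d->ca);  off += sizeof d->ca;

    /* 64-bit values: a decoder that reads these in place needs 8-byte alignment. */
    if (d->ca > 0) {
        if (off % sizeof(int64_t) != 0)
            *misaligned = 1;
        memcpy(buf + off, d->a, (size_t)d->ca * sizeof(int64_t));
        off += (size_t)d->ca * sizeof(int64_t);
    }

    /* 32-bit values: an odd count leaves off at 4 modulo 8. */
    if (d->ci > 0) {
        memcpy(buf + off, d->i, (size_t)d->ci * sizeof(int32_t));
        off += (size_t)d->ci * sizeof(int32_t);
    }

    /* The fix sketched here: pad after an odd number of ints so that the
     * recursive call below starts on an 8-byte boundary again. */
    if (pad_odd_ci && (d->ci % 2) != 0) {
        memset(buf + off, 0, sizeof(int32_t));
        off += sizeof(int32_t);
    }

    if (d->child != NULL)
        off = pack_desc(d->child, buf, off, pad_odd_ci, misaligned);
    return off;
}

/* Packs a two-level description whose outer level has an odd ci.
 * Returns 1 if a misaligned 64-bit array was detected, 0 otherwise,
 * and stores the packed size in *total. */
static int check_layout(int pad_odd_ci, size_t *total)
{
    static const int32_t inner_i[] = {4, 1};
    static const int64_t inner_a[] = {16};
    static const int32_t outer_i[] = {1, 2, 3};   /* odd ci */
    static const int64_t outer_a[] = {64};
    const desc_t inner = {2, 1, inner_i, inner_a, NULL};
    const desc_t outer = {3, 1, outer_i, outer_a, &inner};
    unsigned char buf[128];
    int misaligned = 0;

    *total = pack_desc(&outer, buf, 0, pad_odd_ci, &misaligned);
    return misaligned;
}
```

With these example values, check_layout(0, &n) reports a misaligned 64-bit array in the nested description (n is 52), while check_layout(1, &n) reports none (n is 56). The sketch copies with memcpy, so it never faults itself; on an alignment-sensitive machine the trouble would come from a decoder that dereferences the misaligned 64-bit values directly.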