Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [EXTERNAL] [patch] One-sided communication with derived datatype fails on sparc64
From: Barrett, Brian W (bwbarre_at_[hidden])
Date: 2012-01-12 10:39:51

George -

This looks right to me, but the patches are in the datatype engine, so can
you weigh in?



On 1/11/12 10:04 PM, "Kawashima" <t-kawashima_at_[hidden]> wrote:

>Hi Open MPI developers,
>We at Fujitsu noticed that one-sided communication with certain
>derived datatypes fails on sparc64 machines.
>In Open MPI's one-sided communication, the datatype description of
>the target buffer is:
> (1) encoded in the origin process,
> (2) transferred to the target process, and
> (3) decoded in the target process.
>The encoding and decoding are done in ompi_datatype_args.c, which
>takes alignment into account, but apparently not sufficiently.
>In the encoding stage, the __ompi_datatype_pack_description function
>takes alignment into account, as described in its comment.
>For single-level derived datatypes, that code is OK.
>But for multi-level derived datatypes (i.e. derived datatypes
>created from derived datatypes), __ompi_datatype_pack_description
>is called recursively with an unaligned packed_buffer if
>args->ci is odd.
>In the decoding stage, on the other hand, the
>__ompi_datatype_create_from_packed_description function expects
>padding when args->ci is odd. For derived datatypes, packed_buffer is
>always aligned to 64 bits, even when this function is called recursively.
>This mismatch causes a segmentation fault or a similar failure
>in the ompi_ddt_create_xxxx function called by __ompi_ddt_create_from_args.
>The decoding stage and the buffer size calculation (ALLOC_ARGS macro)
>appear to handle alignment adequately, so I think fixing the
>encoding stage is sufficient for this bug.
>I've attached patches for the trunk and the v1.4 branch, respectively.
>A program (requires sparc64) that reproduces this problem is also attached.
>This bug appears if all of the following conditions are met:
> - sparc64 or another alignment-sensitive architecture
>   (configure generates OPAL_ALIGN_WORD_SIZE_INTEGERS == 1)
> - a derived datatype is used for the target buffer of one-sided communication
> - that derived datatype is created by multiple levels of MPI_Type_create_xxxx
> - one of the following functions is used at the second level or later
>   (args->ci is odd):
> * MPI_Type_create_hvector
> * MPI_Type_create_struct
> * MPI_Type_create_hindexed
> * MPI_Type_create_indexed_block
>Takahiro Kawashima,
>MPI development team,
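
The mismatch described in the quoted message can be illustrated with a small standalone sketch. This is not Open MPI code: the layout (a run of 32-bit integers followed by one 64-bit displacement) and the names pad_to_8, encode_args, decode_disp, and roundtrip_ok are all hypothetical, chosen only to show what happens when a decoder expects padding after an odd number of ints and the encoder does not write it. The sketch uses memcpy so it runs on any architecture; on alignment-sensitive hardware, a direct 64-bit load at the unpadded offset could fault instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: padding bytes needed to reach an 8-byte boundary. */
static size_t pad_to_8(size_t offset)
{
    return (8 - (offset % 8)) % 8;
}

/* Hypothetical encoder: writes `ci` 32-bit ints, then one 64-bit
 * displacement. With pad != 0 it inserts zero padding so the
 * displacement starts on an 8-byte boundary. Returns bytes written. */
static size_t encode_args(unsigned char *buf, const int32_t *ints,
                          size_t ci, int64_t disp, int pad)
{
    size_t off = 0;
    memcpy(buf + off, ints, ci * sizeof(int32_t));
    off += ci * sizeof(int32_t);
    if (pad) {
        size_t p = pad_to_8(off);
        memset(buf + off, 0, p);
        off += p;
    }
    memcpy(buf + off, &disp, sizeof disp);
    return off + sizeof disp;
}

/* Hypothetical decoder: always assumes the padding is present. */
static int64_t decode_disp(const unsigned char *buf, size_t ci)
{
    size_t off = ci * sizeof(int32_t);
    int64_t disp;
    off += pad_to_8(off);
    memcpy(&disp, buf + off, sizeof disp);
    return disp;
}

/* Returns 1 if the decoder recovers the displacement, 0 otherwise. */
static int roundtrip_ok(size_t ci, int pad)
{
    unsigned char buf[64] = {0};
    int32_t ints[8] = {0};
    const int64_t disp = INT64_C(0x1122334455667788);

    if (ci > 8) {
        return 0; /* sketch only handles up to 8 ints */
    }
    encode_args(buf, ints, ci, disp, pad);
    return decode_disp(buf, ci) == disp;
}
```

Under these assumptions, roundtrip_ok(2, 0) succeeds because an even count already ends on an 8-byte boundary, roundtrip_ok(3, 1) succeeds because the encoder pads, and roundtrip_ok(3, 0) fails because the decoder reads 4 bytes past where the encoder wrote the displacement.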

  Brian W. Barrett
  Dept. 1423: Scalable System Software
  Sandia National Laboratories