Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

Subject: Re: [OMPI devel] Datatype initialization bug?
From: Kawashima, Takahiro (t-kawashima_at_[hidden])
Date: 2013-05-23 03:33:54


George,

Thanks. My colleague has verified your commit.
This commit will make datatype code a bit simpler...

Regards,
Takahiro Kawashima,
MPI development team,
Fujitsu

> Takahiro,
>
> I used your second patch the one that remove the copy of the description in the OMPI level (r28553). Thanks for your help and your patience in investigating this issues.
>
> George.
>
>
> On May 22, 2013, at 02:05 , "Kawashima, Takahiro" <t-kawashima_at_[hidden]> wrote:
>
> > George,
> >
> > Thanks for your quick response.
> > Your fix seemed good to me last week, but this week my colleague
> > found it's not sufficient. There are two issues.
> >
> > (A) We should update opt_desc too.
> >
> > In current ompi_datatype_init, we copy OPAL desc to OMPI desc.
> > But opt_desc still points to OPAL desc. We should update
> > opt_desc to point copied OMPI desc.
> >
> > (B) Fortran desc is not properly set.
> >
> > See the attached result-before.txt. It is the output of the
> > attached show_ompi_datatype.c. Fortran basic datatypes,
> > like MPI_CHARACTER, MPI_REAL, MPI_DOUBLE_PRECISION, have
> > wrong desc_t.
> >
> > It is because these datatypes are statically initialized with
> > OMPI_DATATYPE_INIT_PREDEFINED_BASIC_TYPE_FORTRAN macro and
> > desc and opt_desc point to one element of
> > ompi_datatype_predefined_elem_desc array with an OPAL index.
> > For example, desc of ompi_mpi_character points to
> > ompi_datatype_predefined_elem_desc[OPAL_DATATYPE_INT1].
> > If we use ompi_datatype_predefined_elem_desc, we should use
> > an OMPI datatype index (OMPI_DATATYPE_MPI_INT8_T etc.) and not
> > an OPAL datatype index (OPAL_DATATYPE_INT1 etc.).
> >
> > Therefore the condition (pDesc != datatype->super.desc.desc)
> > in ompi_datatype_init becomes true and we copy desc from the
> > wrong part currently.
> > i.e. copy from ompi_datatype_predefined_elem_desc[OPAL_DATATYPE_INT1]
> > to ompi_datatype_predefined_elem_desc[OMPI_DATATYPE_MPI_CHARACTER].
> >
> > The initialization part of ompi_mpi_character in
> > ompi_datatype_internal.h and ompi_datatype_module.c:
> > ----------------------------------------------------------------
> > ompi_predefined_datatype_t ompi_mpi_character = OMPI_DATATYPE_INIT_PREDEFINED_BASIC_TYPE_FORTRAN (INT, CHARACTER, 1, OPAL_ALIGNMENT_CHAR, 0 );
> >
> > #define OMPI_DATATYPE_INITIALIZER_FORTRAN( TYPE, NAME, SIZE, ALIGN, FLAGS ) \
> > { \
> > OPAL_OBJ_STATIC_INIT(opal_datatype_t), \
> > OPAL_DATATYPE_FLAG_BASIC | \
> > OMPI_DATATYPE_FLAG_PREDEFINED | \
> > OMPI_DATATYPE_FLAG_DATA_FORTRAN | (FLAGS) /*flag*/, \
> > OPAL_DATATYPE_ ## TYPE ## SIZE /*id*/, \
> > (((uint32_t)1)<<(OPAL_DATATYPE_ ## TYPE ## SIZE)) /*bdt_used*/, \
> > SIZE /*size*/, \
> > 0 /*true_lb*/, SIZE /*true_ub*/, 0 /*lb*/, SIZE /*ub*/, \
> > (ALIGN) /*align*/, \
> > 1 /*nbElems*/, \
> > OPAL_DATATYPE_INIT_NAME(TYPE ## SIZE) /*name*/, \
> > OMPI_DATATYPE_INIT_DESC_PREDEFINED(TYPE, SIZE) /*desc*/, \
> > OMPI_DATATYPE_INIT_DESC_PREDEFINED(TYPE, SIZE) /*opt_desc*/, \
> > OPAL_DATATYPE_INIT_BTYPES_ARRAY_ ## TYPE ## SIZE /*btypes*/ \
> >
> > #define OMPI_DATATYPE_INIT_DESC_PREDEFINED(TYPE, SIZE) \
> > { \
> > 1 /*length*/, 1 /*used*/, \
> > &(ompi_datatype_predefined_elem_desc[2 * OPAL_DATATYPE_ ## TYPE ## SIZE]) /*desc*/ \
> > }
> >
> > int32_t ompi_datatype_init( void )
> > {
> > int32_t i;
> >
> > for( i = 0; i < OMPI_DATATYPE_MPI_MAX_PREDEFINED; i++ ) {
> > ompi_datatype_t* datatype = (ompi_datatype_t*)ompi_datatype_basicDatatypes[i];
> > dt_elem_desc_t* pDesc;
> >
> > if( 0 == datatype->super.size ) continue;
> >
> > /**
> > * Most of the OMPI datatypes have been initialized with the basic desc of the
> > * OPAL datatypes. Thus don't modify the desc, instead rebase the desc back into
> > * the OMPI predefined_elem_desc and update the fields there.
> > */
> > pDesc = &ompi_datatype_predefined_elem_desc[2 * i];
> > if( pDesc != datatype->super.desc.desc ) {
> > memcpy(pDesc, datatype->super.desc.desc, 2 * sizeof(dt_elem_desc_t));
> > datatype->super.desc.desc = pDesc;
> > } else {
> > datatype->super.desc.desc[0].elem.common.flags = OPAL_DATATYPE_FLAG_PREDEFINED |
> > OPAL_DATATYPE_FLAG_DATA |
> > OPAL_DATATYPE_FLAG_CONTIGUOUS |
> > OPAL_DATATYPE_FLAG_NO_GAPS;
> > datatype->super.desc.desc[0].elem.common.type = i;
> > ....
> > ----------------------------------------------------------------
> >
> > Do you intend to it goes to the else-block in ompi_datatype_init
> > for Fortran datatypes? If so, we should use an OMPI index in
> > OMPI_DATATYPE_INIT_DESC_PREDEFINED.
> >
> > But in the else-block, desc[0].elem.common.type is set to an OMPI
> > datatype index. And it seems that this 'type' is treated as an
> > OPAL datatype index in other parts.
> > # OMPI_DATATYPE_MPI_CHARACTER and OPAL_DATATYPE_COMPLEX8 has
> > # same value (0x13) but how to distinguish them?
> >
> > I wonder whether Fortran datatypes really need separate desc.
> > Though OPAL does not have *identical* datatypes, it always has
> > *corresponding* datatypes. It is obvious because we currently
> > translate an OMPI Fortran datatype to a corresponding OPAL
> > datatype index in OMPI_DATATYPE_INIT_DESC_PREDEFINED as
> > "OPAL_DATATYPE_ ## TYPE ## SIZE".
> >
> > If Fortran datatypes don't need separate desc, this issue
> > may be fixed by my attached datatype-init-1.patch.
> > It also fixes the opt_desc issue described first.
> >
> > Furthermore, do we need to copy desc for OMPI datatypes?
> > If not, use my attached datatype-init-2.patch instead.
> > It don't copy desc and OMPI desc points OPAL desc.
> > I'm not sure this is a correct solution.
> >
> > The attached result-after.txt is the output of the attached
> > show_ompi_datatype.c with my patch. I think this output is
> > correct.
> >
> > Regards,
> > Takahiro Kawashima,
> > MPI development team,
> > Fujitsu
> >
> >> Takahiro,
> >>
> >> Nice catch, I really wonder how this one survived for soo long. I pushed a patch in r28535 addressing this issue. It is not the best solution, but it provide an easy way to address the issue.
> >>
> >> A little bit of history. A datatype is composed by (let's keep it short) 2 component, a high-level description containing among others the size and the name of the datatype and a low level description (the desc_t part) containing the basic predefined elements in the datatype. As most of the predefined datatypes defined in the MPI layer are synonyms to some basic predefined datatypes (such as the equivalent POSIX types MPI_INT32_T), the design of the datatype allowed for the sharing of the desc_t part between datatypes. This approach allows us to have similar datatypes (MPI_INT and MPI_INT32_T) with different names but with the same backend internal description. However, when we split the datatype engine in two, we duplicate this common description (in OPAL and OMPI). The OMPI desc_t was pointing to OPAL desc_t for almost everything … except the datatypes that were not defined by OPAL such as the Fortran one. This turned the management of the common desc_t into a ni!
> ghtmare … with the effect you noticed few days ago. Too bad for the optimization part. I now duplicate the desc_t between the two layers, and all OMPI datatypes have now their own desc_t.
> >>
> >> Thanks for finding and analyzing so deeply this issue.
> >> George.
> >>
> >>
> >>
> >>
> >> On May 16, 2013, at 12:04 , KAWASHIMA Takahiro <rivis.kawashima_at_[hidden]> wrote:
> >>
> >>> Hi,
> >>>
> >>> I'm reading the datatype code in Open MPI trunk and have a question.
> >>> A bit long.
> >>>
> >>> See the following program.
> >>>
> >>> ----------------------------------------------------------------
> >>> #include <stdio.h>
> >>> #include <mpi.h>
> >>>
> >>> struct opal_datatype_t;
> >>> extern int opal_init(int *pargc, char ***pargv);
> >>> extern int opal_finalize(void);
> >>> extern void opal_datatype_dump(struct opal_datatype_t *type);
> >>> extern struct opal_datatype_t opal_datatype_int8;
> >>>
> >>> int main(int argc, char **argv)
> >>> {
> >>> opal_init(NULL, NULL);
> >>> opal_datatype_dump(&opal_datatype_int8);
> >>> MPI_Init(NULL, NULL);
> >>> opal_datatype_dump(&opal_datatype_int8);
> >>> MPI_Finalize();
> >>> opal_finalize();
> >>> return 0;
> >>> }
> >>> ----------------------------------------------------------------
> >>>
> >>> All variables/functions declared as 'extern' are defined in OPAL.
> >>> opal_datatype_dump() function outputs internal data of a datatype.
> >>> I expect the same output on two opal_datatype_dump() calls.
> >>> But when I run it on an x86_64 machine, I get the following output.
> >>>
> >>> ----------------------------------------------------------------
> >>> ompi-trunk/opal-datatype-dump && ompiexec -n 1 ompi-trunk/opal-datatype-dump
> >>> [ppc.rivis.jp:27886] Datatype 0x600c60[OPAL_INT8] size 8 align 8 id 7 length 1 used 1
> >>> true_lb 0 true_ub 8 (true_extent 8) lb 0 ub 8 (extent 8)
> >>> nbElems 1 loops 0 flags 136 (commited contiguous )-cC---P-DB-[---][---]
> >>> contain OPAL_INT8
> >>> --C---P-D--[---][---] OPAL_INT8 count 1 disp 0x0 (0) extent 8 (size 8)
> >>> No optimized description
> >>>
> >>> [ppc.rivis.jp:27886] Datatype 0x600c60[OPAL_INT8] size 8 align 8 id 7 length 1 used 1
> >>> true_lb 0 true_ub 8 (true_extent 8) lb 0 ub 8 (extent 8)
> >>> nbElems 1 loops 0 flags 136 (commited contiguous )-cC---P-DB-[---][---]
> >>> contain OPAL_INT8
> >>> --C---P-D--[---][---] count 1 disp 0x0 (0) extent 8 (size 8971008)
> >>> No optimized description
> >>> ----------------------------------------------------------------
> >>>
> >>> The former output is what I expected. But the latter one is not
> >>> identical to the former one and its content datatype has no name
> >>> and a very large size.
> >>>
> >>> This line is output in opal_datatype_dump_data_desc() function in
> >>> opal/datatype/opal_datatype_dump.c file. It refers
> >>> opal_datatype_basicDatatypes[pDesc->elem.common.type]->name and
> >>> opal_datatype_basicDatatypes[pDesc->elem.common.type]->size for
> >>> the content datatype.
> >>>
> >>> In this case, pDesc->elem.common.type is
> >>> opal_datatype_int8.desc.desc[0].elem.common.type and is initialized to 7
> >>> in opal_datatype_init() function in opal/datatype/opal_datatype_module.c
> >>> file, which is called during opal_init() function.
> >>> opal_datatype_int8.desc.desc points &opal_datatype_predefined_elem_desc[7*2].
> >>>
> >>> But if we call MPI_Init() function, the value is overwritten.
> >>> ompi_datatype_init() function in ompi/datatype/ompi_datatype_module.c
> >>> file, which is called during MPI_Init() function, has similar
> >>> procedure to initialize OMPI datatypes.
> >>>
> >>> On initializing ompi_mpi_aint in it, ompi_mpi_aint.dt.super.desc.desc
> >>> points &opal_datatype_predefined_elem_desc[7*2], which is also pointed
> >>> by opal_datatype_int8, because ompi_mpi_aint is defined by
> >>> OMPI_DATATYPE_INIT_PREDEFINED_BASIC_TYPE macro and it uses
> >>> OPAL_DATATYPE_INITIALIZER_INT8 macro. So
> >>> opal_datatype_int8.desc.desc[0].elem.common.type is overwritten
> >>> to 37.
> >>>
> >>> Therefore in the second opal_datatype_dump() function call in my
> >>> program, opal_datatype_basicDatatypes[37] is accessed.
> >>> But the array length of opal_datatype_basicDatatypes is 25.
> >>>
> >>> Summarize:
> >>>
> >>> static initializer:
> >>> opal_datatype_predefined_elem_desc[25] = {{0, ...}, ...};
> >>> opal_datatype_int8.desc.desc = &opal_datatype_predefined_elem_desc[7*2];
> >>> ompi_mpi_aint.dt.super.desc.desc = &opal_datatype_predefined_elem_desc[7*2];
> >>>
> >>> opal_init:
> >>> opal_datatype_int8.desc.desc.elem.common.type = 7;
> >>>
> >>> MPI_Init:
> >>> ompi_mpi_aint.dt.super.desc.desc.elem.common.type = 37;
> >>>
> >>> opal_datatype_dump:
> >>> access to opal_datatype_predefined_elem_desc[37]
> >>>
> >>> While opal_datatype_dump() function might not be called from
> >>> user's programs, breaking opal_datatype_predefined_elem_desc
> >>> array in ompi_datatype_init() function is not good.
> >>>
> >>> Though the above is described for opal_datatype_int8 and ompi_mpi_aint,
> >>> the same thing happens to other datatypes.
> >>>
> >>> Though I tried to fix this problem, I could not figure out the
> >>> correct solution.
> >>>
> >>> - The first loop in ompi_datatype_init() function should be removed?
> >>> But OMPI Fortran datatypes should be initialized in it?
> >>>
> >>> - All OMPI datatypes should point ompi_datatype_predefined_elem_desc
> >>> array? But having same 'type' value in OPAL datatypes and OMPI
> >>> datatypes is allowed?