Open MPI Development Mailing List Archives

Subject: [OMPI devel] [bugs] OSC-related datatype bugs
From: Kawashima, Takahiro (t-kawashima_at_[hidden])
Date: 2013-09-04 20:22:09


Hi,

My colleague and I found three OSC-related bugs in the OMPI datatype
code. One affects trunk and the v1.6/v1.7 branches; the other two
affect only the v1.6 branch.

(1) OMPI_DATATYPE_ALIGN_PTR should be placed after memcpy

  Last year I reported a bug in the OMPI datatype code, and r25721
  was committed to fix it. However, that fix is not correct and
  the problem still exists.

  My reported bug and the patch:
    http://www.open-mpi.org/community/lists/devel/2012/01/10207.php
  r25721:
    https://svn.open-mpi.org/trac/ompi/changeset/25721

  In the __ompi_datatype_pack_description function,
  OMPI_DATATYPE_ALIGN_PTR should be placed after the memcpy,
  as in the patch attached to my previous mail.

  Sorry, I did not check r25721 carefully when it was committed.

  The attached file datatype-align.patch is the correct patch
  for the latest trunk. This fix should be applied to trunk
  and to the v1.7 and v1.6 branches.
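
  A minimal sketch of the ordering issue, not the Open MPI code.
  The names sketch_align_ptr, sketch_pack_align_after and
  sketch_pack_align_before are hypothetical, and the sketch assumes
  the alignment step is meant to round the write cursor up for the
  field that follows the copied data:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for OMPI_DATATYPE_ALIGN_PTR: round p up to the
 * next multiple of sizeof(void *). This is NOT the Open MPI definition. */
char *sketch_align_ptr(char *p)
{
    const uintptr_t a = (uintptr_t)sizeof(void *);
    const uintptr_t addr = (uintptr_t)p;
    const uintptr_t aligned = (addr + a - 1) & ~(a - 1);
    return p + (aligned - addr);
}

/* Align AFTER the copy: the returned cursor is aligned, so the next
 * field can be written at a pointer-aligned address. */
char *sketch_pack_align_after(char *buf, const int *vals, size_t count)
{
    char *pos = buf;
    memcpy(pos, vals, count * sizeof(int));
    pos += count * sizeof(int);
    return sketch_align_ptr(pos);
}

/* Align BEFORE the copy: if buf is already aligned this is a no-op,
 * and the returned cursor may be misaligned after an odd number of ints
 * (for example 12 bytes in, where int is 4 bytes and pointers are 8). */
char *sketch_pack_align_before(char *buf, const int *vals, size_t count)
{
    char *pos = sketch_align_ptr(buf);
    memcpy(pos, vals, count * sizeof(int));
    pos += count * sizeof(int);
    return pos;
}
```

  With a pointer-aligned buffer, the first variant always returns
  an aligned cursor, while the second returns the buffer start plus
  count * sizeof(int), whatever its alignment happens to be.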

(2) r28790 should be merged into v1.6

  The trunk changeset r28790 has been merged into v1.7 in r28790
  (ticket #3673), but it has not yet been merged into v1.6.

  I confirmed the problem reported last month also occurs in v1.6
  and can be fixed by merging r28790 into v1.6.

  The originally reported problem:
    http://www.open-mpi.org/community/lists/devel/2013/07/12595.php

(3) OMPI_DATATYPE_MAX_PREDEFINED should be 46 for v1.6

  In the v1.6 branch, ompi/datatype/ompi_datatype.h defines
  OMPI_DATATYPE_MAX_PREDEFINED as 45, but there are 46
  predefined datatypes and the last predefined
  datatype ID (OMPI_DATATYPE_MPI_UB) is 45.

  OMPI_DATATYPE_MAX_PREDEFINED is used as the number of
  predefined datatypes (that is, the maximum predefined datatype
  ID + 1), not as the maximum predefined datatype ID itself,
  as these uses show:

    ompi/op/op.c:79:
      // the number of predefined datatypes
      int ompi_op_ddt_map[OMPI_DATATYPE_MAX_PREDEFINED];
    ompi/datatype/ompi_datatype_args.c:573:
      // maximum predefined datatype ID + 1
      assert( data_id < OMPI_DATATYPE_MAX_PREDEFINED );
    ompi/datatype/ompi_datatype_args.c:492:
      // first unused datatype ID
      // (= maximum predefined datatype ID + 1)
      int next_index = OMPI_DATATYPE_MAX_PREDEFINED;

  So its value should be 46 for v1.6.

  In fact, at r28932 in trunk, one datatype (MPI_Count) was
  added and OMPI_DATATYPE_MAX_PREDEFINED was increased
  from 45 to 47, so the current trunk is correct.

  This bug causes random errors, such as SEGV, "Error recreating
  datatype", or "received packet for Window with unknown type",
  when MPI_UB is used in OSC, as in the attached program osc_ub.c.
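
  The count-versus-maximum-ID distinction in (3) can be sketched
  as follows. All SKETCH_* and sketch_* names are hypothetical and
  only mirror the numbers quoted above (last ID 45, count 46); this
  is not the Open MPI header:

```c
#include <assert.h>

/* Hypothetical miniature of the v1.6 ID space: IDs run 0..45. */
enum {
    SKETCH_LAST_ID        = 45,                 /* role of OMPI_DATATYPE_MPI_UB */
    SKETCH_MAX_PREDEFINED = SKETCH_LAST_ID + 1  /* a count, so 46 */
};

/* A table indexed by datatype ID needs one slot per ID, i.e. 46 slots. */
int sketch_map[SKETCH_MAX_PREDEFINED];

/* Returns 1 if id is a valid index for a table of max_predefined entries,
 * mirroring the "data_id < OMPI_DATATYPE_MAX_PREDEFINED" style of check. */
int sketch_id_in_range(int id, int max_predefined)
{
    return id >= 0 && id < max_predefined;
}
```

  With the value 45, ID 45 fails the range check and falls one
  slot past the end of a 45-entry table; with 46 both are fine.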

Regards,
Takahiro Kawashima,
MPI development team,
Fujitsu