Subject: [OMPI devel] [bug] One-sided communication with a duplicated datatype
From: KAWASHIMA Takahiro (rivis.kawashima_at_[hidden])
Date: 2013-07-14 08:30:13


I encountered an assertion failure in Open MPI trunk and found a bug.

See the attached program, which can be run with mpiexec -n 1.
It calls MPI_Put to write one int value to the target side.
The target-side datatype is equivalent to MPI_INT, but it is a derived
datatype created by MPI_Type_contiguous followed by MPI_Type_dup.

The program aborts with the following output:

#### dt1 (0x2626160) ####
type 2 count ints 1 count disp 0 count datatype 1
ints: 1
types: MPI_INT
#### dt2 (0x2626340) ####
type 1 count ints 0 count disp 0 count datatype 1
types: 0x2626160
put_dup_type: ../../../ompi/datatype/ompi_datatype_args.c:565: __ompi_datatype_create_from_packed_description: Assertion `data_id < 45' failed.
[ppc:05244] *** Process received signal ***
[ppc:05244] Signal: Aborted (6)
[ppc:05244] Signal code: (-6)
[ppc:05244] [ 0] /lib/ [0x7fe58a275ff0]
[ppc:05244] [ 1] /lib/ [0x7fe589f371b5]
[ppc:05244] [ 2] /lib/ [0x7fe589f39fc0]
[ppc:05244] [ 3] /lib/ [0x7fe589f30301]
[ppc:05244] [ 4] /ompi/lib/ [0x7fe58a4e804e]
[ppc:05244] [ 5] /ompi/lib/ [0x7fe58a4e8cf6]
[ppc:05244] [ 6] /ompi/lib/openmpi/ [0x7fe5839a104b]
[ppc:05244] [ 7] /ompi/lib/openmpi/ [0x7fe5839a3ae5]
[ppc:05244] [ 8] /ompi/lib/openmpi/ [0x7fe58399c6cc]
[ppc:05244] [ 9] /ompi/lib/openmpi/ [0x7fe58510bb04]
[ppc:05244] [10] /ompi/lib/openmpi/ [0x7fe5839a044b]
[ppc:05244] [11] /ompi/lib/openmpi/ [0x7fe5839a169d]
[ppc:05244] [12] /ompi/lib/openmpi/ [0x7fe5839a1776]
[ppc:05244] [13] /ompi/lib/openmpi/ [0x7fe5839a84ab]
[ppc:05244] [14] /ompi/lib/ [0x7fe58a54127d]
[ppc:05244] [15] ompi-trunk/put_dup_type() [0x400d10]
[ppc:05244] [16] /lib/ [0x7fe589f23c8d]
[ppc:05244] [17] put_dup_type() [0x400b09]
[ppc:05244] *** End of error message ***
mpiexec noticed that process rank 0 with PID 5244 on node ppc exited on signal 6 (Aborted).

The __ompi_datatype_create_from_packed_description function, in which
the assertion failed, seems to expect data_id to be the ID of a
predefined datatype. In my environment, data_id is 68, which is the ID
of the datatype created by MPI_Type_contiguous.

In one-sided communication, the target-side datatype is encoded as a
'description' on the origin side and then decoded on the target side.
I think there are problems in both the encoding and the decoding
stages.

The __ompi_datatype_pack_description function in
ompi/datatype/ompi_datatype_args.c encodes the datatype.
For MPI_COMBINER_DUP, on line 451, it encodes only create_type and id
and returns immediately. It doesn't encode any information about the
base datatype (in my case, the datatype created by MPI_Type_contiguous).

The __ompi_datatype_create_from_packed_description function in
ompi/datatype/ompi_datatype_args.c decodes the description.
For MPI_COMBINER_DUP, on line 557, it expects data_id to be the ID of
a predefined datatype. That is not always true.
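
To make the mismatch between the two stages concrete, here is a toy
model in plain C. It is only a sketch and is not Open MPI code: every
name beginning with toy_ is hypothetical, the ID 10 for the predefined
type is made up, and the decoder returns -1 where the real code
asserts. The bound 45, the ID 68 and the combiner values 1 and 2 are
taken from the output above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not Open MPI's. */
enum { TOY_MAX_PREDEFINED = 45 };   /* bound from "data_id < 45" */
enum {
    TOY_COMBINER_NAMED = 0,         /* predefined datatype */
    TOY_COMBINER_DUP = 1,           /* "type 1" in the output above */
    TOY_COMBINER_CONTIGUOUS = 2     /* "type 2" in the output above */
};

typedef struct toy_type {
    int id;                         /* datatype ID */
    int combiner;                   /* how the datatype was created */
    const struct toy_type *base;    /* base datatype, NULL if predefined */
} toy_type;

/*
 * Encoder for the DUP case: writes only the combiner and the ID of the
 * base datatype, then returns. Nothing about how the base datatype was
 * built is written. Returns the number of ints written, 0 if the type
 * is not a DUP (other combiners are left out of this sketch).
 */
size_t toy_pack(const toy_type *t, int *buf)
{
    if (t->combiner != TOY_COMBINER_DUP) {
        return 0;
    }
    buf[0] = TOY_COMBINER_DUP;
    buf[1] = t->base->id;
    return 2;
}

/*
 * Decoder for the DUP case: accepts the ID only if it names a
 * predefined datatype. Returns the ID on success and -1 otherwise
 * (the real code hits an assertion at this point).
 */
int toy_unpack(const int *buf)
{
    int data_id;

    if (buf[0] != TOY_COMBINER_DUP) {
        return -1;
    }
    data_id = buf[1];
    if (data_id >= TOY_MAX_PREDEFINED) {
        return -1;
    }
    return data_id;
}
```

In this model, a dup of a predefined type round-trips, but a dup of a
contiguous type with ID 68 is packed as the pair {1, 68} and rejected by
toy_unpack, which mirrors the failure I see.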

I cannot fix this problem yet because I'm not familiar with the datatype
code in Open MPI. MPI_COMBINER_DUP is also used for predefined
datatypes, and the calculation of total_pack_size is involved as well,
so the fix does not look simple.