
Open MPI Development Mailing List Archives


Subject: [OMPI devel] Possible bug with derived datatypes and openib BTL in trunk
From: Rolf vandeVaart (rvandevaart_at_[hidden])
Date: 2014-04-16 14:34:10


I am seeing errors when running the Intel test suite over the openib BTL when transferring derived datatypes. I do not see the errors with the sm or tcp BTLs. The errors begin with this checkin:

https://svn.open-mpi.org/trac/ompi/changeset/31370
Timestamp: 04/11/14 16:06:56 (5 days ago)
Author: bosilca
Message: Reshape all the packing/unpacking functions to use the same skeleton. Rewrite the
generic_unpacking to take advantage of the same capabilities.

Does anyone else see errors? Here is an example running with r31370:

[rvandevaart_at_drossetti-ivy1 src]$ mpirun --mca btl self,openib -np 2 -host drossetti-ivy0,drossetti-ivy1 --mca btl_openib_warn_default_gid_prefix 0 MPI_Isend_ator_c
MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
MPITEST error (1): 2 errors in buffer (17,0,12) len 273 commsize 2 commtype -10 data_type 13 root 1
MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
MPITEST error (1): 2 errors in buffer (17,2,12) len 273 commsize 2 commtype -16 data_type 13 root 1
MPITEST info (0): Starting MPI_Isend_ator: All Isend TO Root test
MPITEST info (0): Node spec MPITEST_comm_sizes[6]=2 too large, using 1
MPITEST info (0): Node spec MPITEST_comm_sizes[22]=2 too large, using 1
MPITEST info (0): Node spec MPITEST_comm_sizes[32]=2 too large, using 1
MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 118
MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -60
MPITEST error (0): 2 errors in buffer (17,0,12) len 273 commsize 2 commtype -10 data_type 13 root 0
MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 118
MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -60
MPITEST error (0): 2 errors in buffer (17,2,12) len 273 commsize 2 commtype -16 data_type 13 root 0
MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
MPITEST error (1): 2 errors in buffer (17,4,12) len 273 commsize 2 commtype -13 data_type 13 root 1
MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 118
MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -60
MPITEST error (0): 2 errors in buffer (17,4,12) len 273 commsize 2 commtype -13 data_type 13 root 0
MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
MPITEST error (1): 2 errors in buffer (17,6,12) len 273 commsize 2 commtype -15 data_type 13 root 0
MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -61
MPITEST error (0): 2 errors in buffer (17,6,12) len 273 commsize 2 commtype -15 data_type 13 root 0
MPITEST_results: MPI_Isend_ator: All Isend TO Root 8 tests FAILED (of 3744)
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[12363,1],0]
  Exit code: 4
--------------------------------------------------------------------------
[rvandevaart_at_drossetti-ivy1 src]$
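
The mismatches in the r31370 run above keep landing on the same two buffer elements. A minimal Python sketch for tallying them, using a few lines copied from that output. The regular expression and the `tally` helper are illustrative assumptions, not part of the Intel test suite:

```python
import re
from collections import Counter

# A few lines copied verbatim from the r31370 run above.
LOG = """\
MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
MPITEST error (1): 2 errors in buffer (17,0,12) len 273 commsize 2 commtype -10 data_type 13 root 1
MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 118
MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -60
MPITEST error (0): 2 errors in buffer (17,0,12) len 273 commsize 2 commtype -10 data_type 13 root 0
"""

# Hypothetical pattern for the per-element mismatch lines; the
# "N errors in buffer" summary lines deliberately do not match.
MISMATCH = re.compile(
    r"MPITEST error \((\d+)\): libmpitest\.c:(\d+) "
    r"i=(\d+), (\w+) value=(-?\d+), expected (-?\d+)"
)


def tally(log):
    """Count mismatches per (element index, C type) across all ranks."""
    counts = Counter()
    for line in log.splitlines():
        m = MISMATCH.match(line)
        if m:
            counts[(int(m.group(3)), m.group(4))] += 1
    return counts


if __name__ == "__main__":
    for (index, ctype), n in sorted(tally(LOG).items()):
        print(f"i={index} type={ctype}: {n} mismatches")
```

Feeding the full output through `tally` instead of the excerpt would show whether any element other than 117 and 195 is ever affected.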

Here is the error with the current trunk, which is slightly different:
[rvandevaart_at_drossetti-ivy1 src]$ mpirun --mca btl self,openib -np 2 -host drossetti-ivy0,drossetti-ivy1 --mca btl_openib_warn_default_gid_prefix 0 MPI_Isend_ator_c
[drossetti-ivy1.nvidia.com:22875] ../../../opal/datatype/opal_datatype_position.c:72
        Pointer 0x1ad414c size 4 is outside [0x1ac1d20,0x1ad1d08] for
        base ptr 0x1ac1d20 count 273 and data
[drossetti-ivy1.nvidia.com:22875] Datatype 0x1ac0220[] size 104 align 16 id 0 length 22 used 21
true_lb 0 true_ub 232 (true_extent 232) lb 0 ub 240 (extent 240)
nbElems 21 loops 0 flags 1C4 (commited )-c--lu-GD--[---][---]
   contain lb ub OPAL_LB OPAL_UB OPAL_INT1 OPAL_INT2 OPAL_INT4 OPAL_INT8 OPAL_UINT1 OPAL_UINT2 OPAL_UINT4 OPAL_UINT8 OPAL_FLOAT4 OPAL_FLOAT8 OPAL_FLOAT16
--C---P-D--[---][---] OPAL_INT4 count 1 disp 0x0 (0) extent 4 (size 4)
--C---P-D--[---][---] OPAL_INT2 count 1 disp 0x8 (8) extent 2 (size 2)
--C---P-D--[---][---] OPAL_INT8 count 1 disp 0x10 (16) extent 8 (size 8)
--C---P-D--[---][---] OPAL_UINT2 count 1 disp 0x20 (32) extent 2 (size 2)
--C---P-D--[---][---] OPAL_UINT4 count 1 disp 0x24 (36) extent 4 (size 4)
--C---P-D--[---][---] OPAL_UINT8 count 1 disp 0x30 (48) extent 8 (size 8)
--C---P-D--[---][---] OPAL_FLOAT4 count 1 disp 0x40 (64) extent 4 (size 4)
--C---P-D--[---][---] OPAL_INT1 count 1 disp 0x48 (72) extent 1 (size 1)
--C---P-D--[---][---] OPAL_FLOAT8 count 1 disp 0x50 (80) extent 8 (size 8)
--C---P-D--[---][---] OPAL_UINT1 count 1 disp 0x60 (96) extent 1 (size 1)
--C---P-D--[---][---] OPAL_FLOAT16 count 1 disp 0x70 (112) extent 16 (size 16)
--C---P-D--[---][---] OPAL_INT1 count 1 disp 0x90 (144) extent 1 (size 1)
--C---P-D--[---][---] OPAL_UINT1 count 1 disp 0x92 (146) extent 1 (size 1)
--C---P-D--[---][---] OPAL_INT2 count 1 disp 0x94 (148) extent 2 (size 2)
--C---P-D--[---][---] OPAL_UINT2 count 1 disp 0x98 (152) extent 2 (size 2)
--C---P-D--[---][---] OPAL_INT4 count 1 disp 0x9c (156) extent 4 (size 4)
--C---P-D--[---][---] OPAL_UINT4 count 1 disp 0xa4 (164) extent 4 (size 4)
--C---P-D--[---][---] OPAL_INT8 count 1 disp 0xb0 (176) extent 8 (size 8)
--C---P-D--[---][---] OPAL_UINT8 count 1 disp 0xc0 (192) extent 8 (size 8)
--C---P-D--[---][---] OPAL_INT8 count 1 disp 0xd0 (208) extent 8 (size 8)
--C---P-D--[---][---] OPAL_UINT8 count 1 disp 0xe0 (224) extent 8 (size 8)
-------G---[---][---] OPAL_END_LOOP prev 21 elements first elem displacement 0 size of data 104
Optimized description
-cC---P-DB-[---][---] OPAL_INT4 count 1 disp 0x0 (0) extent 4 (size 4)
-cC---P-DB-[---][---] OPAL_INT2 count 1 disp 0x8 (8) extent 2 (size 2)
-cC---P-DB-[---][---] OPAL_INT8 count 1 disp 0x10 (16) extent 8 (size 8)
-cC---P-DB-[---][---] OPAL_UINT2 count 1 disp 0x20 (32) extent 2 (size 2)
-cC---P-DB-[---][---] OPAL_UINT4 count 1 disp 0x24 (36) extent 4 (size 4)
-cC---P-DB-[---][---] OPAL_UINT8 count 1 disp 0x30 (48) extent 8 (size 8)
-cC---P-DB-[---][---] OPAL_FLOAT4 count 1 disp 0x40 (64) extent 4 (size 4)
-cC---P-DB-[---][---] OPAL_INT1 count 1 disp 0x48 (72) extent 1 (size 1)
-cC---P-DB-[---][---] OPAL_FLOAT8 count 1 disp 0x50 (80) extent 8 (size 8)
-cC---P-DB-[---][---] OPAL_UINT1 count 1 disp 0x60 (96) extent 1 (size 1)
-cC---P-DB-[---][---] OPAL_FLOAT16 count 1 disp 0x70 (112) extent 16 (size 16)
-cC---P-DB-[---][---] OPAL_INT1 count 1 disp 0x90 (144) extent 1 (size 1)
-cC---P-DB-[---][---] OPAL_UINT1 count 1 disp 0x92 (146) extent 1 (size 1)
-cC---P-DB-[---][---] OPAL_INT2 count 1 disp 0x94 (148) extent 2 (size 2)
-cC---P-DB-[---][---] OPAL_UINT2 count 1 disp 0x98 (152) extent 2 (size 2)
-cC---P-DB-[---][---] OPAL_INT4 count 1 disp 0x9c (156) extent 4 (size 4)
-cC---P-DB-[---][---] OPAL_UINT4 count 1 disp 0xa4 (164) extent 4 (size 4)
-cC---P-DB-[---][---] OPAL_INT8 count 1 disp 0xb0 (176) extent 8 (size 8)
-cC---P-DB-[---][---] OPAL_UINT8 count 1 disp 0xc0 (192) extent 8 (size 8)
-cC---P-DB-[---][---] OPAL_INT8 count 1 disp 0xd0 (208) extent 8 (size 8)
-cC---P-DB-[---][---] OPAL_UINT8 count 1 disp 0xe0 (224) extent 8 (size 8)
-------G---[---][---] OPAL_END_LOOP prev 21 elements first elem displacement 0 size of data 104

MPITEST error (1): libmpitest.c:1578 i=0, char value=-61, expected 0
MPITEST error (1): libmpitest.c:1608 i=0, int32_t value=117, expected 0
MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
MPITEST error (1): 4 errors in buffer (17,0,12) len 273 commsize 2 commtype -10 data_type 13 root 1
MPITEST info (0): Starting MPI_Isend_ator: All Isend TO Root test
MPITEST info (0): Node spec MPITEST_comm_sizes[6]=2 too large, using 1
MPITEST info (0): Node spec MPITEST_comm_sizes[22]=2 too large, using 1
MPITEST info (0): Node spec MPITEST_comm_sizes[32]=2 too large, using 1
MPITEST_results: MPI_Isend_ator: All Isend TO Root 1 tests FAILED (of 3744)
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[12296,1],1]
  Exit code: 1
--------------------------------------------------------------------------
[rvandevaart_at_drossetti-ivy1 src]$
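
The numbers in the opal_datatype_position.c message can be cross-checked against the datatype dump. The Python sketch below only redoes the arithmetic on values copied from the trunk output above. The relation `(count - 1) * extent + true_ub` for the valid range is an assumption that happens to match the reported bounds, not something taken from the Open MPI source, and all names in the code are illustrative:

```python
# Values copied from the trunk output above.
BASE = 0x1AC1D20      # base ptr, also the reported lower bound
UPPER = 0x1AD1D08     # reported upper bound
POINTER = 0x1AD414C   # offending pointer (accessing 4 bytes)
ACCESS_SIZE = 4
COUNT = 273
EXTENT = 240
ALIGN = 16

# (displacement, size) of each of the 21 elements in the dump.
ELEMENTS = [
    (0, 4), (8, 2), (16, 8), (32, 2), (36, 4), (48, 8), (64, 4),
    (72, 1), (80, 8), (96, 1), (112, 16), (144, 1), (146, 1), (148, 2),
    (152, 2), (156, 4), (164, 4), (176, 8), (192, 8), (208, 8), (224, 8),
]


def packed_size(elements):
    """Bytes of actual data in one instance of the datatype."""
    return sum(size for _, size in elements)


def true_ub(elements):
    """Highest byte offset touched by one instance of the datatype."""
    return max(disp + size for disp, size in elements)


def round_up(value, align):
    """Round value up to the next multiple of align."""
    return (value + align - 1) // align * align


def valid_span(count, extent, ub):
    """Assumed valid byte range for `count` consecutive instances."""
    return (count - 1) * extent + ub


if __name__ == "__main__":
    ub = true_ub(ELEMENTS)
    offset = POINTER - BASE
    instance, disp = divmod(offset, EXTENT)
    print("packed size:", packed_size(ELEMENTS))
    print("true_ub:", ub, "extent:", round_up(ub, ALIGN))
    print("reported range:", UPPER - BASE, "computed:", valid_span(COUNT, EXTENT, ub))
    print("pointer offset:", offset, "-> instance", instance, "disp", disp)
    print("bytes past upper bound:", offset + ACCESS_SIZE - (UPPER - BASE))
```

With these values the dump is self-consistent (size 104, true_ub 232, extent 240) and the reported range spans 65512 bytes, but the pointer sits 74796 bytes past the base. Dividing that offset by the extent gives instance 311 at displacement 156, which is the displacement of the second OPAL_INT4 in the dump, so the position appears to have run well past the 273 instances in the buffer.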
