Open MPI User's Mailing List Archives

Subject: [OMPI users] Misuse or bug with nested types?
From: Eric Chamberland (Eric.Chamberland_at_[hidden])
Date: 2013-04-22 10:07:42


Hi,

I have a problem receiving a vector of an MPI datatype constructed via
MPI_Type_create_struct.

It looks like MPI_Send or MPI_Recv doesn't work as expected: some parts
of a nested struct in the received buffer are not filled at all!

I tested the code under mpich 3.0.3 and it worked perfectly!

So I simplified everything (but still have ~400 lines of code) and put
all of it in a self-contained example attached to this mail.

Briefly, we construct several MPI datatypes and nest them into a final
type, which is:
{MPI_LONG,{{MPI_DOUBLE,MPI_LONG,MPI_CHAR}*2}}
(which represents a std::pair<long int, 2DVerticeInfo>)
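
For readers without the attachment, here is a minimal sketch of the memory layout that such a nested type has to describe. It is plain C++ with no MPI calls, and it is not the attached code: the struct and function names (VertexInfo, TwoVertexInfo, Entry, printLayout, ...) are assumptions made up for illustration, and a plain struct stands in for the std::pair. It only shows where each member sits and how much trailing padding follows the char, which are the numbers one would normally feed to MPI_Type_create_struct as displacements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-ins for the types in the attached example (names assumed).
struct VertexInfo {          // corresponds to {MPI_DOUBLE, MPI_LONG, MPI_CHAR}
    double coord;
    long   id;
    char   tag;
};

struct TwoVertexInfo {       // corresponds to {VertexInfo} * 2
    VertexInfo v[2];
};

struct Entry {               // plain-struct stand-in for std::pair<long int, 2DVerticeInfo>
    long          key;
    TwoVertexInfo info;
};

// Bytes of actual data in one VertexInfo, ignoring any padding.
constexpr std::size_t vertexPayload() {
    return sizeof(double) + sizeof(long) + sizeof(char);
}

// Padding the compiler inserts after `tag` so that arrays of VertexInfo stay aligned.
constexpr std::size_t vertexTrailingPadding() {
    return sizeof(VertexInfo) - (offsetof(VertexInfo, tag) + sizeof(char));
}

// Byte offset, from the start of an Entry, of the second VertexInfo.
constexpr std::size_t secondVertexOffset() {
    return offsetof(Entry, info) + offsetof(TwoVertexInfo, v) + sizeof(VertexInfo);
}

// Print the displacements and sizes for the current platform.
void printLayout() {
    // Array elements are contiguous, so two elements occupy exactly twice the size of one.
    assert(sizeof(TwoVertexInfo) == 2 * sizeof(VertexInfo));
    std::printf("VertexInfo: size=%zu payload=%zu trailing padding=%zu\n",
                sizeof(VertexInfo), vertexPayload(), vertexTrailingPadding());
    std::printf("  coord at %zu, id at %zu, tag at %zu\n",
                offsetof(VertexInfo, coord), offsetof(VertexInfo, id),
                offsetof(VertexInfo, tag));
    std::printf("Entry: size=%zu, key at %zu, info at %zu, second vertex at %zu\n",
                sizeof(Entry), offsetof(Entry, key), offsetof(Entry, info),
                secondVertexOffset());
}
```

Calling printLayout() from main prints the offsets for the machine at hand. The second VertexInfo starts at secondVertexOffset(), right after the first element's trailing padding; that second element is the part shown as not filled in the failing output. The sketch only describes the layout and makes no claim about the cause.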

The output for different openmpi versions is surprising:

With openmpi 1.6.3 and 1.6.4:

   Rank 0 send this:
   i: 0 => {{0},{{0.5,3,%},{0.25,7,5}}}
   i: 1 => {{1},{{0.5,3,%},{0.25,7,5}}}
   i: 2 => {{2},{{0.5,3,%},{0.25,7,5}}}
   i: 3 => {{3},{{0.5,3,%},{0.25,7,5}}}
   i: 4 => {{4},{{0.5,3,%},{0.25,7,5}}}
   i: 5 => {{5},{{0.5,3,%},{0.25,7,5}}}
   Rank 1 received this:
   i: 0 => {{0},{{0.5,3,%},{-888.8,-999,$}}} *** ERROR ****
   i: 1 => {{1},{{0.5,3,%},{-888.8,-999,$}}} *** ERROR ****
   i: 2 => {{2},{{0.5,3,%},{-888.8,-999,$}}} *** ERROR ****
   i: 3 => {{3},{{0.5,3,%},{-888.8,-999,$}}} *** ERROR ****
   i: 4 => {{4},{{0.5,3,%},{-888.8,-999,$}}} *** ERROR ****
   i: 5 => {{5},{{0.5,3,%},{-888.8,-999,$}}} *** ERROR ****

With openmpi 1.7.0:
   Rank 0 send this:
   i: 0 => {{0},{{0.5,3,%},{0.25,7,5}}}
   i: 1 => {{1},{{0.5,3,%},{0.25,7,5}}}
   i: 2 => {{2},{{0.5,3,%},{0.25,7,5}}}
   i: 3 => {{3},{{0.5,3,%},{0.25,7,5}}}
   i: 4 => {{4},{{0.5,3,%},{0.25,7,5}}}
   i: 5 => {{5},{{0.5,3,%},{0.25,7,5}}}
   Rank 1 received this:
   i: 0 => {{0},{{0.5,3,%},{-888.8,-999,$}}} *** ERROR ****
   i: 1 => {{1},{{0.5,3,%},{-888.8,-999,$}}} *** ERROR ****
   i: 2 => {{2},{{0.5,3,%},{-888.8,-999,$}}} *** ERROR ****
   i: 3 => {{3},{{0.5,3,%},{-888.8,-999,$}}} *** ERROR ****
   i: 4 => {{4},{{0.5,3,%},{-888.8,-999,$}}} *** ERROR ****
   i: 5 => {{5},{{0.5,3,%},{-888.8,-999,$}}} *** ERROR ****

With mpich 3.0.3:
   Rank 0 send this:
   i: 0 => {{0},{{0.5,3,%},{0.25,7,5}}}
   i: 1 => {{1},{{0.5,3,%},{0.25,7,5}}}
   i: 2 => {{2},{{0.5,3,%},{0.25,7,5}}}
   i: 3 => {{3},{{0.5,3,%},{0.25,7,5}}}
   i: 4 => {{4},{{0.5,3,%},{0.25,7,5}}}
   i: 5 => {{5},{{0.5,3,%},{0.25,7,5}}}
   Rank 1 received this:
   i: 0 => {{0},{{0.5,3,%},{0.25,7,5}}} OK
   i: 1 => {{1},{{0.5,3,%},{0.25,7,5}}} OK
   i: 2 => {{2},{{0.5,3,%},{0.25,7,5}}} OK
   i: 3 => {{3},{{0.5,3,%},{0.25,7,5}}} OK
   i: 4 => {{4},{{0.5,3,%},{0.25,7,5}}} OK
   i: 5 => {{5},{{0.5,3,%},{0.25,7,5}}} OK

I also "valgrinded" the code under mpich:
   mpirun -n 2 valgrind ./sbugnt
==25148== Memcheck, a memory error detector
==25148== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==25148== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==25148== Command: ./sbugnt
==25148==
==25147== Memcheck, a memory error detector
==25147== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==25147== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==25147== Command: ./sbugnt
==25147==
   Rank 0 send this:
   i: 0 => {{0},{{0.5,3,%},{0.25,7,5}}}
   i: 1 => {{1},{{0.5,3,%},{0.25,7,5}}}
   i: 2 => {{2},{{0.5,3,%},{0.25,7,5}}}
   i: 3 => {{3},{{0.5,3,%},{0.25,7,5}}}
   i: 4 => {{4},{{0.5,3,%},{0.25,7,5}}}
   i: 5 => {{5},{{0.5,3,%},{0.25,7,5}}}
   Rank 1 received this:
   i: 0 => {{0},{{0.5,3,%},{0.25,7,5}}} OK
   i: 1 => {{1},{{0.5,3,%},{0.25,7,5}}} OK
   i: 2 => {{2},{{0.5,3,%},{0.25,7,5}}} OK
   i: 3 => {{3},{{0.5,3,%},{0.25,7,5}}} OK
   i: 4 => {{4},{{0.5,3,%},{0.25,7,5}}} OK
   i: 5 => {{5},{{0.5,3,%},{0.25,7,5}}} OK
==25147==
==25147== HEAP SUMMARY:
==25147== in use at exit: 0 bytes in 0 blocks
==25147== total heap usage: 215 allocs, 215 frees, 26,067 bytes allocated
==25147==
==25147== All heap blocks were freed -- no leaks are possible
==25147==
==25147== For counts of detected and suppressed errors, rerun with: -v
==25147== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==25148==
==25148== HEAP SUMMARY:
==25148== in use at exit: 0 bytes in 0 blocks
==25148== total heap usage: 213 allocs, 213 frees, 26,019 bytes allocated
==25148==
==25148== All heap blocks were freed -- no leaks are possible
==25148==
==25148== For counts of detected and suppressed errors, rerun with: -v
==25148== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Did we misuse something?

Thanks for your help!

Eric