Open MPI Development Mailing List Archives

Subject: [OMPI devel] bug in opal_generic_simple_pack_function()
From: Nadia Derbey (Nadia.Derbey_at_[hidden])
Date: 2013-11-25 05:40:07


Hi,

I'm currently working on a bug occurring at a client site with Open MPI
when calling MPI_Sendrecv() on datatypes built by the application.
I think I've found where the bug comes from (it is located in
opal_generic_simple_pack_function(), in the file
opal/datatype/opal_datatype_pack.c). But this code is so complicated
that I'm far from sure of my fix. What I can say is that it fixes
things for me, but I need some advice from the datatype specialists.

---------------

Attached you will find the reproducer provided by the client, as
well as the resulting output.
datatypes.c: reproducer
To run the binary: salloc --exclusive -p B510 -N 1 -n 1 mpirun ./datatypes
trc_ko: traces obtained without the patch applied
trc_ok: traces obtained with the patch applied

---------------

The proposed patch follows. (Note that the very first change in
this patch was enough in my case, but I thought all the "source_base"
assignments should follow the same model.)

-------------------------
opal_generic_simple_pack_function: add the datatype lb when progressing
in the input buffer

diff -r cb23c2f07e1f opal/datatype/opal_datatype_pack.c
--- a/opal/datatype/opal_datatype_pack.c Sun Nov 24 17:06:51 2013 +0000
+++ b/opal/datatype/opal_datatype_pack.c Mon Nov 25 10:48:00 2013 +0100
@@ -301,7 +301,7 @@ opal_generic_simple_pack_function( opal_
                  PACK_PREDEFINED_DATATYPE( pConvertor, pElem, count_desc,
                                            source_base, destination, iov_len_local );
                  if( 0 == count_desc ) { /* completed */
-                     source_base = pConvertor->pBaseBuf + pStack->disp;
+                     source_base = pConvertor->pBaseBuf + pStack->disp + pData->lb;
                      pos_desc++; /* advance to the next data */
                      UPDATE_INTERNAL_COUNTERS( description, pos_desc, pElem, count_desc );
                      continue;
@@ -333,7 +333,7 @@ opal_generic_simple_pack_function( opal_
                          pStack->disp += description[pStack->index].loop.extent;
                      }
                  }
-                 source_base = pConvertor->pBaseBuf + pStack->disp;
+                 source_base = pConvertor->pBaseBuf + pStack->disp + pData->lb;
                  UPDATE_INTERNAL_COUNTERS( description, pos_desc, pElem, count_desc );
                  DO_DEBUG( opal_output( 0, "pack new_loop count %d stack_pos %d pos_desc %d disp %ld space %lu\n",
                                         (int)pStack->count, pConvertor->stack_pos, pos_desc, (long)pStack->disp, (unsigned long)iov_len_local ); );
@@ -354,7 +354,7 @@ opal_generic_simple_pack_function( opal_
                              pStack->disp + local_disp);
                  pos_desc++;
              update_loop_description: /* update the current state */
-                 source_base = pConvertor->pBaseBuf + pStack->disp;
+                 source_base = pConvertor->pBaseBuf + pStack->disp + pData->lb;
                  UPDATE_INTERNAL_COUNTERS( description, pos_desc, pElem, count_desc );
                  DDT_DUMP_STACK( pConvertor->pStack, pConvertor->stack_pos, pElem, "advance loop" );
                  continue;
@@ -374,7 +374,7 @@ opal_generic_simple_pack_function( opal_
      }
      /* I complete an element, next step I should go to the next one */
      PUSH_STACK( pStack, pConvertor->stack_pos, pos_desc, OPAL_DATATYPE_INT8, count_desc,
-                 source_base - pStack->disp - pConvertor->pBaseBuf );
+                 source_base - pStack->disp - pConvertor->pBaseBuf - pData->lb );
      DO_DEBUG( opal_output( 0, "pack save stack stack_pos %d pos_desc %d count_desc %d disp %ld\n",
                             pConvertor->stack_pos, pStack->index, (int)pStack->count, (long)pStack->disp ); );
      return 0;

-------------------------------

Regards,
Nadia

-- 
Nadia Derbey
Bull, Architect of an Open World
http://www.bull.com





  • text/plain attachment: trc_ko

  • text/plain attachment: trc_ok