Open MPI Development Mailing List Archives

Subject: [OMPI devel] bug in opal_generic_simple_pack_function()
From: Nadia Derbey (Nadia.Derbey_at_[hidden])
Date: 2013-11-25 05:40:07


Hi,

I'm currently working on a bug occurring at the client site with Open MPI
when calling MPI_Sendrecv() on datatypes built by the application.
I think I've found where the bug comes from (it is located in
opal_generic_simple_pack_function(), file
opal/datatype/opal_datatype_pack.c). But this code is so complicated
that I'm far from sure of my fix. What I can say is that it fixes
things for me, but I need some advice from the datatype specialists.

---------------

Attached you will find the reproducer provided by the client, as
well as the resulting output:
datatypes.c: reproducer
to run the binary: salloc --exclusive -p B510 -N 1 -n 1 mpirun ./datatypes
trc_ko: traces obtained without the patch applied
trc_ok: traces obtained with the patch applied

---------------

The proposed patch follows. Note that the very first change in this
patch was enough in my case, but I thought all the "source_base"
settings should follow the same model.

-------------------------
opal_generic_simple_pack_function: add the datatype lb when progressing
in the input buffer

diff -r cb23c2f07e1f opal/datatype/opal_datatype_pack.c
--- a/opal/datatype/opal_datatype_pack.c Sun Nov 24 17:06:51 2013 +0000
+++ b/opal/datatype/opal_datatype_pack.c Mon Nov 25 10:48:00 2013 +0100
@@ -301,7 +301,7 @@ opal_generic_simple_pack_function( opal_
                  PACK_PREDEFINED_DATATYPE( pConvertor, pElem, count_desc,
                                            source_base, destination, iov_len_local );
                  if( 0 == count_desc ) { /* completed */
-                     source_base = pConvertor->pBaseBuf + pStack->disp;
+                     source_base = pConvertor->pBaseBuf + pStack->disp + pData->lb;
                      pos_desc++; /* advance to the next data */
                      UPDATE_INTERNAL_COUNTERS( description, pos_desc, pElem, count_desc );
                      continue;
@@ -333,7 +333,7 @@ opal_generic_simple_pack_function( opal_
                          pStack->disp += description[pStack->index].loop.extent;
                      }
                  }
-                 source_base = pConvertor->pBaseBuf + pStack->disp;
+                 source_base = pConvertor->pBaseBuf + pStack->disp + pData->lb;
                  UPDATE_INTERNAL_COUNTERS( description, pos_desc, pElem, count_desc );
                  DO_DEBUG( opal_output( 0, "pack new_loop count %d stack_pos %d pos_desc %d disp %ld space %lu\n",
                                         (int)pStack->count, pConvertor->stack_pos, pos_desc, (long)pStack->disp, (unsigned long)iov_len_local ); );
@@ -354,7 +354,7 @@ opal_generic_simple_pack_function( opal_
                              pStack->disp + local_disp);
                  pos_desc++;
              update_loop_description: /* update the current state */
-                 source_base = pConvertor->pBaseBuf + pStack->disp;
+                 source_base = pConvertor->pBaseBuf + pStack->disp + pData->lb;
                  UPDATE_INTERNAL_COUNTERS( description, pos_desc, pElem, count_desc );
                  DDT_DUMP_STACK( pConvertor->pStack, pConvertor->stack_pos, pElem, "advance loop" );
                  continue;
@@ -374,7 +374,7 @@ opal_generic_simple_pack_function( opal_
      }
      /* I complete an element, next step I should go to the next one */
      PUSH_STACK( pStack, pConvertor->stack_pos, pos_desc, OPAL_DATATYPE_INT8, count_desc,
-                 source_base - pStack->disp - pConvertor->pBaseBuf );
+                 source_base - pStack->disp - pConvertor->pBaseBuf - pData->lb );
      DO_DEBUG( opal_output( 0, "pack save stack stack_pos %d pos_desc %d count_desc %d disp %ld\n",
                             pConvertor->stack_pos, pStack->index, (int)pStack->count, (long)pStack->disp ); );
      return 0;

-------------------------------

Regards,
Nadia

-- 
Nadia Derbey
Bull, Architect of an Open World
http://www.bull.com





  • text/plain attachment: trc_ko

  • text/plain attachment: trc_ok