
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Crash when using MPI_REAL8
From: Sylvain Jeaugey (sylvain.jeaugey_at_[hidden])
Date: 2009-12-04 14:27:22


There is definitely something wrong with the types.

OMPI_DATATYPE_MAX_PREDEFINED is set to 45, while there are 55 predefined
types. When accessing ompi_op_ddt_map[ddt->id] with MPI_REAL8
(ddt->id=54), we read past the end of the map and end up in the
ompi_mpi_op_bxor struct.

Depending on various things (padding, uninitialized memory), we may read 0
and not crash. If you're unlucky, you read a random value and crash soon
afterwards.
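
To make the failure mode concrete, here is a minimal sketch of a bounds-checked lookup. It assumes the map is an int array sized by the MAX_PREDEFINED macro, as described above; all `sketch_*` / `SKETCH_*` names are hypothetical stand-ins and are not part of Open MPI.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the real Open MPI definitions. */
#define SKETCH_MAX_PREDEFINED 45    /* table size before the patch */
#define SKETCH_REAL8_ID       54    /* id reported for MPI_REAL8 in this thread */
#define SKETCH_NO_MAPPING     (-1)  /* hypothetical "not in table" result */

/* Static storage, so every entry starts out as 0. */
static int sketch_ddt_map[SKETCH_MAX_PREDEFINED];

/* Return the mapped value, or SKETCH_NO_MAPPING when id lies outside
 * the table. Without this check, id 54 would index past the end of a
 * 45-entry array and read whatever object happens to follow it. */
static int sketch_lookup(int id)
{
    size_t n = sizeof(sketch_ddt_map) / sizeof(sketch_ddt_map[0]);

    if (id < 0 || (size_t)id >= n) {
        return SKETCH_NO_MAPPING;
    }
    return sketch_ddt_map[id];
}
```

With these values, `sketch_lookup(SKETCH_REAL8_ID)` returns `SKETCH_NO_MAPPING` instead of reading out of bounds, while any id from 0 to 44 returns the stored entry.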

So, I extended things a bit and it seems to fix my problem. I'm not sure
all types are now handled; I just added some that were missing from the map.
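
One way to check whether every type is handled would be to fill the map with a sentinel before the individual assignments and then count the entries that still carry it. The sketch below only illustrates that idea; the table size, the sentinel, and the function names are hypothetical and do not describe what op.c actually does.

```c
#include <stddef.h>

#define SKETCH_MAX_IDS  8      /* hypothetical table size */
#define SKETCH_UNMAPPED (-1)   /* hypothetical sentinel value */

static int sketch_map[SKETCH_MAX_IDS];

/* Mark every entry as unmapped before the per-type assignments run. */
static void sketch_map_reset(void)
{
    for (size_t i = 0; i < SKETCH_MAX_IDS; ++i) {
        sketch_map[i] = SKETCH_UNMAPPED;
    }
}

/* Count the entries that were never assigned a real mapping. */
static size_t sketch_count_unmapped(void)
{
    size_t n = 0;

    for (size_t i = 0; i < SKETCH_MAX_IDS; ++i) {
        if (sketch_map[i] == SKETCH_UNMAPPED) {
            ++n;
        }
    }
    return n;
}
```

Calling `sketch_count_unmapped()` after initialization would then report how many ids still lack a mapping, rather than leaving them to be discovered by a crash.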

Sylvain

diff -r e82b914000bd -r 1a40aee2925c ompi/datatype/ompi_datatype.h
--- a/ompi/datatype/ompi_datatype.h Thu Dec 03 04:46:31 2009 +0000
+++ b/ompi/datatype/ompi_datatype.h Fri Dec 04 19:59:26 2009 +0100
@@ -57,7 +57,7 @@
  #define OMPI_DATATYPE_FLAG_DATA_FORTRAN 0xC000
  #define OMPI_DATATYPE_FLAG_DATA_LANGUAGE 0xC000

-#define OMPI_DATATYPE_MAX_PREDEFINED 45
+#define OMPI_DATATYPE_MAX_PREDEFINED 55

  #if OMPI_DATATYPE_MAX_PREDEFINED > OPAL_DATATYPE_MAX_SUPPORTED
  #error Need to increase the number of supported dataypes by OPAL (value OPAL_DATATYPE_MAX_SUPPORTED).
diff -r e82b914000bd -r 1a40aee2925c ompi/op/op.c
--- a/ompi/op/op.c Thu Dec 03 04:46:31 2009 +0000
+++ b/ompi/op/op.c Fri Dec 04 19:59:26 2009 +0100
@@ -137,6 +137,14 @@
      ompi_op_ddt_map[OMPI_DATATYPE_MPI_2INTEGER] = OMPI_OP_BASE_TYPE_2INTEGER;
      ompi_op_ddt_map[OMPI_DATATYPE_MPI_LONG_DOUBLE_INT] = OMPI_OP_BASE_TYPE_LONG_DOUBLE_INT;
      ompi_op_ddt_map[OMPI_DATATYPE_MPI_WCHAR] = OMPI_OP_BASE_TYPE_WCHAR;
+ ompi_op_ddt_map[OMPI_DATATYPE_MPI_INTEGER2] = OMPI_OP_BASE_TYPE_INTEGER2;
+ ompi_op_ddt_map[OMPI_DATATYPE_MPI_INTEGER4] = OMPI_OP_BASE_TYPE_INTEGER4;
+ ompi_op_ddt_map[OMPI_DATATYPE_MPI_INTEGER8] = OMPI_OP_BASE_TYPE_INTEGER8;
+ ompi_op_ddt_map[OMPI_DATATYPE_MPI_INTEGER16] = OMPI_OP_BASE_TYPE_INTEGER16;
+ ompi_op_ddt_map[OMPI_DATATYPE_MPI_REAL2] = OMPI_OP_BASE_TYPE_REAL2;
+ ompi_op_ddt_map[OMPI_DATATYPE_MPI_REAL4] = OMPI_OP_BASE_TYPE_REAL4;
+ ompi_op_ddt_map[OMPI_DATATYPE_MPI_REAL8] = OMPI_OP_BASE_TYPE_REAL8;
+ ompi_op_ddt_map[OMPI_DATATYPE_MPI_REAL16] = OMPI_OP_BASE_TYPE_REAL16;

      /* Create the intrinsic ops */

diff -r e82b914000bd -r 1a40aee2925c opal/datatype/opal_datatype.h
--- a/opal/datatype/opal_datatype.h Thu Dec 03 04:46:31 2009 +0000
+++ b/opal/datatype/opal_datatype.h Fri Dec 04 19:59:26 2009 +0100
@@ -56,7 +56,7 @@
   *
   * XXX TODO Adapt to whatever the OMPI-layer needs
   */
-#define OPAL_DATATYPE_MAX_SUPPORTED 46
+#define OPAL_DATATYPE_MAX_SUPPORTED 56

  /* flags for the datatypes. */

On Fri, 4 Dec 2009, Sylvain Jeaugey wrote:

> For the record, and to try to explain why all MTT tests may have missed this
> "bug", configuring without --enable-debug makes the bug disappear.
>
> Still trying to figure out why.
>
> Sylvain
>
> On Thu, 3 Dec 2009, Sylvain Jeaugey wrote:
>
>> Hi list,
>>
>> I hope this time I won't be the only one to suffer this bug :)
>>
>> It is very simple indeed, just perform an allreduce with MPI_REAL8
>> (fortran) and you should get a crash in ompi/op/op.h:411. Tested with trunk
>> and v1.5, working fine on v1.3.
>>
>> From what I understand, in the trunk, MPI_REAL8 now has a fixed id (in
>> ompi/datatype/ompi_datatype_internal.h), but operations do not have an
>> index going as far as 54 (0x36), leading to a crash when looking up
>> op->o_func.intrinsic.fns[ompi_op_ddt_map[ddt->id]] in ompi_op_is_valid()
>> (or, if I disable mpi_param_check, in ompi_op_reduce()).
>>
>> Here is a reproducer, just in case:
>> program main
>> use mpi
>> integer ierr
>> real(8) myreal, realsum
>> call MPI_INIT(ierr)
>> call MPI_ALLREDUCE(myreal, realsum, 1, MPI_REAL8, MPI_SUM, &
>>      MPI_COMM_WORLD, ierr)
>> call MPI_FINALIZE(ierr)
>> stop
>> end
>>
>> Does anyone have an idea how to fix this? Or am I doing something wrong?
>>
>> Thanks for any help,
>> Sylvain
>>
>>
>>