Open MPI Development Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI devel] Crash when using MPI_REAL8
From: Sylvain Jeaugey (sylvain.jeaugey_at_[hidden])
Date: 2009-12-04 14:27:22


There is definitely something wrong in the types.

OMPI_DATATYPE_MAX_PREDEFINED is set to 45, while there are 55 predefined
types. When accessing ompi_op_ddt_map[ddt->id] with MPI_REAL8
(ddt->id=54), we are actually reading the ompi_mpi_op_bxor struct instead
of a map entry.

Depending on various things (padding, uninitialized memory), we may get 0
and not crash. If you're not lucky, you get a random value and crash soon
afterwards.

So I raised both limits and added the missing map entries, and it seems to
fix my problem. I'm not sure all types are handled now; I just added some
that were not defined.

Sylvain

diff -r e82b914000bd -r 1a40aee2925c ompi/datatype/ompi_datatype.h
--- a/ompi/datatype/ompi_datatype.h Thu Dec 03 04:46:31 2009 +0000
+++ b/ompi/datatype/ompi_datatype.h Fri Dec 04 19:59:26 2009 +0100
@@ -57,7 +57,7 @@
  #define OMPI_DATATYPE_FLAG_DATA_FORTRAN 0xC000
  #define OMPI_DATATYPE_FLAG_DATA_LANGUAGE 0xC000

-#define OMPI_DATATYPE_MAX_PREDEFINED 45
+#define OMPI_DATATYPE_MAX_PREDEFINED 55

  #if OMPI_DATATYPE_MAX_PREDEFINED > OPAL_DATATYPE_MAX_SUPPORTED
  #error Need to increase the number of supported dataypes by OPAL (value OPAL_DATATYPE_MAX_SUPPORTED).
diff -r e82b914000bd -r 1a40aee2925c ompi/op/op.c
--- a/ompi/op/op.c Thu Dec 03 04:46:31 2009 +0000
+++ b/ompi/op/op.c Fri Dec 04 19:59:26 2009 +0100
@@ -137,6 +137,14 @@
      ompi_op_ddt_map[OMPI_DATATYPE_MPI_2INTEGER] = OMPI_OP_BASE_TYPE_2INTEGER;
      ompi_op_ddt_map[OMPI_DATATYPE_MPI_LONG_DOUBLE_INT] = OMPI_OP_BASE_TYPE_LONG_DOUBLE_INT;
      ompi_op_ddt_map[OMPI_DATATYPE_MPI_WCHAR] = OMPI_OP_BASE_TYPE_WCHAR;
+     ompi_op_ddt_map[OMPI_DATATYPE_MPI_INTEGER2] = OMPI_OP_BASE_TYPE_INTEGER2;
+     ompi_op_ddt_map[OMPI_DATATYPE_MPI_INTEGER4] = OMPI_OP_BASE_TYPE_INTEGER4;
+     ompi_op_ddt_map[OMPI_DATATYPE_MPI_INTEGER8] = OMPI_OP_BASE_TYPE_INTEGER8;
+     ompi_op_ddt_map[OMPI_DATATYPE_MPI_INTEGER16] = OMPI_OP_BASE_TYPE_INTEGER16;
+     ompi_op_ddt_map[OMPI_DATATYPE_MPI_REAL2] = OMPI_OP_BASE_TYPE_REAL2;
+     ompi_op_ddt_map[OMPI_DATATYPE_MPI_REAL4] = OMPI_OP_BASE_TYPE_REAL4;
+     ompi_op_ddt_map[OMPI_DATATYPE_MPI_REAL8] = OMPI_OP_BASE_TYPE_REAL8;
+     ompi_op_ddt_map[OMPI_DATATYPE_MPI_REAL16] = OMPI_OP_BASE_TYPE_REAL16;

      /* Create the intrinsic ops */

diff -r e82b914000bd -r 1a40aee2925c opal/datatype/opal_datatype.h
--- a/opal/datatype/opal_datatype.h Thu Dec 03 04:46:31 2009 +0000
+++ b/opal/datatype/opal_datatype.h Fri Dec 04 19:59:26 2009 +0100
@@ -56,7 +56,7 @@
   *
   * XXX TODO Adapt to whatever the OMPI-layer needs
   */
-#define OPAL_DATATYPE_MAX_SUPPORTED 46
+#define OPAL_DATATYPE_MAX_SUPPORTED 56

  /* flags for the datatypes. */

On Fri, 4 Dec 2009, Sylvain Jeaugey wrote:

> For the record, and to try to explain why all MTT tests may have missed this
> "bug", configuring without --enable-debug makes the bug disappear.
>
> Still trying to figure out why.
>
> Sylvain
>
> On Thu, 3 Dec 2009, Sylvain Jeaugey wrote:
>
>> Hi list,
>>
>> I hope this time I won't be the only one to suffer this bug :)
>>
>> It is very simple: just perform an allreduce with MPI_REAL8 (Fortran) and
>> you should get a crash in ompi/op/op.h:411. Tested with trunk and v1.5;
>> it works fine on v1.3.
>>
>> From what I understand, in the trunk MPI_REAL8 now has a fixed id (in
>> ompi/datatype/ompi_datatype_internal.h), but operations do not have an
>> index going as far as 54 (0x36), leading to a crash when looking up
>> op->o_func.intrinsic.fns[ompi_op_ddt_map[ddt->id]] in ompi_op_is_valid()
>> (or, if I disable mpi_param_check, in ompi_op_reduce()).
>>
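>> Here is a reproducer, just in case:
>> program main
>>   use mpi
>>   implicit none
>>   integer :: ierr
>>   real(8) :: myreal, realsum
>>   myreal = 1.0d0   ! give the send buffer a defined value
>>   call MPI_INIT(ierr)
>>   ! Reducing with MPI_REAL8 is what triggers the crash
>>   call MPI_ALLREDUCE(myreal, realsum, 1, MPI_REAL8, MPI_SUM, &
>>                      MPI_COMM_WORLD, ierr)
>>   call MPI_FINALIZE(ierr)
>> end program main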
>>
>> Does anyone have an idea how to fix this? Or am I doing something wrong?
>>
>> Thanks for any help,
>> Sylvain
>>
>>
>>