
Open MPI User's Mailing List Archives


From: Yvan Fournier (yvan.fournier_at_[hidden])
Date: 2006-02-10 17:06:43


Hello,

I seem to have encountered a bug in Open MPI 1.0 using indexed datatypes
with MPI_Recv (which seems to be of the "off by one" sort). I have
attached a test case, which is briefly explained below (as well as in the
source file). This case should run on two processes. I observed the bug
on two different Linux systems (single-processor Centrino under SUSE 10.0
with gcc 4.0.2, dual-processor Xeon under Debian Sarge with gcc 3.4)
with Open MPI 1.0.1, and do not observe it using LAM 7.1.1 or MPICH2.

Here is a summary of the case:

------------------

Each processor reads a file ("data_p0" or "data_p1") giving a list of
global element ids. Some elements (vertices from a partitioned mesh)
may belong to both processors, so their ids may appear on both
processors: we have 7178 global vertices, of which 3654 and 3688 are
known to ranks 0 and 1 respectively.

In this simplified version, we assign coordinates {x, y, z} to each
vertex equal to its global id number on rank 1, and the negative of
that on rank 0 (assigning the same value to x, y, and z). After
finishing the "ordered gather", rank 0 prints the global id and
coordinates of each vertex.

Lines should print (for example) as:
  6456 ; 6455.00000 6455.00000 6456.00000
  6457 ; -6457.00000 -6457.00000 -6457.00000
depending on whether a vertex belongs only to rank 0 (negative
coordinates) or belongs to rank 1 (positive coordinates).

With the Open MPI 1.0.1 bug (observed on SUSE Linux 10.0 with gcc 4.0
and on Debian Sarge with gcc 3.4), we get, for example, for the last
vertices:
  7176 ; 7175.00000 7175.00000 7176.00000
  7177 ; 7176.00000 7176.00000 7177.00000
which seems to indicate an "off by one" bug in datatype handling.

When the indexed datatype is not used (i.e. USE_INDEXED_DATATYPE is not
defined in the gather_test.c file), the bug disappears. Using the
indexed datatype with LAM MPI 7.1.1 or MPICH2, we do not reproduce the
bug either, so it does seem to be an Open MPI issue.

------------------

Best regards,

        Yvan Fournier