From: George Bosilca (bosilca_at_[hidden])
Date: 2006-07-08 13:47:05


I'm unable to replicate this one with the latest Open MPI trunk version.
As there is no difference between the trunk and the latest 1.1 version in
the datatype code, I think the bug cannot be reproduced using 1.1 either. I
compiled the test twice, once using the indexed datatype and once without,
and the output is exactly the same. I ran it on my Apple G5 desktop as
well as on a cluster of AMD64 machines, over shared memory and TCP. Can you
please recheck that your error is coming from the indexed type?
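
One way to run that recheck is to scan the gathered result mechanically
instead of reading the printed lines by eye. The following is a minimal
sketch, assuming the convention described in the quoted report (x, y and z
all equal to +id or -id for each vertex). The function name
count_inconsistent and the flat array layout are hypothetical and are not
taken from gather_test.c.

```c
#include <assert.h>
#include <stddef.h>

/* Count vertices whose coordinates break the test's convention:
 * x == y == z, and that common value is +id or -id.
 * coords holds 3 doubles per vertex (x, y, z), in the same order
 * as global_id. */
int count_inconsistent(const int *global_id, const double *coords, size_t n)
{
    int bad = 0;

    assert(global_id != NULL && coords != NULL);

    for (size_t i = 0; i < n; i++) {
        const double *c = coords + 3 * i;
        double id = (double)global_id[i];
        int same = (c[0] == c[1]) && (c[1] == c[2]);
        int matches_id = (c[0] == id) || (c[0] == -id);

        if (!same || !matches_id)
            bad++;
    }
    return bad;
}
```

A nonzero count with the indexed datatype enabled and zero without it would
point at the datatype path; the same count in both builds would point
elsewhere.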


On Sat, 1 Jul 2006, Yvan Fournier wrote:

> Hello,
> I had encountered a bug in Open MPI 1.0.1 using indexed datatypes
> with MPI_Recv (which seems to be of the "off by one" sort), which
> was corrected in Open MPI 1.0.2.
> It seems to have resurfaced in Open MPI 1.1 (I encountered it using
> different data and did not recognize it immediately, but it seems
> it can be reproduced using the same simplified test I had sent
> the first time, which I re-attach here just in case).
> Here is a summary of the case:
> ------------------
> Each processor reads a file ("data_p0" or "data_p1") giving a list of
> global element ids. Some elements (vertices from a partitioned mesh)
> may belong to both processors, so their id's may appear on both
> processors: we have 7178 global vertices, 3654 and 3688 of them being
> known by ranks 0 and 1 respectively.
> In this simplified version, we assign coordinates {x, y, z} to each
> vertex equal to its global id number for rank 1, and the negative of
> that for rank 0 (assigning the same values to x, y, and z). After
> finishing the "ordered gather", rank 0 prints the global id and
> coordinates of each vertex.
> Lines should print (for example) as:
> 6456 ; 6456.00000 6456.00000 6456.00000
> 6457 ; -6457.00000 -6457.00000 -6457.00000
> depending on whether a vertex belongs only to rank 0 (negative
> coordinates) or belongs to rank 1 (positive coordinates).
> With the OMPI 1.0.1 bug (observed on Suse Linux 10.0 with gcc 4.0 and on
> Debian sarge with gcc 3.4), we have for example for the last vertices:
> 7176 ; 7175.00000 7175.00000 7176.00000
> 7177 ; 7176.00000 7176.00000 7177.00000
> seeming to indicate an "off by one" type bug in datatype handling.
> Not using an indexed datatype (i.e. not defining USE_INDEXED_DATATYPE
> in the gather_test.c file), the bug disappears.
> ------------------
> Best regards,
> Yvan Fournier
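
For reference, the placement that an indexed datatype should produce on
receive can be emulated without MPI. The sketch below assumes double
elements and follows the MPI_Type_indexed convention that displacements
are counted in units of the base type. The name unpack_indexed is
hypothetical, and the block lengths and displacements used by
gather_test.c are not shown in this thread, so this only illustrates the
expected semantics.

```c
#include <assert.h>
#include <stddef.h>

/* Copy a contiguous incoming buffer into dest the way a receive with an
 * indexed datatype is expected to: block i takes the next blocklens[i]
 * values and stores them starting at dest[displs[i]].
 * Returns the number of values consumed from packed. */
size_t unpack_indexed(const double *packed, double *dest, int count,
                      const int *blocklens, const int *displs)
{
    size_t k = 0;

    assert(count >= 0);

    for (int i = 0; i < count; i++) {
        for (int j = 0; j < blocklens[i]; j++)
            dest[displs[i] + j] = packed[k++];
    }
    return k;
}
```

Comparing the buffer filled this way against the one filled by MPI_Recv
would show where the two first differ, which is where an "off by one"
shift would start.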

"We must accept finite disappointment, but we must never lose infinite
hope."
                                   Martin Luther King