Thomas,

Thanks for the detailed bug report and the test case. I successfully identified the culprit, and the issue is now fixed (commit r28319).

  Regards,
    George.

PS: During the debugging process I sketched the datatype representation to help myself understand the issue. I attached the figure here for the delight of whoever might be interested. It contains the 4 datatypes created in main, and the two datatypes created on the second invocation of the do_test function.



On Apr 8, 2013, at 16:08 , Thomas Jahns <jahns@dkrz.de> wrote:

Hello,

A colleague of mine has investigated a difficult problem that we traced to
OpenMPI delivering incorrect data for some struct datatypes that use specific
offsets (on the stack in our case, but the problem can be reproduced using
specifically chosen slices of an array). Our library is used to aggregate
several MPI communications in a generic and transparent manner, and therefore we
need to be able to handle any combination of properly aligned offsets and
component types.

The attached example program contains the necessary steps to reproduce the problem:

1. create the struct types in question
2. send/recv the data
3. compare to reference (said comparison works on several MPICH2 versions)

The code then prints any array indices/values not matching the reference.
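
As a rough illustration of how such a reference can be built by hand (this is not taken from mpi_test.c): a struct datatype is essentially a list of blocks, each given by an element count and a byte displacement from a base address. The MPI-free sketch below copies blocks described that way into a contiguous buffer, so the result can be compared with what a receive delivers. The names `struct block` and `gather_blocks` are hypothetical, and the sketch assumes all components are plain `int`s.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one block of a struct-like layout:
   `count` ints starting `disp` bytes after the base address. */
struct block {
    int count;
    ptrdiff_t disp;
};

/* Copy all blocks, in order, from `base` into the contiguous buffer `out`.
   `out` must have room for the sum of all block counts.
   Returns the number of ints written. */
static size_t gather_blocks(const void *base, const struct block *blocks,
                            size_t nblocks, int *out)
{
    size_t n = 0;
    for (size_t i = 0; i < nblocks; ++i) {
        const char *src = (const char *)base + blocks[i].disp;
        memcpy(out + n, src, (size_t)blocks[i].count * sizeof(int));
        n += (size_t)blocks[i].count;
    }
    return n;
}
```

For example, with an array holding 0..15 and two blocks (2 ints at element offset 4, 3 ints at element offset 12), the gathered buffer would hold 4, 5, 12, 13, 14.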

Our platform is Linux x86_64 with Debian squeeze. The tested versions of OpenMPI
are 1.4.2, as supplied with squeeze, and 1.6.4, which we compiled ourselves. For
1.4.2 I also did a quick test in an i386 chroot, and the code fails there too.
gcc 4.6.1 was used for the x86_64 cases and gcc 4.3.5 for the i386 chroot.

Sorry that the test is not of minimal size, but we were happy once he got it
down from several tens of thousands of lines of Fortran+C, and even that took
more than a day once we understood the problem was unrelated to the Fortran
program it originally occurred in.

When running the program with OpenMPI:

$ mpicc -std=gnu99 ./mpi_test.c && ./a.out
first tests:
second tests:
results_2[6]     = 8
ref_results_2[6] = 12
results_2[7]     = 9
ref_results_2[7] = 13

MPICH gives the expected result:
$ /sw/squeeze-x64/mpi/mpich2-1.4.1p1-gccsys/bin/mpicc -std=gnu99 ./mpi_test.c && ./a.out
first tests:
second tests:

Regards, Thomas
--
Thomas Jahns
DKRZ GmbH, Department: Application software

Deutsches Klimarechenzentrum
Bundesstraße 45a
D-20146 Hamburg

Phone: +49-40-460094-151
Fax: +49-40-460094-270
Email: Thomas Jahns <jahns@dkrz.de>
<mpi_test.c>
_______________________________________________
devel mailing list
devel@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel