
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Problem when using struct types at specific offsets
From: George Bosilca (bosilca_at_[hidden])
Date: 2013-04-09 19:06:07


Thomas,

Thanks for the detailed bug report and the test case. I successfully identified the culprit, and the issue is now fixed (commit r28319).

  Regards,
    George.

PS: During the debugging process I sketched the datatype representation to help myself understand the issue. I attached the figure here for the delight of whoever might be interested. It contains the four datatypes created in main and the two datatypes created on the second invocation of the do_test function.

On Apr 8, 2013, at 16:08, Thomas Jahns <jahns_at_[hidden]> wrote:

> Hello,
>
> A colleague of mine has investigated a difficult problem that we traced to OpenMPI
> delivering incorrect data for some struct datatypes that use specific offsets (on
> the stack in our case, but the problem can also be reproduced using specifically
> chosen slices of an array). Our library is used to aggregate several MPI
> communications in a generic and transparent manner, and we therefore need to be
> able to handle any combination of properly aligned offsets and component types.
>
> The attached example program contains the necessary steps to reproduce the problem (a sketch of the same steps follows the list):
>
> 1. create the struct types in question
> 2. send/recv the data
> 3. compare to reference (said comparison works on several MPICH2 versions)
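
The sketch below makes much simpler assumptions than the attached mpi_test.c: a single struct datatype describing one block of two ints at an arbitrarily chosen non-zero offset, exchanged within one process via a self Sendrecv. Buffer size, offset, and variable names are illustrative only, not those of the real test case.

#include <mpi.h>
#include <stdio.h>

#define N 16

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Step 1: a struct type whose single block of 2 ints starts 6 ints
     * into the buffer, i.e. at a deliberately non-zero offset. */
    int          blocklen[1] = { 2 };
    MPI_Aint     disp[1]     = { 6 * (MPI_Aint)sizeof(int) };
    MPI_Datatype oldtype[1]  = { MPI_INT };
    MPI_Datatype slice;
    MPI_Type_create_struct(1, blocklen, disp, oldtype, &slice);
    MPI_Type_commit(&slice);

    int src[N], results[N], ref_results[N];
    for (int i = 0; i < N; ++i) {
        src[i] = i;
        results[i] = -1;
        ref_results[i] = (i == 6 || i == 7) ? i : -1;  /* what should arrive */
    }

    /* Step 2: send/recv the data; a combined self Sendrecv cannot deadlock
     * and keeps the sketch runnable as a single process (./a.out). */
    MPI_Sendrecv(src, 1, slice, rank, 0,
                 results, 1, slice, rank, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Step 3: compare to the reference and print any mismatching indices/values. */
    for (int i = 0; i < N; ++i)
        if (results[i] != ref_results[i])
            printf("results[%d] = %d\nref_results[%d] = %d\n",
                   i, results[i], i, ref_results[i]);

    MPI_Type_free(&slice);
    MPI_Finalize();
    return 0;
}

The actual test case builds several such datatypes rather than one (per the PS above, four in main and two more in do_test), and it is that combination which triggers the mismatches quoted below; the sketch only illustrates the mechanics of steps 1-3.
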
>
> The code then prints any array indices/values that do not match the reference.
>
> Our platform is Linux x86_64 with Debian squeeze; the tested versions of OpenMPI
> are the 1.4.2 version supplied with squeeze and 1.6.4, which we compiled ourselves.
> For 1.4.2 I also did a quick test in an i386 chroot, and the code fails there too.
> gcc 4.6.1 was used for the x86_64 cases and gcc 4.3.5 for the i386 chroot.
>
> Sorry if the test is not of minimal size, but we were happy once he had got it down
> from several tens of thousands of lines of Fortran+C, and even that took more than a
> day once we understood that the problem was unrelated to the Fortran program it
> originally occurred in.
>
> When running the program with OpenMPI:
>
> $ mpicc -std=gnu99 ./mpi_test.c && ./a.out
> first tests:
> second tests:
> results_2[6] = 8
> ref_results_2[6] = 12
> results_2[7] = 9
> ref_results_2[7] = 13
>
> MPICH gives the expected result:
> $ /sw/squeeze-x64/mpi/mpich2-1.4.1p1-gccsys/bin/mpicc -std=gnu99 ./mpi_test.c &&
> ./a.out
> first tests:
> second tests:
>
> Regards, Thomas
> --
> Thomas Jahns
> DKRZ GmbH, Department: Application software
>
> Deutsches Klimarechenzentrum
> Bundesstraße 45a
> D-20146 Hamburg
>
> Phone: +49-40-460094-151
> Fax: +49-40-460094-270
> Email: Thomas Jahns <jahns_at_[hidden]>
> <mpi_test.c>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



[Attachment: PastedGraphic-2.png — the datatype representation sketch described in the PS]