> Here is basically what is happening. On the top left, I depicted the datatype resulting from the vector type. The two arrows point to the lower bound and upper bound (thus the extent) of the datatype. On the top right is the resized datatype, where the ub has been moved to 2 elements after the lb, allowing a nice interleaving of the data. The next line is the unrolled datatype representation, flattened to 1D. Again, it shows in red the data touched by the defined memory layout, as well as the extent (lb and ub).
> Now, let's move on to the MPI_Alltoall call. The array is the one without colors, and below it I placed the datatype, starting from the position you specified in the MPI_Alltoall call. As you can see, as soon as you don't start at the origin of the allocated memory, you end up writing outside of your data. This happens deep inside the MPI_Alltoall call (there is no validation at the MPI level).
Why are the last two elements in the 1D view present? If they are, I
would have to define a new MPI type for each set of columns within a
matrix. Why is it defined this way? Also, why is the extent of the
initial vector type equal to 12 when it actually accesses 16 elements
(in the 4x4 example)?
So, is this a bug in MPI_Alltoall or in Open MPI?
I believe MPI_Alltoall is causing the bug, not the vector type, because the following code
MPI_Aint lb, extent, true_lb, true_extent;
MPI_Type_get_extent(mpi_all_t, &lb, &extent);
MPI_Type_get_true_extent(mpi_all_t, &true_lb, &true_extent);
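/* MPI_Aint can be wider than int, so cast to long and print with %ld */
printf("mpi_all_t - lb = %ld, extent = %ld, true_lb = %ld, true_extent = %ld\n",
       (long)lb, (long)extent, (long)true_lb, (long)true_extent);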
prints

mpi_all_t - lb = 0, extent = 16, true_lb = 0, true_extent = 240
This means the size is correct: using 4-byte floats with 2 processes
on an 8x8 array, a true extent of 240 bytes is 60 floats, so the type
ends at the 60th element.
I've attached a drawing similar to yours that focuses on the specific
case in this code. Hopefully it clears up the algorithm a bit.