Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] MPI_Alltoall with Vector Datatype
From: Spenser Gilliland (spenser_at_[hidden])
Date: 2014-05-08 15:16:17


George & Matthieu,

> The Alltoall should only return when all data is sent and received on
> the current rank, so there shouldn't be any race condition.

You're right, this is MPI, not pthreads, so a race condition there should never happen. Duh!

> I think the issue is with the way you define the send and receive
> buffer in the MPI_Alltoall. You have to keep in mind that the
> all-to-all pattern will overwrite the entire data in the receive
> buffer. Thus, starting from a relative displacement in the data (in
> this case matrix[wrank*wrows]), begs for troubles, as you will write
> outside the receive buffer.

The submatrix from matrix[wrank*wrows][0] to
matrix[(wrank+1)*wrows-1][:] is valid only on process wrank. This
is a block distribution of the rows, like what MPI_Scatter would
produce. Since wrows equals N (the matrix width/height) divided by
wsize, the number of mpi_all_t blocks in each message equals wsize.
Therefore, there should be no writing outside the bounds of the
submatrix.
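
To make the layout concrete, here's a minimal standalone sketch of what
I'm describing. This is not the code from the gist: wrank, wsize, wrows,
and mpi_all_t match my names, the other identifiers are placeholders,
and the MPI_Type_create_resized step is just one way to space the blocks
wrows columns apart.

/* Minimal sketch, C + MPI.  Each rank owns wrows = N/wsize contiguous rows
 * of an N x N matrix and exchanges one wrows x wrows block with every rank
 * via MPI_Alltoall.  Names other than wrank/wsize/wrows/mpi_all_t are
 * placeholders. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int wrank, wsize;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
    MPI_Comm_size(MPI_COMM_WORLD, &wsize);

    int N = (argc > 1) ? atoi(argv[1]) : 8;
    int wrows = N / wsize;                    /* rows owned by this rank */

    /* Local block of rows, stored contiguously: rows[i*N + j]. */
    int *rows = malloc((size_t)wrows * N * sizeof(int));
    int *recv = malloc((size_t)wrows * N * sizeof(int));
    for (int i = 0; i < wrows; i++)
        for (int j = 0; j < N; j++)
            rows[i * N + j] = (wrank * wrows + i) * N + j;

    /* One wrows x wrows block inside the local rows: wrows runs of wrows
     * ints, separated by the full row stride N. */
    MPI_Datatype vec_t, mpi_all_t;
    MPI_Type_vector(wrows, wrows, N, MPI_INT, &vec_t);
    /* Shrink the extent so the block for/from rank k starts wrows ints
     * after the block for rank k-1; with count 1 per rank, everything then
     * stays inside the wrows x N buffers on both sides. */
    MPI_Type_create_resized(vec_t, 0, (MPI_Aint)(wrows * sizeof(int)),
                            &mpi_all_t);
    MPI_Type_commit(&mpi_all_t);

    MPI_Alltoall(rows, 1, mpi_all_t, recv, 1, mpi_all_t, MPI_COMM_WORLD);
    /* (A local transpose of each received wrows x wrows block would follow
     * to finish the full transpose; omitted here.) */

    MPI_Type_free(&vec_t);
    MPI_Type_free(&mpi_all_t);
    free(rows);
    free(recv);
    MPI_Finalize();
    return 0;
}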

On another note,
I just ported the example to use dynamic memory and now I'm getting
segfaults when I call MPI_Finalize(). Any idea what in the code could
have caused this?

It's pastebinned here: https://gist.github.com/anonymous/a80e0679c3cbffb82e39
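
In case it's relevant, the main difference I can think of between the
static and dynamic versions is row contiguity. A quick sketch of the two
allocation patterns (names are mine, not necessarily what the gist does):

/* Two ways to allocate the N x N matrix dynamically.  Only the first keeps
 * the data contiguous like a static "int matrix[N][N]", which is what a
 * vector datatype with a row stride of N assumes. */
#include <stdlib.h>

/* Contiguous block: index as m[i*N + j]; safe to hand to MPI_Alltoall
 * together with a strided datatype. */
static int *alloc_contiguous(int N)
{
    return malloc((size_t)N * N * sizeof(int));
}

/* Array of row pointers: m[i][j] works in C, but each row is a separate
 * allocation, so a datatype that strides from row to row walks off the end
 * of one malloc'd block and into the heap. */
static int **alloc_row_pointers(int N)
{
    int **m = malloc((size_t)N * sizeof(int *));
    for (int i = 0; i < N; i++)
        m[i] = malloc((size_t)N * sizeof(int));
    return m;
}

An out-of-bounds write into malloc'd memory usually doesn't fault at the
offending store; it corrupts the allocator's bookkeeping and only shows
up at a later free(), which would be consistent with the ptmalloc frames
in the trace below.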

The result is

[sgillila_at_jarvis src]$ mpirun -npernode 2 transpose2 8
N = 8
Matrix =
 0: 0 1 2 3 4 5 6 7
 0: 8 9 10 11 12 13 14 15
 0: 16 17 18 19 20 21 22 23
 0: 24 25 26 27 28 29 30 31
 1: 32 33 34 35 36 37 38 39
 1: 40 41 42 43 44 45 46 47
 1: 48 49 50 51 52 53 54 55
 1: 56 57 58 59 60 61 62 63
Matrix =
 0: 0 8 16 24 32 40 48 56
 0: 1 9 17 25 33 41 49 57
 0: 2 10 18 26 34 42 50 58
 0: 3 11 19 27 35 43 51 59
 1: 4 12 20 28 36 44 52 60
 1: 5 13 21 29 37 45 53 61
 1: 6 14 22 30 38 46 54 62
 1: 7 15 23 31 39 47 55 63
[jarvis:09314] *** Process received signal ***
[jarvis:09314] Signal: Segmentation fault (11)
[jarvis:09314] Signal code: Address not mapped (1)
[jarvis:09314] Failing at address: 0x21da228
[jarvis:09314] [ 0] /lib64/libpthread.so.0() [0x371480f500]
[jarvis:09314] [ 1]
/opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_int_free+0x75)
[0x7f2e85452575]
[jarvis:09314] [ 2]
/opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_free+0xd3)
[0x7f2e85452bc3]
[jarvis:09314] [ 3] transpose2(main+0x160) [0x4012a0]
[jarvis:09314] [ 4] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3713c1ecdd]
[jarvis:09314] [ 5] transpose2() [0x400d49]
[jarvis:09314] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 9314 on node
jarvis.cs.iit.edu exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

-- 
Spenser Gilliland
Computer Engineer
Doctoral Candidate