Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_Alltoall with Vector Datatype
From: George Bosilca (bosilca_at_[hidden])
Date: 2014-05-08 15:19:33


The segfault indicates that you are writing outside of the allocated memory, which then corrupts the internal bookkeeping of the ptmalloc library. I'm quite certain that you write outside the allocated array …

  George.
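
To make that concrete, here is a minimal sketch of the buffer-sizing issue, assuming the row-block layout described further down; N, wrows, wsize, and the plain MPI_INT counts are illustrative assumptions, not the code from the gist. MPI_Alltoall deposits wsize received blocks starting at whatever address is passed as the receive buffer, so a pointer offset into the middle of an allocation only works if there is still room for all wsize blocks after that offset.

/* Minimal sketch of the buffer-sizing issue, not the code from the
 * gist: names and sizes are illustrative.  It only shows where the
 * receive buffer may and may not start; the derived datatype and the
 * actual transpose logic are left out. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int wrank, wsize, N = 8;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
    MPI_Comm_size(MPI_COMM_WORLD, &wsize);

    int wrows = N / wsize;                       /* rows owned locally */
    int *local = malloc((size_t)wrows * N * sizeof(int));
    int *recv  = malloc((size_t)wrows * N * sizeof(int));
    for (int i = 0; i < wrows * N; i++)
        local[i] = wrank * wrows * N + i;        /* fill the local rows */

    /* WRONG (if only the local slab is allocated): offsetting the
     * receive buffer by wrank*wrows*N points past the end of the
     * allocation on every rank but 0, so the wsize incoming blocks
     * land outside the buffer and trash the heap metadata that
     * ptmalloc checks later, e.g. in free() or MPI_Finalize().
     *
     *   MPI_Alltoall(local, wrows * wrows, MPI_INT,
     *                recv + wrank * wrows * N, wrows * wrows, MPI_INT,
     *                MPI_COMM_WORLD);
     */

    /* SAFE: the receive buffer starts at the beginning of a block
     * large enough for all wsize pieces (wsize * wrows * wrows =
     * wrows * N ints). */
    MPI_Alltoall(local, wrows * wrows, MPI_INT,
                 recv,  wrows * wrows, MPI_INT, MPI_COMM_WORLD);

    free(recv);
    free(local);
    MPI_Finalize();
    return 0;
}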

On May 8, 2014, at 15:16, Spenser Gilliland <spenser_at_[hidden]> wrote:

> George & Matthieu,
>
>> The Alltoall should only return when all data is sent and received on
>> the current rank, so there shouldn't be any race condition.
>
> You're right, this is MPI, not pthreads. That should never happen. Duh!
>
>> I think the issue is with the way you define the send and receive
>> buffers in the MPI_Alltoall. You have to keep in mind that the
>> all-to-all pattern will overwrite the entire data in the receive
>> buffer. Thus, starting from a relative displacement in the data (in
>> this case matrix[wrank*wrows]) begs for trouble, as you will write
>> outside the receive buffer.
>
> The submatrix from matrix[wrank*wrows][0] to
> matrix[(wrank+1)*wrows-1][:] is valid only on process wrank. This
> is a block distribution of the rows, like what MPI_Scatter would
> produce. As wrows is equal to N (the matrix width/height) divided by
> wsize, the number of mpi_all_t blocks in each message is equal to
> wsize. Therefore, there should be no writing outside the bounds of
> the submatrix.
>
> On another note,
> I just ported the example to use dynamic memory and now I'm getting
> segfaults when I call MPI_Finalize(). Any idea what in the code could
> have caused this?
>
> It's pastebinned here: https://gist.github.com/anonymous/a80e0679c3cbffb82e39
>
> The result is
>
> [sgillila_at_jarvis src]$ mpirun -npernode 2 transpose2 8
> N = 8
> Matrix =
> 0: 0 1 2 3 4 5 6 7
> 0: 8 9 10 11 12 13 14 15
> 0: 16 17 18 19 20 21 22 23
> 0: 24 25 26 27 28 29 30 31
> 1: 32 33 34 35 36 37 38 39
> 1: 40 41 42 43 44 45 46 47
> 1: 48 49 50 51 52 53 54 55
> 1: 56 57 58 59 60 61 62 63
> Matrix =
> 0: 0 8 16 24 32 40 48 56
> 0: 1 9 17 25 33 41 49 57
> 0: 2 10 18 26 34 42 50 58
> 0: 3 11 19 27 35 43 51 59
> 1: 4 12 20 28 36 44 52 60
> 1: 5 13 21 29 37 45 53 61
> 1: 6 14 22 30 38 46 54 62
> 1: 7 15 23 31 39 47 55 63
> [jarvis:09314] *** Process received signal ***
> [jarvis:09314] Signal: Segmentation fault (11)
> [jarvis:09314] Signal code: Address not mapped (1)
> [jarvis:09314] Failing at address: 0x21da228
> [jarvis:09314] [ 0] /lib64/libpthread.so.0() [0x371480f500]
> [jarvis:09314] [ 1]
> /opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_int_free+0x75)
> [0x7f2e85452575]
> [jarvis:09314] [ 2]
> /opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_free+0xd3)
> [0x7f2e85452bc3]
> [jarvis:09314] [ 3] transpose2(main+0x160) [0x4012a0]
> [jarvis:09314] [ 4] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3713c1ecdd]
> [jarvis:09314] [ 5] transpose2() [0x400d49]
> [jarvis:09314] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 9314 on node
> jarvis.cs.iit.edu exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
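
A crash inside opal_memory_ptmalloc2_int_free during an ordinary free() is the usual symptom of heap metadata that was corrupted earlier by an out-of-bounds write; the fault surfaces only when the allocator later walks its bookkeeping, not at the faulty write itself. Without seeing the gist, one common way to get there when porting a static 2-D array to dynamic memory is to allocate each row with its own malloc (an array of row pointers, which is not contiguous) or to allocate only the local wrows rows while still indexing them as matrix[wrank*wrows]. A hypothetical sketch of a contiguous allocation that avoids both pitfalls (alloc_matrix and free_matrix are illustrative helpers, not names from the gist):

#include <stdlib.h>

/* One malloc for the data keeps the local slab contiguous, so a single
 * MPI_Alltoall can send or receive it through matrix[0]; the row
 * pointers are only a convenience for matrix[i][j] indexing.  Local
 * rows are indexed 0..rows-1, never wrank*wrows. */
int **alloc_matrix(int rows, int cols)
{
    int  *data = malloc((size_t)rows * cols * sizeof(int));
    int **m    = malloc((size_t)rows * sizeof(int *));
    for (int i = 0; i < rows; i++)
        m[i] = data + (size_t)i * cols;   /* all rows share one block */
    return m;
}

void free_matrix(int **m)
{
    free(m[0]);   /* the contiguous data block */
    free(m);      /* the row-pointer array     */
}

/* usage:
 *   int **matrix = alloc_matrix(wrows, N);
 *   ...
 *   MPI_Alltoall(matrix[0], ..., recv[0], ..., MPI_COMM_WORLD);
 *   free_matrix(recv);
 *   free_matrix(matrix);
 */

With this layout matrix[0] can be handed directly to the collective, and the two free() calls in free_matrix release exactly what was allocated.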
>
> --
> Spenser Gilliland
> Computer Engineer
> Doctoral Candidate
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users