Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_Recv hangs
From: Eduardo Morras (nec556_at_[hidden])
Date: 2012-05-04 08:58:12


At 11:52 04/05/2012, you wrote:
>Hi all,
>
>I have a program that executes a communication loop similar to this one:
>
>1: for(int p1=0; p1<np; ++p1) {
>2: for(int p2=0; p2<np; ++p2) {
>3: if(me==p1) {
>4: if(sendSize(p2))
>MPI_Ssend(sendBuffer[p2],sendSize(p2),MPI_FLOAT,p2,0,myw);
>5: if(recvSize(p2))
>MPI_Recv(recvBuffer[p2],recvSize(p2),MPI_FLOAT,p2,0,myw,&status);
>6: } else if(yo==p2) {
>7: if(recvSize(p1))
>MPI_Recv(recvBuffer[p1],recvSize(p1),MPI_FLOAT,p2,0,myw,&status);
>8: if(sendSize(p1))
>MPI_Ssend(sendBuffer[p1],sendSize(p1),MPI_FLOAT,p2,0,myw);
>9: }
>10: MPI_Barrier(myw);
>11: }
>12: }
>
>The program is an iterative process that makes some calculations,
>communicates and then continues with the next iteration. The problem
>is that after making 30 successful iterations the program hangs.
>With padb I have seen that one of the processors waits at line 5 for
>the reception of data that was already sent and the rest of the
>processors are waiting at the barrier in line 10. The size of the
>messages and buffers is the same for all the iterations.
>
>My real program makes use of asynchronous communications for obvious
>performance reasons and it worked without problems when the case to
>solve was smaller (lower number of processors and memory), but I
>found that for this case the program hung, and that is why I
>changed the communication routine to use synchronous communications
>to see where the problem is. Now I know where the program hangs, but
>I don't understand what I am doing wrong.
>
>Any suggestions?

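All messages have p2 as their destination, so p1 is waiting for a
message that was never sent to it. As the code stands, p1 should not
be expecting anything: the branch at lines 6-8 uses p2 as the peer for
both the receive and the send, where p1 was probably intended. I don't
know the logic of your program, so I can't offer more suggestions or
clues.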
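
To make the mismatch concrete, here is a minimal sketch in plain C (no MPI calls) that only models which rank each process addresses in one (p1, p2) step of the quoted loop. It rests on several assumptions of mine, not on anything I know about the real program: `yo` in line 6 is the same variable as `me`, all send and receive sizes are non-zero, and the intended fix is to use p1 as the peer in lines 7-8. The names `peer_as_posted`, `peer_mirrored`, `step_is_matched` and `count_unmatched` are made up for this illustration.

```c
#include <stdbool.h>

/* Maps (me, p1, p2) to the rank that `me` addresses in that step,
   or -1 if `me` takes no part in it. */
typedef int (*peer_fn)(int me, int p1, int p2);

/* Peer selection as in the quoted loop: both branches name p2. */
int peer_as_posted(int me, int p1, int p2) {
    if (me == p1) return p2;   /* lines 4-5: talk to p2 */
    if (me == p2) return p2;   /* lines 7-8: p2 names itself */
    return -1;
}

/* Assumed fix: the else branch mirrors the first one and names p1. */
int peer_mirrored(int me, int p1, int p2) {
    if (me == p1) return p2;
    if (me == p2) return p1;
    return -1;
}

/* A step can complete only if the two ranks address each other. */
bool step_is_matched(peer_fn peer, int p1, int p2) {
    return peer(p1, p1, p2) == p2 && peer(p2, p1, p2) == p1;
}

/* Count the steps with p1 != p2 whose sends and receives do not pair up.
   The diagonal p1 == p2 is skipped: there a rank would do a synchronous
   send to itself, which is a separate problem. */
int count_unmatched(peer_fn peer, int np) {
    int unmatched = 0;
    for (int p1 = 0; p1 < np; ++p1) {
        for (int p2 = 0; p2 < np; ++p2) {
            if (p1 != p2 && !step_is_matched(peer, p1, p2)) {
                ++unmatched;
            }
        }
    }
    return unmatched;
}
```

Under these assumptions, `count_unmatched(peer_as_posted, 4)` reports all 12 off-diagonal steps as unmatched, while `count_unmatched(peer_mirrored, 4)` reports 0. This says nothing about why your real program gets through 30 iterations first. It only shows that the loop as posted cannot pair its sends and receives.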