Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] any deadlocks in this sets of MPI_send and MPI_recv ?
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-09-14 04:19:13


At first glance, your code doesn't look problematic. The first thing I'd check is that QRECS is large enough to hold the incoming data (i.e., that you aren't overwriting the buffer and causing memory corruption, which can cause weird, unexplained faults like this one).
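
As a rough sketch of that check, assuming Q holds nq double precision values on every sending rank and that QRECS stores each sender's block back to back (both assumptions, since the original post elides the real counts and types), rank 0 can refuse to post a receive that QRECS cannot hold, and use MPI_GET_COUNT to see how much data actually arrived. The module name collect_q_p2p and subroutine name collect_q are hypothetical.

```fortran
! Sketch only: nq (values per sender) and double precision data are
! assumptions; the original post elides the real counts and types.
module collect_q_p2p
  use mpi
  implicit none
contains
  ! Hypothetical helper: ranks 1..nprocs-1 send q(1:nq) to rank 0, which
  ! stores rank i's block in qrecs((i-1)*nq+1 : i*nq).
  subroutine collect_q(q, nq, qrecs, comm)
    integer, intent(in) :: nq, comm
    double precision, intent(in) :: q(nq)
    double precision, intent(inout) :: qrecs(:)
    integer :: myrank, nprocs, i, itag, nrecv, ierr
    integer :: status(MPI_STATUS_SIZE)

    call MPI_COMM_RANK(comm, myrank, ierr)
    call MPI_COMM_SIZE(comm, nprocs, ierr)

    if (myrank /= 0) then
      itag = myrank
      call MPI_SEND(q, nq, MPI_DOUBLE_PRECISION, 0, itag, comm, ierr)
    else
      ! Refuse to post receives that would write past the end of QRECS.
      if (size(qrecs) < nq * (nprocs - 1)) then
        print *, 'QRECS too small: need', nq * (nprocs - 1), &
                 'elements, have', size(qrecs)
        call MPI_ABORT(comm, 1, ierr)
      end if
      do i = 1, nprocs - 1
        itag = i
        ! The receive count equals the size of the target slice, so MPI
        ! cannot write outside it.
        call MPI_RECV(qrecs((i-1)*nq + 1 : i*nq), nq, MPI_DOUBLE_PRECISION, &
                      i, itag, comm, status, ierr)
        ! Report senders that delivered fewer values than expected.
        call MPI_GET_COUNT(status, MPI_DOUBLE_PRECISION, nrecv, ierr)
        if (nrecv /= nq) then
          print *, 'rank', i, 'sent', nrecv, 'values, expected', nq
        end if
      end do
    end if
  end subroutine collect_q
end module collect_q_p2p
```

If a sender transmits more than the count given to MPI_RECV, MPI flags a truncation error (MPI_ERR_TRUNCATE) rather than silently writing past the buffer; the silent-corruption case is a count larger than the array actually declared, which is what the size check guards against.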

Also, you might well be able to accomplish the same communication pattern with MPI_GATHER (or MPI_GATHERV, if each rank is sending a different amount of information).
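
A minimal sketch of the MPI_GATHER version, assuming every rank holds nq double precision values of Q and that the gathered data can go straight into Y; the count and type stand in for the arguments elided in the original post, and the names gather_q_coll and gather_and_share are hypothetical.

```fortran
! Sketch only: assumes every rank, including rank 0, contributes nq
! double precision values; module and subroutine names are hypothetical.
module gather_q_coll
  use mpi
  implicit none
contains
  ! On return, y(r*nq+1 : (r+1)*nq) holds rank r's q on every rank.
  subroutine gather_and_share(q, nq, y, comm)
    integer, intent(in) :: nq, comm
    double precision, intent(in) :: q(nq)
    double precision, intent(inout) :: y(:)
    integer :: nprocs, ierr

    call MPI_COMM_SIZE(comm, nprocs, ierr)
    if (size(y) < nq * nprocs) then
      print *, 'Y too small: need', nq * nprocs, 'elements, have', size(y)
      call MPI_ABORT(comm, 1, ierr)
    end if

    ! One collective replaces the whole send/recv loop: rank 0 receives
    ! nq values from every rank (itself included), stored in rank order.
    ! Note that recvcount is the count per rank, not the total.
    call MPI_GATHER(q, nq, MPI_DOUBLE_PRECISION, &
                    y, nq, MPI_DOUBLE_PRECISION, 0, comm, ierr)

    ! As in the original post, rank 0 then broadcasts the assembled array.
    call MPI_BCAST(y, nq * nprocs, MPI_DOUBLE_PRECISION, 0, comm, ierr)
  end subroutine gather_and_share
end module gather_q_coll
```

One difference from the point-to-point loop: MPI_GATHER also includes rank 0's own Q, in the first slot. The gather-then-broadcast pair can be collapsed into a single MPI_ALLGATHER call. If the ranks send different amounts, MPI_GATHERV replaces the single recvcount with an array of per-rank counts plus an array of displacements into Y.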

On Sep 14, 2013, at 12:27 AM, Huangwei <hz283_at_[hidden]> wrote:

> The code I meant to post is this:
>
> if(myrank .ne. 0) then
>    itag = myrank
>    call mpi_send(Q.............., 0, itag, .................)
> else
>    do i=1, N-1
>       itag = i
>       call mpi_recv(QRECS......., i, itag, .................)
>    enddo
>
> endif
>
> call mpi_bcast(YVAR............., 0, ..............)
>
> best regards,
> Huangwei
>
> On 13 September 2013 23:25, Huangwei <hz283_at_[hidden]> wrote:
> Dear All,
>
> I have a question about using MPI_send and MPI_recv.
>
> The objective is as follows:
> I would like to send an array Q from ranks 1 to N-1 to rank 0, which receives Q from all the other processes. The received Q arrays are then put into a new array Y on rank 0 (of course, this step is not done with MPI), and MPI_bcast is then used (from rank 0) to broadcast Y to all processes.
>
> The Fortran code looks like this:
> if(myrank .eq. 0) then
>    itag = myrank
>    call mpi_send(Q.............., 0, itag, .................)
> else
>    do i=1, N-1
>       itag = i
>       call mpi_recv(QRECS......., i, itag, .................)
>    enddo
>
> endif
>
> call mpi_bcast(YVAR............., 0, ..............)
>
> The problem I ran into is:
> My simulation performs time marching. These mpi_send and mpi_recv calls work fine for the first three time steps. However, in the fourth time step, the loop only completes iterations i=1 to i=13 (out of 48 processes in total). That means mpi_recv never receives the data from rank 14 onward, so the code hangs there forever. Is this a deadlock? How can I track down the problem?
>
> Thank you very much in advance for any suggestions.
>
> best regards,
> Huangwei
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquyres_at_[hidden]