Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Open MPI process cannot do send-receive message correctly on a distributed memory cluster
From: Jack Bryan (dtustudy68_at_[hidden])
Date: 2011-09-30 11:49:39


Thanks,

I am using non-blocking MPI_Isend to send each message and blocking MPI_Recv to receive it.

Each MPI_Isend uses a distinct buffer to hold the message, and the buffer is not changed until the message is received.

Then, the sender process waits for the MPI_Isend to complete.

Before this message is sent, a header message (describing how much data, and which data, will follow in the next MPI_Isend)
is sent in the same way, and it is received correctly.

Why can the message that follows (which is larger) not be received?

Any help is really appreciated.

> Date: Fri, 30 Sep 2011 11:33:16 -0400
> From: raysonlogin_at_[hidden]
> To: users_at_[hidden]
> Subject: Re: [OMPI users] Open MPI process cannot do send-receive message correctly on a distributed memory cluster
>
> You can use a debugger (just gdb will do, no TotalView needed) to find
> out which MPI send & receive calls are hanging the code on the
> distributed cluster, and see if the send & receive pair is due to a
> problem described at:
>
> Deadlock avoidance in your MPI programs:
> http://www.cs.ucsb.edu/~hnielsen/cs140/mpi-deadlocks.html
>
> Rayson
>
> =================================
> Grid Engine / Open Grid Scheduler
> http://gridscheduler.sourceforge.net
>
> Wikipedia Commons
> http://commons.wikimedia.org/wiki/User:Raysonho
>
>
> On Fri, Sep 30, 2011 at 11:06 AM, Jack Bryan <dtustudy68_at_[hidden]> wrote:
> > Hi,
> >
> > I have an Open MPI program, which works well on a Linux shared memory
> > multicore (2 x 6 cores) machine.
> >
> > But, it does not work well on a distributed cluster with Linux Open MPI.
> >
> > I found that a process sends out some messages to other processes,
> > which cannot receive them.
> >
> > What could be the reason?
> >
> > I did not change anything in the program.
> >
> > Any help is really appreciated.
> >
> > Thanks
> >
> >
> >
> >
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
>
> ==================================================
> Open Grid Scheduler - The Official Open Source Grid Engine
> http://gridscheduler.sourceforge.net/