You can use a debugger (plain gdb will do, no TotalView needed) to find
out which MPI send & receive calls are hanging the code on the
distributed cluster, and then check whether the hanging send & receive
pair matches the problem described at:
Deadlock avoidance in your MPI programs:
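To make the failure mode concrete, here is a minimal sketch (not from the original thread) of the classic head-to-head deadlock that post covers. The message size N and the two-rank `peer` setup are illustrative assumptions; the point is that small messages often complete via eager buffering on a shared-memory machine, hiding the bug, while on a cluster both ranks block inside MPI_Send forever.

```c
/* Illustrative sketch only: the classic send/send deadlock.  Both ranks
 * post a blocking MPI_Send before either posts a receive.  With small
 * messages or shared memory the sends may complete eagerly and the bug
 * stays hidden; over a network they block waiting for a matching
 * receive, and the program hangs. */
#include <mpi.h>
#include <stdio.h>

#define N (1 << 20)  /* assumed large enough to exceed the eager limit */

int main(int argc, char **argv)
{
    int rank, peer;
    static double sendbuf[N], recvbuf[N];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;  /* assumes exactly 2 ranks, for brevity */

    /* Deadlock-prone pattern (do NOT do this):
     *   MPI_Send(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
     *   MPI_Recv(recvbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
     *            MPI_STATUS_IGNORE);
     * Both ranks sit in MPI_Send waiting for the other's MPI_Recv. */

    /* One safe alternative: MPI_Sendrecv pairs the two operations so
     * neither rank can block the other. */
    MPI_Sendrecv(sendbuf, N, MPI_DOUBLE, peer, 0,
                 recvbuf, N, MPI_DOUBLE, peer, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    if (rank == 0)
        printf("exchange completed without deadlock\n");
    MPI_Finalize();
    return 0;
}
```

If the program is already hung, you can attach gdb to one of the stuck ranks with `gdb -p <pid>` and run `bt`; a backtrace ending inside MPI_Send (or a progress loop beneath it) on both sides is the signature of this pattern.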
On Fri, Sep 30, 2011 at 11:06 AM, Jack Bryan <dtustudy68_at_[hidden]> wrote:
> I have an Open MPI program which works well on a Linux shared-memory
> multicore (2 x 6 cores) machine.
> But it does not work well on a distributed cluster with Linux Open MPI.
> I found that a process sends out some messages to other processes,
> which cannot receive them.
> What is the possible reason?
> I did not change anything in the program.
> Any help is really appreciated.