Open MPI User's Mailing List Archives

Subject: [OMPI users] mpi job is blocked
From: Richard (codemonkee_at_[hidden])
Date: 2012-09-25 02:20:58

I have 3 computers running the same Linux system, and I have set up an MPI cluster on them over ssh.
I tested a very simple MPI program, and it works on the cluster.

To make the description clear, I will call the three computers A, B and C.

1) If I run the job with 2 processes on A and B, it works.
2) If I run the job with 3 processes on A, B and C, it blocks.
3) If I run the job with 2 processes on A and C, it works.
4) If I run the job with all 3 processes on A, it works.
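
The four runs above can be summarized by listing which pairs of hosts each successful run covered. The following is only a rough sketch: the names RUNS, pairs_in and untested_pairs are made up for illustration, and it assumes that a run which completes has exercised communication between every pair of hosts it used.

```python
from itertools import combinations

# Hosts used by each run, in the order listed above, and whether it completed.
RUNS = [
    (("A", "B"), True),
    (("A", "B", "C"), False),  # the run that blocks
    (("A", "C"), True),
    (("A",), True),            # all 3 processes on A
]

HOSTS = ("A", "B", "C")


def pairs_in(hosts):
    """Return the set of unordered host pairs used by one run."""
    return {frozenset(p) for p in combinations(sorted(set(hosts)), 2)}


def untested_pairs(runs, hosts):
    """Return host pairs that no successful run has covered (assumption:
    a successful run exercises every pair of hosts it uses)."""
    confirmed = set()
    for run_hosts, ok in runs:
        if ok:
            confirmed |= pairs_in(run_hosts)
    return sorted(tuple(sorted(p)) for p in pairs_in(hosts) - confirmed)


if __name__ == "__main__":
    print(untested_pairs(RUNS, HOSTS))  # prints [('B', 'C')]
```

Under that assumption, B and C is the only pair that no successful run has covered, so the tests so far do not say whether B and C can talk to each other directly.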

Using gdb, I found the line at which it blocks:

#7 0x00002ad8a283043e in PMPI_Allreduce (sendbuf=0x7fff09c7c578, recvbuf=0x7fff09c7c570, count=1, datatype=0x627180, op=0x627780, comm=0x627380)
    at pallreduce.c:105
105 err = comm->c_coll.coll_allreduce(sendbuf, recvbuf, count,

It seems that there is a communication problem between some of the computers, but the series of tests above does not tell me
exactly what it is. Can anyone help me? Thanks.