Open MPI User's Mailing List Archives

Subject: [OMPI users] Application hangs on mpi_waitall
From: Blosch, Edwin L (edwin.l.blosch_at_[hidden])
Date: 2013-06-18 18:50:39

I'm running Open MPI 1.6.4 and seeing a problem where mpi_waitall never returns. The same case runs fine with MVAPICH. The communication logic has been extensively debugged in the past, and we don't think it has errors. Each process posts its non-blocking receives, then its non-blocking sends, and then calls waitall on all of the outstanding requests.

The work is broken down into 960 chunks. If I run with 960 processes (60 nodes of 16 cores each), things seem to work. If I use 160 processes, each process handles 6 chunks of work and therefore 6 times as much communication. That is the case that hangs with Open MPI 1.6.4; again, it seems to work with MVAPICH. Is there an obvious place to start, diagnostically? We're using the openib BTL.
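
For reference, the decomposition arithmetic above can be sketched in a few lines of Python. The chunk and process counts come from the post. The function names and the contiguous block assignment of chunks to ranks are hypothetical, used only for illustration, and are not taken from the actual application.

```python
# Sketch of the work decomposition described above.
# The counts (960 chunks, 960 or 160 processes, 60 nodes x 16 cores) are from
# the post; the helper names and the block assignment are assumptions.

TOTAL_CHUNKS = 960
NODES = 60
CORES_PER_NODE = 16


def chunks_per_process(total_chunks, num_procs):
    """Number of chunks each process handles, assuming an even split."""
    if total_chunks % num_procs != 0:
        raise ValueError("chunks do not divide evenly among processes")
    return total_chunks // num_procs


def chunk_owner(chunk_id, total_chunks, num_procs):
    """Rank that owns a chunk under a contiguous block assignment (assumed)."""
    return chunk_id // chunks_per_process(total_chunks, num_procs)


# The 960-process run uses every core on the 60 nodes.
full_run_procs = NODES * CORES_PER_NODE

for procs in (full_run_procs, 160):
    per_proc = chunks_per_process(TOTAL_CHUNKS, procs)
    print(f"{procs} processes: {per_proc} chunk(s) per process")
```

If the number of messages scales with the number of chunks a process owns, as stated above, the 160-process run has each rank waiting on roughly 6 times as many requests in a single waitall as the 960-process run.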