I tried excluding openib, but the run still did not succeed. It made about the same progress as it did previously with the openib interface before hanging (that is, before my 30-second timeout expired).
I’m more than happy to try out any other suggestions…
This seems to highlight a possible bug in the MPI implementation. As I suggested earlier, the credit management of the OpenIB BTL might be unsafe.
To confirm this, there is one last test to run. Let's prevent the OpenIB support from being used during the run (so Open MPI will fall back to TCP). I assume you have Ethernet cards in your cluster, or IPoIB. Add "--mca btl ^openib" to your mpirun command. If this allows your application to run to completion, then we know exactly where to start looking.
On Jun 27, 2013, at 19:59 , "Blosch, Edwin L" <firstname.lastname@example.org> wrote:
The debug version also hung after roughly the same amount of progress in the computations (although of course it took much longer to make that progress than the optimized version did).
On the bright side, the idea of putting an mpi_barrier after the irecvs and before the isends appears to have helped. I was able to run 5 times farther without any trouble. So now I’m trying to run 50 times farther and, if no hang, I will declare workaround-victory.
What could this mean?
I am guessing that one or more processes may run ahead of the others, just because of the different amounts of work that precede the communication step. If a process manages to post all its irecvs and post all its isends well before another process has managed to post any matching irecvs, perhaps there is some buffering resource on the sender side that is getting exhausted? This is pure guessing on my part.
It ran a bit longer but still deadlocked. All matching sends are posted 1:1 with posted recvs, so it is a delivery issue of some kind. I'm running a debug-compiled version tonight to see what that might turn up. I may try to rewrite with blocking sends and see if that works. I can also try adding a barrier (irecvs, barrier, isends, waitall) to make sure sends are not buffering while waiting for recvs to be posted.
-------- Original message --------
From: George Bosilca <email@example.com>
To: Open MPI Users <firstname.lastname@example.org>
Subject: Re: [OMPI users] Application hangs on mpi_waitall
I'm not sure, but there might be a case where the BTL is getting overwhelmed by the non-blocking operations while trying to set up the connection. There is a simple test for this. Add an MPI_Alltoall with a reasonable size (100k) before you start posting the non-blocking receives, and let's see if this solves your issue.
On Jun 26, 2013, at 04:02 , email@example.com wrote:
> An update: I recoded the mpi_waitall as a loop over the requests with
> mpi_test and a 30 second timeout. The timeout happens unpredictably,
> sometimes after 10 minutes of run time, other times after 15 minutes, for
> the exact same case.
> After 30 seconds, I print out the status of all outstanding receive
> requests. The message tags that are outstanding have definitely been
> sent, so I am wondering why they are not getting received?
> As I said before, everybody posts non-blocking standard receives, then
> non-blocking standard sends, then calls mpi_waitall. Each process is
> typically waiting on 200 to 300 requests. Is deadlock possible via this
> implementation approach under some kind of unusual conditions?
> Thanks again,
>> I'm running OpenMPI 1.6.4 and seeing a problem where mpi_waitall never
>> returns. The case runs fine with MVAPICH. The logic associated with the
>> communications has been extensively debugged in the past; we don't think
>> it has errors. Each process posts non-blocking receives, non-blocking
>> sends, and then does waitall on all the outstanding requests.
>> The work is broken down into 960 chunks. If I run with 960 processes (60
>> nodes of 16 cores each), things seem to work. If I use 160 processes
>> (each process handling 6 chunks of work), then each process is handling 6
>> times as much communication, and that is the case that hangs with OpenMPI
>> 1.6.4; again, seems to work with MVAPICH. Is there an obvious place to
>> start, diagnostically? We're using the openib btl.
>> users mailing list