
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Question about hanging mpirun
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-07-07 20:35:48

On Jul 5, 2011, at 2:21 PM, Ralph Castain wrote:

>> Ok I think I figured out what the deadlock in my application was... and please confirm if this makes sense:
>> 1. There was an 'if' condition that was met, causing 2 (out of 3) of my processes to call MPI_finalize().
>> 2. The remaining process was still trying to run and at some point was requesting calls like MPI_receive(), MPI_send() and MPI_wait() while the other two processes were at MPI_finalize() (although they would never exit). The application would hang at that point, but the program was too big for me to figure out where exactly the lonely running process would hang.
>> 3. I am no expert on openmpi, so I would appreciate it if someone can confirm if this was an expected behavior. I addressed the condition and now all processes run their course.
> That is correct behavior for MPI - i.e., if one process is rattling off MPI requests while the others have already entered finalize, then the job will hang since the requests cannot possibly be met and that proc never calls finalize to release completion of the job.

One clarification on this point...

If process A calls MPI_Send to process B and that send completes before B actually receives the message (e.g., if the message was small and there were no other messages pending between A and B), and then A calls MPI_Finalize, then B can still legally call MPI_Recv to receive the outstanding message. That scenario should work fine.

What doesn't work is if you initiate new communication to a process that has called MPI_Finalize -- e.g., if you MPI_Send to a finalized process, or you try to MPI_Recv a message that wasn't sent before the peer finalized.
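As a rough illustration of the legal case, here is a minimal two-process sketch. It is not taken from the original application; the tag, the payload value, and the choice of ranks 0 and 1 for "A" and "B" are assumptions made for the example. Rank 0 sends one small message and goes straight to MPI_Finalize, and rank 1 receives that already-sent message afterwards.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    const int tag = 0;   /* arbitrary tag, assumed for this sketch */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        /* The scenario needs a sender and a receiver. */
        if (rank == 0)
            fprintf(stderr, "please run with at least 2 processes\n");
        MPI_Finalize();
        return 0;
    }

    if (rank == 0) {
        /* Process A: one small message. The send may complete locally
           before rank 1 has posted its receive. */
        int payload = 42;
        MPI_Send(&payload, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
        /* A has nothing left to communicate and falls through to
           MPI_Finalize below. */
    } else if (rank == 1) {
        /* Process B: receives a message that was sent before A
           finalized. This is the legal case.
           If rank 0 had skipped the MPI_Send above, this MPI_Recv
           would wait for a message that was never sent, which is the
           hang described in this thread. */
        int payload = 0;
        MPI_Recv(&payload, 1, MPI_INT, 0, tag, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", payload);
    }

    /* Every rank reaches MPI_Finalize, so the job can complete. */
    MPI_Finalize();
    return 0;
}
```

With a typical installation this would be built with mpicc and launched with something like "mpirun -np 2 ./a.out"; the executable name is only a placeholder.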

Make sense?

Jeff Squyres