On Aug 14, 2013, at 9:23 AM, "Hazelrig, Chris CTR (US)" <christopher.c.hazelrig.ctr_at_[hidden]> wrote:
> Thanks for your suggestions. I had already tested for which threads were reaching the Finalize() call and all of them are. Also, the Finalize() call is not inside a conditional. This seems to suggest there may be a prior communication left unfinished, but based on the documentation I have read I would think the Finalize() routine would error/exception out in that situation.
Sorry for the delayed reply -- I was on vacation last week.
Not necessarily -- you can definitely deadlock in Finalize if you've done a send that isn't matched with a receive, for example.
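Purely as an illustration (hypothetical ranks and sizes, not taken from your code), here is the kind of pattern that can show up as a hang in Finalize: rank 0 posts a send that rank 1 never receives, so rank 0 may block inside MPI_Send waiting for the rendezvous, and rank 1 may then block inside MPI_Finalize, which in many implementations synchronizes with its peers before shutting down.

    /* Sketch only: an unmatched send that can surface as a hang in
     * MPI_Finalize.  Whether (and where) it hangs depends on the message
     * size and the implementation's buffering.  Run with 2+ ranks. */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank;
        const int n = 1 << 20;   /* large enough to defeat eager buffering */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int *buf = calloc(n, sizeof(int));
        if (rank == 0) {
            /* Bug: no rank ever posts the matching MPI_Recv */
            MPI_Send(buf, n, MPI_INT, 1, 0, MPI_COMM_WORLD);
        }
        /* Rank 1 goes straight to Finalize without receiving */

        free(buf);
        MPI_Finalize();
        return 0;
    }

The point is that the root cause (the unmatched send) and the place the hang is observed (Finalize on another rank) can be far apart, which is why the hang looks like a Finalize problem.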
> It seems significant that the software was performing as expected under the previous OS and OpenMPI versions (although, the older OpenMPI version is only slightly older than what is being used now), but I don't know yet what the differences are.
Possibly, but not definitely. Just because an application runs properly under an MPI implementation does not mean that that application is correct (that sounds snobby, but I don't mean it that way). Buffer allocations and blocking patterns change from release to release of a given MPI implementation, so an erroneous MPI application may work fine under version A but fail under version B of that same implementation.
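The classic example of this (again, a sketch, not your application) is a pairwise exchange where both ranks call MPI_Send before MPI_Recv. That pattern is erroneous per the MPI standard because it only completes if the implementation buffers the sends; shrink the eager limit or change the buffering strategy in a newer release and both ranks block in MPI_Send.

    /* Sketch: "works by accident" send-before-receive exchange.  Whether
     * COUNT fits under the implementation's eager limit decides whether
     * this completes or deadlocks.  Assumes exactly 2 ranks. */
    #include <mpi.h>

    #define COUNT 1024

    int main(int argc, char **argv)
    {
        int rank, peer;
        int sendbuf[COUNT] = {0}, recvbuf[COUNT];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        peer = 1 - rank;

        /* Unsafe: relies on the implementation buffering the send */
        MPI_Send(sendbuf, COUNT, MPI_INT, peer, 0, MPI_COMM_WORLD);
        MPI_Recv(recvbuf, COUNT, MPI_INT, peer, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

        MPI_Finalize();
        return 0;
    }

The portable fixes are MPI_Sendrecv, ordering the send/receive pairs, or nonblocking operations with a wait.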
> Is there any other information I could provide that might be useful?
You might want to audit the code and ensure that you have no pending communications that haven't finished -- check all your sends and receives, not just in the code, but at run-time (e.g., use an MPI profiling tool to match up the sends and receives, and see what's left at Finalize time).
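If you don't have a profiling tool handy, a rough substitute is a small PMPI interposition layer. The sketch below only counts blocking MPI_Send/MPI_Recv per peer and dumps the tallies at Finalize; a real audit would also need to wrap Isend/Irecv, Sendrecv, and the collectives, and the `const` in the MPI_Send prototype assumes an MPI-3-era mpi.h.

    /* Sketch of a PMPI wrapper: count blocking sends/receives per peer and
     * report them at MPI_Finalize.  Build as a shared library and
     * LD_PRELOAD it (or link it ahead of the MPI library). */
    #include <mpi.h>
    #include <stdio.h>

    #define MAX_RANKS 1024
    static long sends[MAX_RANKS], recvs[MAX_RANKS];

    int MPI_Send(const void *buf, int count, MPI_Datatype type, int dest,
                 int tag, MPI_Comm comm)
    {
        if (dest >= 0 && dest < MAX_RANKS) sends[dest]++;
        return PMPI_Send(buf, count, type, dest, tag, comm);
    }

    int MPI_Recv(void *buf, int count, MPI_Datatype type, int source,
                 int tag, MPI_Comm comm, MPI_Status *status)
    {
        if (source >= 0 && source < MAX_RANKS) recvs[source]++;
        return PMPI_Recv(buf, count, type, source, tag, comm, status);
    }

    int MPI_Finalize(void)
    {
        int me, i;
        PMPI_Comm_rank(MPI_COMM_WORLD, &me);
        for (i = 0; i < MAX_RANKS; i++)
            if (sends[i] || recvs[i])
                printf("[rank %d] peer %d: %ld sends, %ld recvs\n",
                       me, i, sends[i], recvs[i]);
        fflush(stdout);
        return PMPI_Finalize();
    }

Comparing rank X's count of sends to Y against rank Y's count of receives from X will show you where the imbalance is.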