Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] slowdown with infiniband and latest CentOS kernel
From: Dave Love (d.love_at_[hidden])
Date: 2013-12-18 10:32:22


Noam Bernstein <noam.bernstein_at_[hidden]> writes:

> We specifically switched to 1.7.3 because of a bug in 1.6.4 (lock up in some
> collective communication), but now I'm wondering whether I should just test
> 1.6.5.

What bug, exactly? As you mentioned vasp, is it specifically affecting
that?

We have seen apparent deadlocks with vasp -- which users assure me are
due to malfunctioning hardware and/or the batch system -- but I don't think
there was any evidence of them being due to openmpi (1.4 and 1.6 on
different systems here). I didn't have the padb --deadlock mode working
properly at the time I looked at one, but it seemed just to be stuck
with some ranks in broadcast and the rest in barrier. Someone else put
a parallel debugger on it, but I'm not sure if there was a conclusive
result, and I'm not very interested in debugging proprietary programs.
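
For illustration only: the symptom described above (some ranks in a
broadcast, the rest in a barrier) is the kind of picture you get by
grouping ranks according to the MPI call each one is blocked in. The
sketch below is a hypothetical pure-Python illustration of that grouping.
It is not how padb works internally, and the function names and the
`rank_calls` sample data are invented for the example.

```python
from collections import defaultdict


def group_ranks_by_call(rank_calls):
    """Group rank numbers by the MPI call each rank is currently blocked in.

    rank_calls maps rank number -> name of the call at the top of its stack.
    """
    groups = defaultdict(list)
    for rank, call in sorted(rank_calls.items()):
        groups[call].append(rank)
    return dict(groups)


def ranks_split_across_calls(groups):
    """Return True when the ranks are not all in the same call."""
    return len(groups) > 1


# Hypothetical snapshot of a 4-rank job (invented data, not a real trace).
rank_calls = {
    0: "MPI_Bcast",
    1: "MPI_Barrier",
    2: "MPI_Bcast",
    3: "MPI_Barrier",
}

groups = group_ranks_by_call(rank_calls)
for call, ranks in sorted(groups.items()):
    print(f"{call}: ranks {ranks}")
```

A split like this does not prove a deadlock on its own, since ranks can
legitimately be in different collectives for a moment. It is only
suspicious if repeated snapshots show the same picture.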