On May 5, 2010, at 7:54 PM, Douglas Guptill wrote:
> P.S. Yes, I know OpenMPI 1.2.8 is old. We have been using it for 2
> years with no apparent problems.
If it ain't broke, don't fix it -- nothing wrong with that.
> When I saw comments like "machine hung" for 1.4.1,
FWIW, I find it hard to believe that Open MPI is the cause of machine hangs. Open MPI is user-level process stuff, which should generally not be able to crash Linux. If user-level processes can hang Linux, then something else is probably broken.
But also FWIW, we have found that various MPI benchmarks and test applications can be *excellent* at finding underlying server / network problems. I can't think of a case offhand where Open MPI "caused" a machine to hang/crash/die/whatever that wasn't ultimately tracked down to some other root cause.
> and "data loss" for 1.3.x, I put aside thoughts of upgrading.
We definitely did have a big problem with OpenFabrics registered memory in Open MPI 1.3.0 and 1.3.1 (corrected in 1.3.2). Shame on us. :-(
But to continue the "FWIW" from above: we actually do *millions* of regression tests before Open MPI is released -- literally. All of us were convinced that 1.3.0 and 1.3.1 were ok to release; the data corruption issues caught us by surprise. Yuck. Those kinds of bugs are the worst. :-(