Hi - I've been trying to run VASP 5.2.12 with ScaLAPACK and OpenMPI
1.6.x on a single 32-core (4 x 8-core) Opteron node, purely shared memory.
We've always had occasional hangs with older OpenMPI versions
(1.4.3 and 1.5.5) on these machines, but infrequent enough to be usable
and not worth my time to debug.
However, now that I've got to the 1.6 series (1.6.2, specifically), we're
getting frequent crashes, mostly but maybe not entirely deterministic. The
symptom is a segmentation fault in libopmpi.so, someplace under a call to
PZHEEVX, but in the traceback only routine names in VASP are being printed,
despite the fact that I have ScaLAPACK compiled with -g.
ScaLAPACK is v 1.8.0, because with v 2.0.2, it completely fails to converge.
I've tried a couple of varieties of the Intel compiler (11.1.080 and 220.127.116.111),
and a couple of versions of ACML (4.4.0 and 5.2.0). ACML version seems
not to matter, and the two varieties of ifort give the same type of behavior, but
crash in different places in the run. When I switch compilers and ACML/ScaLAPACK
libraries I recompile everything, except for OpenMPI which is always compiled with
These crashes do not seem to occur on our 2 x 4 core Xeon + IB QDR nodes.
Has anyone seen anything like this, or does anyone have an idea how to get additional
useful information, for example traceback information so I can figure out which MPI
routine is having problems?