Open MPI User's Mailing List Archives

Subject: [OMPI users] openmpi fails to terminate for errors/signals on some but not all processes
From: Laurence Marks (L-marks_at_[hidden])
Date: 2010-02-08 14:29:38

This was "Re: [OMPI users] Trapping fortran I/O errors leaving zombie
mpi processes", but the problem is more severe than that.

Sorry, but it appears that, at least with ifort, most run-time errors
and signals will leave zombie processes behind with openmpi if they
occur on only some of the processes rather than all of them. You can
test this with the attached files using (for instance)

mpicc -c doraise.c
mpif90 -o crash_test crash_test.F doraise.o -FR -xHost -O3

Then, as appropriate,

mpirun -np 8 crash_test

The output is self-explanatory. The program has options both to
simulate common Fortran problems and to send Fortran or C signals to
the process. Please note that the results can depend on the level of
optimization, and with other compilers there could be problems where
the compiler complains about SIGSEGV or other errors, since the code
deliberately tries to create these.
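
For readers without the attachments, here is a rough sketch of the kind
of C helper that Fortran code could call to send itself a C signal. It is
not the attached doraise.c. The names doraise_, record_signal and
raise_and_catch, the trailing-underscore naming and the pass-by-reference
argument are all assumptions made for illustration.

```c
/* Hypothetical stand-in for a signal-raising helper; NOT the attached file. */
#include <signal.h>

/* Written by the handler so the caller can see which signal arrived. */
static volatile sig_atomic_t last_signal = 0;

static void record_signal(int sig)
{
    last_signal = sig;
}

/* Fortran-callable entry point (assumed convention: lowercase name with a
   trailing underscore, argument passed by reference), e.g.
       call doraise(isig)
   Sends signal *sig to the calling process. */
void doraise_(int *sig)
{
    raise(*sig);
}

/* Illustration only: install a handler, raise the signal through the
   helper, restore the default action, and return the signal number the
   handler saw. Returns -1 if the handler could not be installed. */
int raise_and_catch(int sig)
{
    last_signal = 0;
    if (signal(sig, record_signal) == SIG_ERR)
        return -1;
    doraise_(&sig);
    signal(sig, SIG_DFL);
    return (int)last_signal;
}
```

In a test like the one described above, no handler would be installed, so
the signal would take its default action on whichever ranks call the
helper; raise_and_catch is only there to show that the signal is actually
delivered.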

Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Chair, Commission on Electron Crystallography of IUCR
Electron crystallography is the branch of science that uses electron
scattering and imaging to study the structure of matter.