
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] openmpi fails to terminate for errors/signals on some but not all processes
From: Douglas Guptill (douglas.guptill_at_[hidden])
Date: 2010-02-10 09:26:54


Hello Laurence:

If I remember correctly the code that created this problem, perhaps
you could solve it by using the iostat= specifier:

   read(unit,*,iostat=ierror) some_variable
   if (ierror.ne.0) then
      ! handle the error here instead of letting the run-time library abort
   endif
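
IOSTAT= follows a fixed convention: zero means success, a negative value means end-of-file (or end-of-record), and a positive value means an error condition. The same idea can be applied on the C side of the test. The sketch below assumes a hypothetical helper read_int() (not part of the attached doraise.c) that returns codes in the IOSTAT style, so that the caller, rather than the run-time library, decides what to do:

```c
#include <stdio.h>

/* Hypothetical helper that reports status the way Fortran's IOSTAT=
   does: 0 on success, negative at end-of-file, positive on error. */
int read_int(FILE *fp, int *out)
{
    int n = fscanf(fp, "%d", out);
    if (n == 1)
        return 0;                    /* success: *out holds the value */
    if (n == EOF)
        return ferror(fp) ? 2 : -1;  /* read error vs. end-of-file */
    return 1;                        /* next token is not an integer */
}
```

For example, on the input "42 abc" the first call returns 0 with the value 42, and the second returns a positive code because "abc" is not an integer; the caller can then handle the error, just as with `if (ierror.ne.0)`.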

Hope that helps,
Douglas.

On Mon, Feb 08, 2010 at 01:29:38PM -0600, Laurence Marks wrote:
> This was "Re: [OMPI users] Trapping fortran I/O errors leaving zombie
> mpi processes", but it is more severe than this.
>
> Sorry, but it appears that, at least with ifort, most run-time errors
> and signals will leave zombie processes behind with openmpi if they
> occur on only some of the processes rather than on all of them. You
> can test this with the attached files using, for instance,
>
> mpicc -c doraise.c
> mpif90 -o crash_test crash_test.F doraise.o -FR -xHost -O3
>
> Then run, as appropriate, mpirun -np 8 crash_test
>
> The output is self-explanatory. The program has options both to
> simulate common Fortran problems and to send Fortran or C signals
> to the process. Please note that the results can depend on the
> optimization level, and with other compilers there could be problems
> where the compiler complains about SIGSEGV or other errors, since the
> code deliberately tries to create them.
>
> --
> Laurence Marks
> Department of Materials Science and Engineering
> MSE Rm 2036 Cook Hall
> 2220 N Campus Drive
> Northwestern University
> Evanston, IL 60208, USA
> Tel: (847) 491-3996 Fax: (847) 491-7820
> email: L-marks at northwestern dot edu
> Web: www.numis.northwestern.edu
> Chair, Commission on Electron Crystallography of IUCR
> www.numis.northwestern.edu/
> Electron crystallography is the branch of science that uses electron
> scattering and imaging to study the structure of matter.

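> #include <signal.h>
> #include <stdio.h>
>
> /* Fortran passes arguments by reference. int assumes the default
>    4-byte Fortran INTEGER (i.e. not compiled with -i8). */
>
> /* Raise signal *isig in the calling process. */
> void doraise(int *isig)
> {
>     raise(*isig); /* signal *isig is raised */
> }
>
> /* Trailing-underscore alias matching ifort's default name mangling. */
> void doraise_(int *isig)
> {
>     doraise(isig);
> }
>
> /* Print the description of signal *isig on stderr. */
> void whatsig(int *isig)
> {
>     psignal(*isig, "Testing Signal");
> }
>
> void whatsig_(int *isig)
> {
>     whatsig(isig);
> }
>
> /* List signals 1..31 with their descriptions on stderr. */
> void showallsignals(void)
> {
>     /* "Signal code NN " plus the terminating NUL needs 16 bytes,
>        so the original buf[15] overflowed for signals >= 10. */
>     char buf[32];
>     int i;
>     for (i = 1; i < 32; i++) {
>         snprintf(buf, sizeof buf, "Signal code %d ", i);
>         psignal(i, buf);
>     }
> }
>
> void showallsignals_(void)
> {
>     showallsignals();
> }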

> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users