Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Help: Program Terminated
From: Lee Amy (openlinuxsource_at_[hidden])
Date: 2008-05-30 00:28:26


2008/5/29 Andreas Schäfer <gentryx_at_[hidden]>:

> Hi Amy,
>
> On 16:10 Thu 29 May , Lee Amy wrote:
> > The parallel version of MicroTar was terminated after 463 minutes with the
> > following error messages:
> > ================================================
> > [gnode5:31982] [ 0] /lib64/tls/libpthread.so.0 [0x345460c430]
> > [gnode5:31982] [ 1] microtar(LocateNuclei+0x137) [0x403037]
> > [gnode5:31982] [ 2] microtar(main+0x4ac) [0x40431c]
> > [gnode5:31982] [ 3] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3453b1c3fb]
> > [gnode5:31982] [ 4] microtar [0x402e6a]
> > [gnode5:31982] *** End of error message ***
> > mpirun noticed that job rank 0 with PID 18710 on node gnode1 exited on
> > signal 15 (Terminated).
> > 19 additional processes aborted (not shown)
> > ================================================
>
> If I'm not mistaken, signal 15 is SIGTERM, which is sent to processes
> to terminate them. To me this sounds like your application is being
> terminated by something external, maybe because your job exceeded
> the wall clock time limit of your scheduling system. Does the job
> repeatedly fail at the same time? Do shorter jobs finish successfully?
>
> Just my 0.02 Euros (-8
>
> Cheers
> -Andreas
>
>
> --
> ============================================
> Andreas Schäfer
> Cluster and Metacomputing Working Group
> Friedrich-Schiller-Universität Jena, Germany
> PGP/GPG key via keyserver
> I'm a bright... http://www.the-brights.net
> ============================================
>
>
Thank you very much. Shorter jobs seem to run fine. The long job doesn't
fail at the same point in time on every run, but when it fails it always
ends with these error messages. I'm not using a scheduling system, though.
Do you have any other suggestions?
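
One way to narrow down where a SIGTERM comes from is to record the sender's PID in a signal handler. Below is a minimal sketch in C, assuming a POSIX system. The names install_sigterm_probe, sigterm_seen and sigterm_sender are hypothetical helpers made up for illustration; they are not part of MicroTar or Open MPI, and the MPI runtime may install its own handlers that interact with this.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Written by the handler, read later from normal code. */
static volatile sig_atomic_t g_sigterm_seen = 0;
static volatile sig_atomic_t g_sigterm_sender = 0; /* 0 means unknown */

/* Handler: only stores values, which is async-signal-safe. */
static void on_sigterm(int signo, siginfo_t *info, void *ctx)
{
    (void)signo;
    (void)ctx;
    /* si_pid is meaningful when the signal was sent with kill()
       or sigqueue(), i.e. si_code is SI_USER or SI_QUEUE. */
    g_sigterm_sender = (sig_atomic_t)info->si_pid;
    g_sigterm_seen = 1;
}

/* Hypothetical helper: install the probe. Returns 0 on success, -1 on error. */
int install_sigterm_probe(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigterm;
    sa.sa_flags = SA_SIGINFO;   /* request the three-argument handler */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Hypothetical helper: 1 if a SIGTERM has arrived since installation. */
int sigterm_seen(void)
{
    return g_sigterm_seen != 0;
}

/* Hypothetical helper: PID recorded for the most recent SIGTERM. */
pid_t sigterm_sender(void)
{
    return (pid_t)g_sigterm_sender;
}
```

Note that this handler replaces the default action, so the process no longer terminates on SIGTERM by itself. The program would have to poll sigterm_seen() in its main loop, log sigterm_sender() (which can then be looked up with ps on that node), and exit cleanly.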

Thank you again.

Regards,

Amy