2008/5/29 Andreas Schäfer <gentryx@gmx.de>:
Hi Amy,
On 16:10 Thu 29 May , Lee Amy wrote:
> MicroTar parallel version was terminated after 463 minutes with following
> error messages:
> ================================================
> [gnode5:31982] [ 0] /lib64/tls/libpthread.so.0 [0x345460c430]
> [gnode5:31982] [ 1] microtar(LocateNuclei+0x137) [0x403037]
> [gnode5:31982] [ 2] microtar(main+0x4ac) [0x40431c]
> [gnode5:31982] [ 3] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3453b1c3fb]
> [gnode5:31982] [ 4] microtar [0x402e6a]
> [gnode5:31982] *** End of error message ***
> mpirun noticed that job rank 0 with PID 18710 on node gnode1 exited on
> signal 15 (Terminated).
> 19 additional processes aborted (not shown)
> ================================================
If I'm not mistaken, signal 15 is SIGTERM, which is sent to processes
to terminate them. To me this sounds like your application is being
terminated by something external, maybe because your job exceeded
the wall clock time limit of your scheduling system. Does the job
always fail after the same amount of run time? Do shorter jobs finish
successfully?
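
The signal number can be checked quickly. What follows is only a minimal sketch, assuming a POSIX system with Python 3 available; the helper names signal_name and run_and_terminate are made up for illustration and have nothing to do with MicroTar or Open MPI. It looks up the name of signal 15 and shows that a child process killed from outside with SIGTERM reports that signal in its exit status.

```python
import signal
import subprocess
import sys


def signal_name(number: int) -> str:
    """Return the symbolic name of a signal number on this platform."""
    return signal.Signals(number).name


def run_and_terminate() -> int:
    """Start a sleeping child, send it SIGTERM, and return its exit status.

    On POSIX, subprocess reports a child killed by signal N as -N.
    """
    # Hypothetical stand-in for a long-running job: just sleep for a minute.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    # Terminate the child from outside, as an external party would.
    child.send_signal(signal.SIGTERM)
    return child.wait()


if __name__ == "__main__":
    print("signal 15 is", signal_name(15))
    print("exit status of terminated child:", run_and_terminate())
```

If the status printed for the child is the negative of the signal number, the process did not stop on its own but was terminated by something else, which is the situation to look for here.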
Just my 0.02 Euros (-8
Cheers
-Andreas
--
============================================
Andreas Schäfer
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
PGP/GPG key via keyserver
I'm a bright... http://www.the-brights.net
============================================
Thank you very much. If I run a shorter job, it seems to run well. The job doesn't fail at the same time on every run, but it always fails with these error messages. Also, I'm not using a scheduling system. Any suggestions?