Thank you very much. If I run a shorter job, it seems to run well. The job doesn't repeatedly fail at the same time, but when it does fail, it fails with these error messages. Anyway, I'm not using a scheduling system. Any suggestions?
2008/5/29 Andreas Schäfer <email@example.com>
On 16:10 Thu 29 May, Lee Amy wrote:
> MicroTar parallel version was terminated after 463 minutes with following
> error messages:
> [gnode5:31982] [ 0] /lib64/tls/libpthread.so.0 [0x345460c430]
> [gnode5:31982] [ 1] microtar(LocateNuclei+0x137) [0x403037]
> [gnode5:31982] [ 2] microtar(main+0x4ac) [0x40431c]
> [gnode5:31982] [ 3] /lib64/tls/libc.so.6(__libc_start_main+0xdb)
> [gnode5:31982] [ 4] microtar [0x402e6a]
> [gnode5:31982] *** End of error message ***
> mpirun noticed that job rank 0 with PID 18710 on node gnode1 exited on
> signal 15 (Terminated).
> 19 additional processes aborted (not shown)
If I'm not mistaken, signal 15 is SIGTERM, which is sent to processes
to terminate them. To me this sounds like your application is
terminated from an external instance, maybe because your job exceeded
the wall clock time limit of your scheduling system. Does the job
repeatedly fail at the same time? Do shorter jobs finish successfully?
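To double-check the signal number on your own machine, here is a minimal Python sketch. It assumes a POSIX system such as Linux, and the helper names describe_signal and run_and_terminate are made up for illustration; they have nothing to do with MicroTar or mpirun. It looks up the name of signal 15 and shows how a child process that receives SIGTERM reports its exit status.

```python
import signal
import subprocess
import sys


def describe_signal(signum):
    """Return the symbolic name of a signal number, e.g. 15 -> 'SIGTERM'."""
    return signal.Signals(signum).name


def run_and_terminate():
    """Start a sleeping child, send it SIGTERM, and return its exit status.

    On POSIX, subprocess reports a child killed by signal N as -N.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    child.terminate()  # sends SIGTERM on POSIX systems
    return child.wait()


if __name__ == "__main__":
    print("signal 15 is", describe_signal(15))
    print("child exit status:", run_and_terminate())
```

A negative exit status here only says that something outside the process sent the signal; it does not say who sent it.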
Just my 0.02 Euros (-8
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany