Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI killed by signal 9
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-07-22 20:29:38

Signal 9 (SIGKILL) most likely means that some external entity killed your MPI job (e.g., a resource manager determined that your process took too much time / CPU / whatever and killed it). A segmentation fault would normally show up as signal 11 (SIGSEGV), not signal 9. An external kill also fits what you describe: short jobs complete with no problem, but (presumably) longer jobs get killed with signal 9.
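To make the "killed from outside" case concrete, here is a minimal sketch (not taken from the original thread) of how an external SIGKILL looks from the parent's side on a POSIX system. The helper names `run_and_kill` and `describe` are hypothetical and exist only for this illustration. It relies on Python's `subprocess` convention that a negative return code `-N` means the child was terminated by signal `N`.

```python
import signal
import subprocess
import sys


def run_and_kill():
    """Start a long-sleeping child, kill it externally, return its return code."""
    # The child would sleep for a minute if nobody interfered with it.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    # Play the role of the "external entity": SIGKILL cannot be caught or ignored.
    child.send_signal(signal.SIGKILL)
    child.wait()
    return child.returncode


def describe(returncode):
    """Turn a subprocess return code into a human-readable description."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative number.
        signum = -returncode
        return f"killed by signal {signum} ({signal.Signals(signum).name})"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    print(describe(run_and_kill()))
```

The child here did nothing wrong. It was simply terminated by another process, which is the same situation the "killed by signal 9" message in the job output points to.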

You might want to check with your system administrator and see if there are any resource limits on user-run applications.
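As a first check before asking the administrator, something like the following sketch could be run on a compute node to print the per-process limits the operating system reports. It is an illustration under assumptions: the choice of limits and the helper names (`LIMITS`, `format_limit`, `describe_limits`) are mine, and it only shows `getrlimit`-style limits. Limits enforced by a batch scheduler or resource manager (wall-clock time, per-job memory) may not appear here at all.

```python
import resource

# Limits worth inspecting first; this selection is an assumption, not a complete list.
LIMITS = {
    "CPU time (seconds)": resource.RLIMIT_CPU,
    "Address space (bytes)": resource.RLIMIT_AS,
    "Data segment (bytes)": resource.RLIMIT_DATA,
}


def format_limit(value):
    """Render a raw limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def describe_limits():
    """Return {limit name: (soft, hard)} for the current process."""
    report = {}
    for name, which in LIMITS.items():
        soft, hard = resource.getrlimit(which)
        report[name] = (format_limit(soft), format_limit(hard))
    return report


if __name__ == "__main__":
    for name, (soft, hard) in describe_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
```

If everything prints as unlimited, the kill is more likely coming from the resource manager or from the system running out of memory, which is where the system administrator can help.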

On Jul 22, 2010, at 8:18 PM, Jack Bryan wrote:

> Dear All:
> I ran a parallel job on 6 nodes of an OpenMPI cluster,
> but I got this error:
> rank 0 in job 82 system.cluster_37948 caused collective abort of all ranks
> exit status of rank 0: killed by signal 9
> It seems that there is a segmentation fault on node 0.
> But if the program runs for only a short time, there is no problem.
> Any help is appreciated.
> thanks,
> Jack
> July 22 2010

Jeff Squyres