Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_Abort under slurm
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-02-26 14:09:31


It should work - check the following srun option:

       -K, --kill-on-bad-exit[=0|1]
              Controls whether or not to terminate a job if any task exits with a non-zero exit code. If this option is not specified, the default action will be
              based upon the SLURM configuration parameter of KillOnBadExit. If this option is specified, it will take precedence over KillOnBadExit. An option
              argument of zero will not terminate the job. A non-zero argument or no argument will terminate the job. Note: This option takes precedence over the
              -W, --wait option to terminate the job immediately if a task exits with a non-zero exit code.

My guess is that your configuration parameter for KillOnBadExit has not been specified, or you aborted with a zero status.
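To check both possibilities from the command line, something like the following sketch may help. It assumes a hypothetical binary named ./my_mpi_app and a task count of 4; substitute your own. It prints the cluster-wide KillOnBadExit setting when scontrol is available, then forces the behaviour for a single launch with the srun option quoted above.

```shell
#!/bin/sh
# Hypothetical application name and task count; substitute your own.
APP="./my_mpi_app"
NTASKS=4

# Show the cluster-wide default from the Slurm configuration, if scontrol exists here.
if command -v scontrol >/dev/null 2>&1; then
    scontrol show config | grep -i KillOnBadExit
fi

# Request kill-on-bad-exit for this launch, taking precedence over KillOnBadExit.
SRUN_CMD="srun --kill-on-bad-exit=1 -n $NTASKS $APP"

if command -v srun >/dev/null 2>&1; then
    $SRUN_CMD
else
    # Not on a Slurm host: just print what would have been run.
    echo "srun not found; would run: $SRUN_CMD"
fi
```

Either way, the option only acts on a non-zero exit code, so the error code passed to MPI_Abort() needs to be non-zero as well.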

On Feb 26, 2013, at 9:08 AM, Bokassa <bokassa_at_[hidden]> wrote:

> Hi Ralph, thanks for your answer. I am using:
>
> >mpirun --version
> mpirun (Open MPI) 1.5.4
>
> Report bugs to http://www.open-mpi.org/community/help/
>
> and slurm 2.5.
>
> Should I try to upgrade to 1.6.5?
>
>
>
> /David/Bigagli
> www.davidbigagli.com
>
>
> On Mon, Feb 25, 2013 at 7:38 PM, Bokassa <bokassa_at_[hidden]> wrote:
> Hi,
> I noticed that MPI_Abort() does not abort the tasks if the MPI program is started using srun.
> I call MPI_Abort() from rank 0; this process exits, but the other ranks keep running or waiting for I/O
> on the other nodes. The only way to kill the job is to use scancel.
> However, if I use mpirun under a Slurm allocation, then MPI_Abort() works as expected, aborting
> all tasks.
>
> Is this a known issue?
>
> Thanks, David