Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How to cease the process triggered by OPENMPI
From: Brock Palen (brockp_at_[hidden])
Date: 2008-07-28 11:05:33


You would be much better off not using nohup, and then just killing
mpirun itself.
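
A minimal sketch of the "just kill mpirun" approach, assuming a POSIX
system. The helper names start_job and stop_job and the PID file are
hypothetical, not part of Open MPI. The idea is only to remember
mpirun's PID at launch so that the signal later goes to mpirun and not
to orted or pw.x. The mpirun man page describes SIGTERM and SIGINT as
the signals on which mpirun tries to take down the whole job.

```python
import os
import signal
import subprocess


def start_job(cmd, pidfile, stdin_path=None, stdout_path=None):
    """Start cmd in the background and write its PID to pidfile."""
    stdin = open(stdin_path, "rb") if stdin_path else None
    stdout = open(stdout_path, "wb") if stdout_path else None
    try:
        proc = subprocess.Popen(cmd, stdin=stdin, stdout=stdout)
    finally:
        # The child has its own copies; close ours.
        for fh in (stdin, stdout):
            if fh is not None:
                fh.close()
    with open(pidfile, "w") as f:
        f.write(str(proc.pid))
    return proc


def stop_job(pidfile, sig=signal.SIGTERM):
    """Send sig to the process whose PID is stored in pidfile."""
    with open(pidfile) as f:
        pid = int(f.read().strip())
    os.kill(pid, sig)
    return pid
```

With the command from this thread, that could look like
start_job(["mpirun", "-hostfile", os.path.expanduser("~/hostfile"),
"-np", "64", "pw.x"], "mpirun.pid", "input", "output"), followed
later by stop_job("mpirun.pid").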

By a batch system I mean something like TORQUE
(http://www.clusterresources.com/pages/products/torque-resource-manager.php).
Most batch systems have a launching system that lets you kill all the
remote processes when you kill the job.

Look at how MPI works. When you start MPI the way you are starting
it (without a batch system), you are using either ssh or rsh to start
the remote processes. Once these are started, the user has no control
over the remote processes.

Try killing your mpirun, not your orted or pw.x. You will be much
happier with a batch system.
Or write a script that uses ssh to log in to every host in the
hostfile and kills pw.x on each of them.
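
A rough sketch of such a script, assuming passwordless ssh to every
node, a hostfile whose lines start with the host name (for example
"node01 slots=4"), and pkill being available on the nodes. The
function names are made up for illustration. By default it only
returns the commands it would run.

```python
import subprocess


def read_hosts(path):
    """Return unique host names from a hostfile, skipping # comments."""
    hosts = []
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()
            if not line:
                continue
            host = line.split()[0]  # ignore trailing "slots=N" etc.
            if host not in hosts:
                hosts.append(host)
    return hosts


def kill_command(host, process_name="pw.x"):
    """Build the ssh command that kills process_name on host."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", host, "pkill", "-x", process_name]


def kill_everywhere(hostfile, process_name="pw.x", dry_run=True):
    """Build (and, unless dry_run, execute) one kill command per host."""
    commands = [kill_command(h, process_name) for h in read_hosts(hostfile)]
    if not dry_run:
        for cmd in commands:
            subprocess.call(cmd)
    return commands
```

Calling kill_everywhere("/home/you/hostfile", dry_run=False) would
then run the commands. As a normal user, pkill can only signal your
own processes, so other users' jobs on the same nodes are left alone.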

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp_at_[hidden]
(734)936-1985

On Jul 27, 2008, at 2:04 PM, vega lew wrote:

> Dear Brock Palen,
>
> Thank you for your response.
>
> My Linux is Red Hat Enterprise 4. My compilers are version 10.1.015
> of Intel Fortran and Intel C.
>
> You said 'when the job is killed all the children are also'
>
> But I started my OPENMPI job using the nohup command to put the job
> in the background, like this:
> " nohup mpirun -hostfile ~/hostfile -np 64 pw.x < input > output & ".
>
> When I killed one of the processes named pw.x, the others didn't
> stop.
> When I killed the process named orted, the pw.x processes on the
> same node stopped immediately,
> but the job on the other nodes was still running.
>
> Do you think there is something wrong with my cluster or openmpi or
> the software named pw.x?
>
> Is there a command for openmpi to force all the processes to stop,
> either across the cluster or on a list of nodes?
>
> Vega Lew (weijia liu)
> PH.D Candidate in Chemical Engineering
> State Key Laboratory of Materials-oriented Chemical Engineering
> College of Chemistry and Chemical Engineering
> Nanjing University of Technology, 210009, Nanjing, Jiangsu, China
>
> From: brockp_at_[hidden]
> Date: Sat, 26 Jul 2008 12:52:08 -0400
> To: users_at_[hidden]
> Subject: Re: [OMPI users] How to cease the process triggered by
> OPENMPI
>
> Does the cluster you're using use a batch system, like SLURM, PBS or
> another?
>
> If so, many have native ways to launch jobs that OMPI can use, so
> that when the job is killed all the children are also.
>
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> brockp_at_[hidden]
> (734)936-1985
>
>
>
> On Jul 26, 2008, at 12:25 PM, vega lew wrote:
> Dear all,
>
> I have been enjoying openmpi for a couple of days. With the help of
> openmpi I can run ESPRESSO efficiently.
>
> I started the MPI job with an openmpi command like this,
>
> " nohup mpirun -hostfile ~/hostfile -np 64 pw.x < input > output &".
>
> When I want to stop the job before it has finished, I find it is not
> easy to stop all the processes manually. When I killed the process
> on one node of the cluster, the processes on the other nodes were
> still running. So I must ssh to every node, find the
> process id and kill the process. If there are 100 processors or
> more for one MPI job, the situation is even worse.
>
> Is there a command for openmpi to force all the processes to stop,
> either across the cluster or on a list of nodes?
>
> vega
>
> Vega Lew (weijia liu)
> PH.D Candidate in Chemical Engineering
> State Key Laboratory of Materials-oriented Chemical Engineering
> College of Chemistry and Chemical Engineering
> Nanjing University of Technology, 210009, Nanjing, Jiangsu, China
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users