Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How to cease the process triggered by OPENMPI
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-07-28 15:49:22


Killing mpirun would be your easiest solution.
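
For a job started with nohup, as in the quoted messages below, "killing mpirun" could look roughly like the following sketch: save the PID when launching, then send SIGTERM to it later. The helper names start_job and stop_job and the PID/log file arguments are hypothetical and not part of Open MPI. Whether mpirun then cleans up every remote process depends on the installation.

```shell
#!/bin/sh
# Hypothetical helpers: start a long-running command in the background,
# remember its PID, and terminate it later.

# start_job PIDFILE LOGFILE COMMAND [ARGS...]
start_job() {
    pidfile=$1
    logfile=$2
    shift 2
    # nohup keeps the command alive after logout; stdout and stderr
    # go to LOGFILE instead of nohup.out.
    nohup "$@" >"$logfile" 2>&1 &
    # $! is the PID of the background command just started.
    echo "$!" > "$pidfile"
}

# stop_job PIDFILE
stop_job() {
    pid=$(cat "$1") || return 1
    # kill sends SIGTERM by default, asking the process to shut down.
    kill "$pid" 2>/dev/null
    rm -f "$1"
}
```

Usage might be: start_job ~/pw.pid ~/pw.log mpirun -hostfile ~/hostfile -np 64 pw.x, followed later by stop_job ~/pw.pid. How pw.x receives its input file is left out of this sketch.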

Or you could just run "mpirun ..." inside "screen", and if you ever
want to kill it, re-attach to the screen and hit ctrl-C to kill
mpirun. See the man page screen(1). screen is your friend for very
long-running jobs, particularly if you're connecting via a laptop and
need to hop on and off the network -- you can just "detach" from the
session and "reattach" to it later. Screen is some of the best 70's
technology that is still highly relevant today. :-)

On Jul 28, 2008, at 1:24 PM, vega lew wrote:

> OK, thank you for your reply.
> I'll try to make a script to kill all the processes using
> 'killall pw.x'.
>
> Thank you again.
>
>
> Vega Lew (weijia liu)
> PH.D Candidate in Chemical Engineering
> State Key Laboratory of Materials-oriented Chemical Engineering
> College of Chemistry and Chemical Engineering
> Nanjing University of Technology, 210009, Nanjing, Jiangsu, China
>
> From: brockp_at_[hidden]
> Date: Mon, 28 Jul 2008 11:05:33 -0400
> To: users_at_[hidden]
> Subject: Re: [OMPI users] How to cease the process triggered by
> OPENMPI
>
> You would be much better off to not use nohup, and then just kill
> the mpirun.
>
> What I mean is a batch system
> (http://www.clusterresources.com/pages/products/torque-resource-manager.php).
> Most batch systems have a launching system that lets you kill
> all the remote processes when you kill the job.
>
> Look at how MPI works. When you are starting the way you are
> starting MPI (without a batch system) you are using either ssh or rsh
> to start the remote processes. Once these are started, the user has
> no control over the remote processes.
>
> Try killing your mpirun, not your orted or pw.x. You will be much
> happier with a batch system.
> Or make a script that uses ssh to reach each host in the hostfile
> and kills pw.x on all of them.
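
A script of that kind might be sketched as below. It assumes a hostfile with one host per line (optionally followed by fields such as "slots=4", with "#" starting a comment) and passwordless ssh to every node. The function name kill_on_hosts and the DRY_RUN variable are hypothetical. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper: run "killall PROCESS" on every host in a hostfile.

# kill_on_hosts HOSTFILE PROCESS
# With DRY_RUN=1 (the default here) the ssh commands are only printed.
kill_on_hosts() {
    hostfile=$1
    process=$2
    # Strip comments, take the first field, and drop duplicate hosts.
    sed 's/#.*//' "$hostfile" | awk 'NF { print $1 }' | sort -u |
    while read -r host; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "ssh $host killall $process"
        else
            # -n keeps ssh from consuming the rest of the host list on stdin.
            ssh -n "$host" killall "$process"
        fi
    done
}
```

Running kill_on_hosts ~/hostfile pw.x lists the commands; setting DRY_RUN=0 executes them.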
>
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> brockp_at_[hidden]
> (734)936-1985
>
>
>
> On Jul 27, 2008, at 2:04 PM, vega lew wrote:
> Dear Brock Palen,
>
> Thank you for your response.
>
> My Linux is Red Hat Enterprise Linux 4. My compilers are version
> 10.1.015 of Intel Fortran and Intel C.
>
> You said 'when the job is killed all the children are also'
>
> But I started my Open MPI job using the nohup command to put the job
> in the background, like this,
> " nohup mpirun -hostfile ~/hostfile -np 64 pw.x < input > output & ".
>
> When I killed one of the processes named pw.x, the others didn't
> stop.
> When I killed the process named orted, the pw.x processes on the same
> node stopped immediately,
> but the job on the other nodes was still running.
>
> Do you think there is something wrong with my cluster, with Open MPI,
> or with the software named pw.x?
>
> Is there an Open MPI command to force all the processes to stop,
> either across the cluster or on a list of nodes?
>
> Vega Lew (weijia liu)
> PH.D Candidate in Chemical Engineering
> State Key Laboratory of Materials-oriented Chemical Engineering
> College of Chemistry and Chemical Engineering
> Nanjing University of Technology, 210009, Nanjing, Jiangsu, China
>
> From: brockp_at_[hidden]
> Date: Sat, 26 Jul 2008 12:52:08 -0400
> To: users_at_[hidden]
> Subject: Re: [OMPI users] How to cease the process triggered by
> OPENMPI
>
> Does the cluster you're using have a batch system, like SLURM, PBS, or
> another?
>
> If so, many have native ways to launch jobs that OMPI can use, so
> that when the job is killed all the children are also.
>
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> brockp_at_[hidden]
> (734)936-1985
>
>
>
> On Jul 26, 2008, at 12:25 PM, vega lew wrote:
> Dear all,
>
> I have been enjoying Open MPI for a couple of days. With the help of
> Open MPI I could run ESPRESSO efficiently.
>
> I started the MPI job with an Open MPI command like this,
>
> " nohup mpirun -hostfile ~/hostfile -np 64 pw.x < input > output &".
>
> When I want to stop the job before it finishes, I find it hard
> to stop all the processes manually. When I killed the processes
> on one node of the cluster, the processes on the other nodes kept
> running. So I must ssh to every node, find the
> process IDs, and kill the processes. With 100 processors or more
> in one MPI job, the situation is even worse.
>
> Is there an Open MPI command to force all the processes to stop,
> either across the cluster or on a list of nodes?
>
> vega
>
> Vega Lew (weijia liu)
> PH.D Candidate in Chemical Engineering
> State Key Laboratory of Materials-oriented Chemical Engineering
> College of Chemistry and Chemical Engineering
> Nanjing University of Technology, 210009, Nanjing, Jiangsu, China
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Cisco Systems