Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How to cease the process triggered by OPENMPI
From: Rolf Vandevaart (Rolf.Vandevaart_at_[hidden])
Date: 2008-07-28 11:15:55


One other option, which should kill off the processes and clean up, is the
orte-clean command. In your case, you could do the following:

mpirun -hostfile ~/hostfile --pernode orte-clean

There is a man page for it also.
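
As a rough sketch of how the two suggestions in this thread could be combined (kill mpirun first, then run orte-clean on every node), something like the following might work. The function name stop_and_clean, the PID file, and the MPIRUN override are hypothetical, made up for illustration. It assumes the PID of mpirun was saved at launch, for example with "echo $! > mpirun.pid" right after starting the job.

```shell
# Hypothetical helper: stop a running mpirun, then clean up leftovers.
#   $1  file holding the PID of mpirun (assumed saved at launch)
#   $2  hostfile that was passed to mpirun
#   $3  seconds to wait for mpirun to exit (optional, default 10)
stop_and_clean() {
    sc_pidfile=$1
    sc_hostfile=$2
    sc_limit=${3:-10}

    sc_pid=$(cat "$sc_pidfile") || return 1

    # On SIGTERM, mpirun tries to terminate every process in its job.
    if kill -TERM "$sc_pid" 2>/dev/null; then
        sc_waited=0
        # kill -0 sends no signal; it only tests whether the PID still exists.
        while kill -0 "$sc_pid" 2>/dev/null && [ "$sc_waited" -lt "$sc_limit" ]; do
            sleep 1
            sc_waited=$((sc_waited + 1))
        done
    else
        echo "could not signal PID $sc_pid; cleaning up anyway" >&2
    fi

    # One orte-clean per node, using the same command line as above.
    # MPIRUN is a hypothetical override, handy for a dry run.
    ${MPIRUN:-mpirun} -hostfile "$sc_hostfile" --pernode orte-clean
}
```

It would then be called as "stop_and_clean mpirun.pid ~/hostfile". The wait loop is only there so that orte-clean does not run while mpirun is still shutting the job down.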

Rolf

Brock Palen wrote:
> You would be much better off to not use nohup, and then just kill the
> mpirun.
>
> What I mean is a batch system
> (http://www.clusterresources.com/pages/products/torque-resource-manager.php).
> Most batch systems have a launching system that lets you kill all the
> remote processes when you kill the job.
>
> Look at how MPI works. When you start MPI the way you are doing
> (without a batch system), you are using either ssh or rsh to start the
> remote processes. Once these are started, the user has no control over
> the remote processes.
>
> Try killing your mpirun, not your orted or pw.x. You will be much
> happier with a batch system.
> Or write a script that connects by ssh to each host in the hostfile and
> kills pw.x on all of them.
>
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> brockp_at_[hidden] <mailto:brockp_at_[hidden]>
> (734)936-1985
>
>
>
> On Jul 27, 2008, at 2:04 PM, vega lew wrote:
>> Dear Brock Palen,
>>
>> Thank you for your response.
>>
>> My Linux distribution is Red Hat Enterprise Linux 4. My compilers are
>> version 10.1.015 of Intel Fortran and Intel C.
>>
>> You said 'when the job is killed all the children are also'
>>
>> But I started my Open MPI job with the nohup command to put the job
>> in the background, like this:
>> " nohup mpirun -hostfile ~/hostfile -np 64 pw.x < input > output & ".
>>
>> When I killed one of the processes named pw.x, the others didn't stop.
>> When I killed the process named orted, the pw.x processes on the same
>> node stopped immediately,
>> but the processes on the other nodes were still running.
>>
>> Do you think there is something wrong with my cluster or openmpi or
>> the software named pw.x?
>>
>> Is there an Open MPI command that forces all the processes to stop,
>> either across the whole cluster or on a list of nodes?
>>
>> Vega Lew (weijia liu)
>> PH.D Candidate in Chemical Engineering
>> State Key Laboratory of Materials-oriented Chemical Engineering
>> College of Chemistry and Chemical Engineering
>> Nanjing University of Technology, 210009, Nanjing, Jiangsu, China
>>
>> ------------------------------------------------------------------------
>> From: brockp_at_[hidden] <mailto:brockp_at_[hidden]>
>> Date: Sat, 26 Jul 2008 12:52:08 -0400
>> To: users_at_[hidden] <mailto:users_at_[hidden]>
>> Subject: Re: [OMPI users] How to cease the process triggered by OPENMPI
>>
>> Does the cluster you're using have a batch system, such as SLURM, PBS,
>> or another?
>>
>> If so, many have native ways to launch jobs that OMPI can use, so that
>> when the job is killed all the children are also.
>>
>> Brock Palen
>> www.umich.edu/~brockp
>> Center for Advanced Computing
>> brockp_at_[hidden] <mailto:brockp_at_[hidden]>
>> (734)936-1985
>>
>>
>>
>> On Jul 26, 2008, at 12:25 PM, vega lew wrote:
>>
>> Dear all,
>>
>> I have been enjoying Open MPI for a couple of days. With the help of
>> Open MPI I can run ESPRESSO efficiently.
>>
>> I started the MPI job with the Open MPI command like this:
>>
>> " nohup mpirun -hostfile ~/hostfile -np 64 pw.x < input > output &".
>>
>> When I want to stop the job before it has finished, I find it hard
>> to stop all the processes manually. When I killed the processes
>> on one node of the cluster, the processes on the other nodes were
>> still running. So I must ssh to every node, find the
>> process IDs, and kill the processes. If there are 100 processors or
>> more for one MPI job, the situation is even worse.
>>
>> Is there an Open MPI command that forces all the processes to stop,
>> either across the whole cluster or on a list of nodes?
>>
>> vega
>>
>> Vega Lew (weijia liu)
>> PH.D Candidate in Chemical Engineering
>> State Key Laboratory of Materials-oriented Chemical Engineering
>> College of Chemistry and Chemical Engineering
>> Nanjing University of Technology, 210009, Nanjing, Jiangsu, China
>> _______________________________________________
>> users mailing list
>> users_at_[hidden] <mailto:users_at_[hidden]>
>> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
=========================
rolf.vandevaart_at_[hidden]
781-442-3043
=========================
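
For the fallback Brock mentions in the quoted message, a script that uses ssh to reach every host in the hostfile and kill pw.x there, a minimal sketch could look like the one below. The names list_hosts, kill_on_hosts, and the SSH override are hypothetical. It assumes passwordless ssh to every node (which launching mpirun over ssh already relies on), the same user name on all nodes, and that pkill is installed on them.

```shell
# Print each host named in an Open MPI hostfile once.
# Lines look like "node01 slots=4"; blank lines and "#" comments are skipped.
list_hosts() {
    awk 'NF && $1 !~ /^#/ && !seen[$1]++ { print $1 }' "$1"
}

# Hypothetical helper: kill a named process on every host in the hostfile.
#   $1  hostfile
#   $2  process name, e.g. pw.x
kill_on_hosts() {
    kh_hostfile=$1
    kh_name=$2
    kh_user=$(id -un)   # assumption: same user name on the remote nodes

    for kh_host in $(list_hosts "$kh_hostfile"); do
        # ssh -n keeps ssh from reading stdin.
        # pkill -u limits the match to this user's processes,
        # and -x requires the process name to match exactly.
        ${SSH:-ssh} -n "$kh_host" pkill -u "$kh_user" -x "$kh_name" \
            || echo "$kh_host: nothing killed, or ssh failed" >&2
    done
}
```

With the job from this thread it would be run as "kill_on_hosts ~/hostfile pw.x". pkill returns a non-zero status when no process matched, so the warning on stderr is expected for nodes where pw.x had already exited.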