Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How to cease the process triggered by OPENMPI
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-07-28 15:47:40


orte-clean is a new feature in v1.3. Rolf is just excited about it. ;-)

(actually, I think it wasn't ready for prime time in the v1.2 series
so we pulled it from the 1.2 distributions)

On Jul 28, 2008, at 11:23 AM, Brock Palen wrote:

> I don't see this command in my 1.2.6 install. There also isn't
> a man page.
>
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> brockp_at_[hidden]
> (734)936-1985
>
>
>
> On Jul 28, 2008, at 11:15 AM, Rolf Vandevaart wrote:
>>
>> One other option which should kill off processes and clean up is the
>> orte-clean command. In your case, you could do the following:
>>
>> mpirun -hostfile ~/hostfile --pernode orte-clean
>>
>> There is a man page for it also.
>>
>> Rolf
>>
>> Brock Palen wrote:
>>> You would be much better off not using nohup, and then just killing
>>> the mpirun.
>>> What I mean is a batch system
>>> (http://www.clusterresources.com/pages/products/torque-resource-manager.php).
>>> Most batch systems have a launching system that lets you kill
>>> all the remote processes when you kill the job. Look at how MPI
>>> works. When you start MPI the way you are starting it
>>> (without a batch system) you are using either ssh or rsh to start
>>> the remote processes. Once these are started, the user has no
>>> control over the remote processes. Try killing your mpirun, not
>>> your orted or pw.x. You will be much happier with a batch system.
>>> Or make a script that sshes to each host in the hostfile and kills
>>> pw.x on all of them.
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> Center for Advanced Computing
>>> brockp_at_[hidden] <mailto:brockp_at_[hidden]>
>>> (734)936-1985
>>> On Jul 27, 2008, at 2:04 PM, vega lew wrote:
>>>> Dear Brock Palen,
>>>>
>>>> Thank you for your response.
>>>>
>>>> My Linux is Red Hat Enterprise Linux 4. My compilers are Intel
>>>> Fortran and Intel C, version 10.1.015.
>>>>
>>>> You said 'when the job is killed all the children are also'
>>>>
>>>> But I started my Open MPI job using the nohup command to put the
>>>> job in the background, like this:
>>>> " nohup mpirun -hostfile ~/hostfile -np 64 pw.x < input > output & ".
>>>>
>>>> When I killed one of the processes named pw.x, all the others
>>>> didn't stop.
>>>> When I killed the process named orted, the pw.x process on the
>>>> same node stopped immediately,
>>>> but the jobs on the other nodes were still running.
>>>>
>>>> Do you think there is something wrong with my cluster or openmpi
>>>> or the software named pw.x?
>>>>
>>>> Is there a command for Open MPI to force all the processes to stop,
>>>> either across the cluster or on a list of nodes?
>>>> Vega Lew (weijia liu)
>>>> PH.D Candidate in Chemical Engineering
>>>> State Key Laboratory of Materials-oriented Chemical Engineering
>>>> College of Chemistry and Chemical Engineering
>>>> Nanjing University of Technology, 210009, Nanjing, Jiangsu, China
>>>>
>>>> ------------------------------------------------------------------------
>>>> From: brockp_at_[hidden] <mailto:brockp_at_[hidden]>
>>>> Date: Sat, 26 Jul 2008 12:52:08 -0400
>>>> To: users_at_[hidden] <mailto:users_at_[hidden]>
>>>> Subject: Re: [OMPI users] How to cease the process triggered by
>>>> OPENMPI
>>>>
>>>> Does the cluster you're using use a batch system, like SLURM, PBS,
>>>> or another?
>>>>
>>>> If so, many have native ways to launch jobs that OMPI can use, so
>>>> that when the job is killed all the children are killed as well.
>>>>
>>>> Brock Palen
>>>> www.umich.edu/~brockp
>>>> Center for Advanced Computing
>>>> brockp_at_[hidden] <mailto:brockp_at_[hidden]>
>>>> (734)936-1985
>>>>
>>>>
>>>>
>>>> On Jul 26, 2008, at 12:25 PM, vega lew wrote:
>>>>
>>>> Dear all,
>>>>
>>>> I have been enjoying Open MPI for a couple of days. With its help
>>>> I can run ESPRESSO efficiently.
>>>>
>>>> I started the MPI job with an Open MPI command like this:
>>>>
>>>> " nohup mpirun -hostfile ~/hostfile -np 64 pw.x < input > output &".
>>>>
>>>> When I want to stop the job before it has finished, I find it is
>>>> not easy to stop all the processes manually. When I killed the
>>>> process on one node of the cluster, the processes on the other
>>>> nodes were still running. So I must ssh to every node, find the
>>>> process ID, and kill the process. If there are 100 processors or
>>>> more for one MPI job, the situation is even worse.
>>>>
>>>> Is there a command for Open MPI to force all the processes to
>>>> stop, either across the cluster or on a list of nodes?
>>>> vega
>>>>
>>>> Vega Lew (weijia liu)
>>>> PH.D Candidate in Chemical Engineering
>>>> State Key Laboratory of Materials-oriented Chemical Engineering
>>>> College of Chemistry and Chemical Engineering
>>>> Nanjing University of Technology, 210009, Nanjing, Jiangsu,
>>>> China
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden] <mailto:users_at_[hidden]>
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> --
>>
>> =========================
>> rolf.vandevaart_at_[hidden]
>> 781-442-3043
>> =========================
>>
>>
>
>

-- 
Jeff Squyres
Cisco Systems
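
Brock's suggestion earlier in the thread (a script that sshes to each host in the hostfile and kills pw.x) could be sketched roughly as below. This is only an illustration, not something posted to the list: the function names list_hosts and kill_on_hosts and the SSH override variable are made up for the example, and it assumes passwordless ssh to every node, pkill being installed on the nodes, and a hostfile whose first field on each line is the host name.

```shell
#!/bin/sh
# Sketch only: list_hosts, kill_on_hosts and the SSH variable are
# hypothetical names chosen for this example.

# Print the unique host names found in a hostfile:
# strip "#" comments, skip blank lines, keep the first field of each line.
list_hosts() {
    sed -e 's/#.*//' "$1" | awk 'NF { print $1 }' | sort -u
}

# Run pkill for an exact process name on every host in the hostfile.
# Assumes passwordless ssh and pkill on the remote nodes.
# Set SSH=echo to print the commands instead of running them.
kill_on_hosts() {
    hostfile=$1
    procname=$2
    for host in $(list_hosts "$hostfile"); do
        # "ssh -n" keeps ssh from reading stdin; "pkill -x" requires
        # the whole process name to match.
        ${SSH:-ssh -n} "$host" pkill -x "$procname"
    done
}
```

After sourcing these functions, "kill_on_hosts ~/hostfile pw.x" would attempt the kill on each listed host; setting SSH=echo first prints the commands without running anything. Killing mpirun itself, or launching under a batch system, remains the cleaner route described in the thread.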