Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] ompi-clean on single executable
From: Reuti (reuti_at_[hidden])
Date: 2012-10-24 04:44:15


Hi,

On 24.10.2012 at 09:36, Nicolas Deladerriere wrote:

> I am having an issue running ompi-clean. It cleans up the session associated with a user (this is normal), which means it kills all running jobs associated with this session (this is also normal). But I would like to be able to clean up the session associated with a job (and not a user).
>
> Here is my point:
>
> I am running two executables:
>
> % mpirun -np 2 myexec1
> --> run with PID 2399 ...
> % mpirun -np 2 myexec2
> --> run with PID 2402 ...
>
> When I run orte-clean I get this result:
> % orte-clean -v
> orte-clean: cleaning session dir tree openmpi-sessions-ndelader_at_myhost_0
> orte-clean: killing any lingering procs
> orte-clean: found potential rogue orterun process (pid=2399,user=ndelader), sending SIGKILL...
> orte-clean: found potential rogue orterun process (pid=2402,user=ndelader), sending SIGKILL...
>
> Which means that both jobs have been killed :-(
> Basically I would like to run orte-clean with an executable name, a PID, or whatever else identifies which job I want to stop and clean. It seems I would need to create an Open MPI session per job. Does that make sense? I would like to be able to run something like the following command and get the following result:
>
> % orte-clean -v myexec1
> orte-clean: cleaning session dir tree openmpi-sessions-ndelader_at_myhost_0
> orte-clean: killing any lingering procs
> orte-clean: found potential rogue orterun process (pid=2399,user=ndelader), sending SIGKILL...
>
>
> Does it make sense? Is there a way to perform this kind of selection in the cleaning process?

How many jobs are you starting on how many nodes at one time? This requirement could be a reason to start using a queuing system, where you can remove jobs individually and also serialize your workflow. In fact, we also use GridEngine locally on workstations for this purpose.
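
Short of a queuing system, a single job can usually be stopped by signalling only its own mpirun process rather than running orte-clean, which targets every orterun process of the user. The following is a minimal sketch, not an Open MPI feature: `stop_job` is a hypothetical helper name, the 10-second default grace period is arbitrary, and it assumes that the PID given is that of the mpirun process itself and that mpirun tears down its ranks when it receives SIGTERM.

```shell
#!/bin/sh
# stop_job PID [TIMEOUT]
# Hypothetical helper: ask one mpirun (identified by PID) to terminate,
# and fall back to SIGKILL only if it is still alive after TIMEOUT seconds.
stop_job() {
    pid=$1
    timeout=${2:-10}   # grace period in seconds (arbitrary default)

    # Polite request first; fail if the process does not exist
    # or is not ours to signal.
    kill -TERM "$pid" 2>/dev/null || return 1

    # Poll once per second until the process is gone or the grace
    # period has elapsed. "kill -0" only tests for existence.
    i=0
    while kill -0 "$pid" 2>/dev/null && [ "$i" -lt "$timeout" ]; do
        sleep 1
        i=$((i + 1))
    done

    # Last resort, comparable to what orte-clean does, but for one PID only.
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid" 2>/dev/null || true
    fi
    return 0
}
```

With the PIDs from the example above, `stop_job 2399` would stop only the myexec1 job and leave the mpirun with PID 2402 running. If the SIGKILL fallback is ever reached, leftover files in the session directory may still have to be removed by hand.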

-- Reuti