Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] ompi-clean on single executable
From: Reuti (reuti_at_[hidden])
Date: 2012-10-24 05:55:01


On 24.10.2012 at 11:33, Nicolas Deladerriere wrote:

> Reuti,
>
> Thanks for your comments,
>
> In our case, we are currently running different mpirun commands on
> clusters sharing the same frontend. Basically we use a wrapper to run
> the mpirun command and to run an ompi-clean command to clean up the
> MPI job if required.
> Using ompi-clean like this just kills all other MPI jobs running on
> the same frontend. I cannot use a queuing system

Why not? Running it on a single machine was only one possible setup. Its purpose is to distribute jobs to slave hosts. If you already have one frontend as the login machine, it fits perfectly: the qmaster (in the case of SGE) can run there, and the execd on the nodes.

-- Reuti

> as you have suggested, which
> is why I was wondering whether there is an option or other solution for the
> ompi-clean command that avoids this blanket cleanup of MPI jobs.
>
> Cheers
> Nicolas
>
> 2012/10/24, Reuti <reuti_at_[hidden]>:
>> Hi,
>>
>> On 24.10.2012 at 09:36, Nicolas Deladerriere wrote:
>>
>>> I am having an issue running ompi-clean, which cleans up (this is normal)
>>> the session associated with a user, which means it kills all running jobs
>>> associated with this session (this is also normal). But I would like to be
>>> able to clean up the session associated with a job (and not a user).
>>>
>>> Here is my point:
>>>
>>> I am running two executables:
>>>
>>> % mpirun -np 2 myexec1
>>> --> run with PID 2399 ...
>>> % mpirun -np 2 myexec2
>>> --> run with PID 2402 ...
>>>
>>> When I run orte-clean I get this result:
>>> % orte-clean -v
>>> orte-clean: cleaning session dir tree openmpi-sessions-ndelader_at_myhost_0
>>> orte-clean: killing any lingering procs
>>> orte-clean: found potential rogue orterun process
>>> (pid=2399,user=ndelader), sending SIGKILL...
>>> orte-clean: found potential rogue orterun process
>>> (pid=2402,user=ndelader), sending SIGKILL...
>>>
>>> Which means that both jobs have been killed :-(
>>> Basically, I would like to run orte-clean with an executable name, a PID,
>>> or whatever identifies the job I want to stop and clean. It seems I
>>> would need to create an Open MPI session per job. Does that make sense? And
>>> I would like to be able to run something like the following command and get
>>> the following result:
>>>
>>> % orte-clean -v myexec1
>>> orte-clean: cleaning session dir tree openmpi-sessions-ndelader_at_myhost_0
>>> orte-clean: killing any lingering procs
>>> orte-clean: found potential rogue orterun process
>>> (pid=2399,user=ndelader), sending SIGKILL...
>>>
>>>
>>> Does it make sense? Is there a way to perform this kind of selection in
>>> the cleaning process?
>>
>> How many jobs are you starting on how many nodes at one time? This
>> requirement could be a reason to start using a queuing system, where you can
>> remove jobs individually and also serialize your workflow. In fact, we also
>> use GridEngine locally on workstations for this purpose.
>>
>> -- Reuti
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
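
The thread asks for a way to stop one mpirun job without touching the user's other jobs. Nothing in the thread shows a per-job option for ompi-clean/orte-clean, so the following is only a rough sketch of a different approach: the wrapper script remembers the handle of the mpirun it started and signals that process alone. The function names launch_job and stop_job and the grace period are made up for illustration. The sketch assumes a POSIX system and that mpirun, on receiving SIGTERM, shuts down its own ranks and cleans up after itself; the SIGKILL fallback only reaches processes on the local host.

```python
import os
import signal
import subprocess


def launch_job(cmd):
    """Start one launcher command, e.g. ["mpirun", "-np", "2", "myexec1"].

    start_new_session=True runs setsid() in the child, so the launcher
    leads its own process group and the group ID equals its PID.
    """
    return subprocess.Popen(cmd, start_new_session=True)


def stop_job(proc, grace_seconds=10.0):
    """Stop only the job behind `proc`; other jobs are left alone.

    Returns the launcher's exit status (negative signal number if it
    was terminated by a signal).
    """
    if proc.poll() is not None:
        return proc.returncode  # job has already finished

    # Ask the launcher to shut down. Assumption: mpirun reacts to
    # SIGTERM by terminating its ranks and cleaning up its own files.
    try:
        os.kill(proc.pid, signal.SIGTERM)
    except ProcessLookupError:
        pass

    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # Last resort: SIGKILL the whole local process group of this job.
        # Ranks on remote nodes are not part of this group.
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass
        return proc.wait()
```

A wrapper would keep the object returned by launch_job(["mpirun", "-np", "2", "myexec1"]) and later pass it to stop_job; a second job started with its own launch_job call keeps running. This does not replace a queuing system, which additionally handles scheduling and removal of jobs across nodes.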