
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] ompi-clean on single executable
From: Nicolas Deladerriere (nicolas.deladerriere_at_[hidden])
Date: 2012-10-24 07:01:51


Reuti,

The problem I am facing is a small part of our production
system, and I cannot modify our mpirun submission system. This is why
I am looking for a solution using only the ompi-clean or mpirun
command options.

Thanks,
Nicolas
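For illustration, the per-job cleanup Nicolas is after can be sketched in the wrapper itself by remembering each job's launcher PID and signalling only that process. This is a sketch under one assumption: that signalling mpirun with SIGTERM makes it tear down only its own job's processes (documented Open MPI behaviour). A stand-in `sleep` command replaces mpirun so the sketch is self-contained.

```shell
#!/bin/sh
# Sketch: per-job cleanup by signalling one launcher PID, instead of a
# global orte-clean. "sleep 60" stands in for e.g. "mpirun -np 2 myexec1";
# the assumption is that SIGTERM to mpirun terminates only that job.
sleep 60 &
JOB_PID=$!                   # remember this job's launcher PID

# ... later, to stop and clean up just this one job:
kill -TERM "$JOB_PID"
wait "$JOB_PID" 2>/dev/null  # reap it; exit status reflects the signal
if kill -0 "$JOB_PID" 2>/dev/null; then
    echo "job $JOB_PID still running"
else
    echo "job $JOB_PID cleaned"
fi
```

Any other mpirun jobs on the same frontend are untouched, since only the one PID is signalled.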

2012/10/24, Reuti <reuti_at_[hidden]>:
> Am 24.10.2012 um 11:33 schrieb Nicolas Deladerriere:
>
>> Reuti,
>>
>> Thanks for your comments,
>>
>> In our case, we are currently running different mpirun commands on
>> clusters sharing the same frontend. Basically we use a wrapper to run
>> the mpirun command and to run an ompi-clean command to clean up the
>> mpi job if required.
>> Using ompi-clean like this just kills all the other MPI jobs running on
>> the same frontend. I cannot use a queuing system
>
> Why? Using it on a single machine was only one possible setup. Its purpose
> is to distribute jobs to slave hosts. If you already have one frontend as a
> login machine it fits perfectly: the qmaster (in case of SGE) can run there
> and the execd on the nodes.
>
> -- Reuti
>
>
>> as you have suggested; this is why I was wondering about an option, or
>> some other solution associated with the ompi-clean command, to avoid this
>> blanket cleaning of all MPI jobs.
>>
>> Cheers
>> Nicolas
>>
>> 2012/10/24, Reuti <reuti_at_[hidden]>:
>>> Hi,
>>>
>>> Am 24.10.2012 um 09:36 schrieb Nicolas Deladerriere:
>>>
>>>> I am having an issue running ompi-clean, which cleans up the session
>>>> associated with a user (this is normal), which means it kills all
>>>> running jobs associated with that session (this is also normal). But I
>>>> would like to be able to clean up the session associated with a job
>>>> (not a user).
>>>>
>>>> Here is my point:
>>>>
>>>> I am running two executables:
>>>>
>>>> % mpirun -np 2 myexec1
>>>> --> run with PID 2399 ...
>>>> % mpirun -np 2 myexec2
>>>> --> run with PID 2402 ...
>>>>
>>>> When I run orte-clean I get this result:
>>>> % orte-clean -v
>>>> orte-clean: cleaning session dir tree
>>>> openmpi-sessions-ndelader_at_myhost_0
>>>> orte-clean: killing any lingering procs
>>>> orte-clean: found potential rogue orterun process
>>>> (pid=2399,user=ndelader), sending SIGKILL...
>>>> orte-clean: found potential rogue orterun process
>>>> (pid=2402,user=ndelader), sending SIGKILL...
>>>>
>>>> Which means that both jobs have been killed :-(
>>>> Basically I would like to perform orte-clean using an executable name,
>>>> a PID, or whatever else identifies the job I want to stop and clean. It
>>>> seems I would need to create an Open MPI session per job. Does that
>>>> make sense? And I would like to be able to run something like the
>>>> following command and get the following result:
>>>>
>>>> % orte-clean -v myexec1
>>>> orte-clean: cleaning session dir tree
>>>> openmpi-sessions-ndelader_at_myhost_0
>>>> orte-clean: killing any lingering procs
>>>> orte-clean: found potential rogue orterun process
>>>> (pid=2399,user=ndelader), sending SIGKILL...
>>>>
>>>>
>>>> Does this make sense? Is there a way to perform this kind of selection
>>>> in the cleaning process?
>>>
>>> How many jobs are you starting on how many nodes at one time? This
>>> requirement could be a reason to start using a queuing system, where you
>>> can remove jobs individually and also serialize your workflow. In fact,
>>> we use GridEngine locally on workstations for exactly this purpose.
>>>
>>> -- Reuti
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
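The per-job-session idea raised in the thread can be sketched with Open MPI's `orte_tmpdir_base` MCA parameter, which relocates the session directory tree. The paths below are illustrative, and whether orte-clean scopes its process killing to a given tree (rather than scanning for all orterun processes, as observed above) depends on the Open MPI version, so treat this as a sketch of the idea rather than a confirmed fix:

```shell
# Sketch: give each job its own session directory root (paths illustrative).
mpirun -mca orte_tmpdir_base /tmp/ompi-job1 -np 2 myexec1 &
mpirun -mca orte_tmpdir_base /tmp/ompi-job2 -np 2 myexec2 &

# If orte-clean honours the same parameter, its session-dir cleanup can be
# pointed at a single job's tree:
orte-clean -mca orte_tmpdir_base /tmp/ompi-job1 -v
```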