I am running my dynamic (malleable) MPI application as a single job that
can increase and decrease its number of nodes/processors at runtime. I am
using the OAR resource manager to launch the application and to provide it
with resource-availability information.
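For reference, the grow side of the application looks roughly like the sketch below. This is only an illustration, not our actual code: the "worker" binary name and the "node42" host are placeholders, and the "host" info key is the Open MPI way of placing spawned processes on a particular node. It needs an MPI installation to compile and run.

```c
/* Sketch: growing a malleable MPI application with MPI_Comm_spawn.
 * "worker" and "node42" are hypothetical placeholders. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Ask Open MPI to place the new process on a node obtained
     * from the resource manager (e.g. OAR). */
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "host", "node42");

    MPI_Comm intercomm;
    int errcode;
    MPI_Comm_spawn("worker", MPI_ARGV_NULL, 1, info,
                   0, MPI_COMM_SELF, &intercomm, &errcode);

    /* ... application work with the enlarged set of processes ... */

    MPI_Comm_disconnect(&intercomm);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
```

One orted daemon is started on the target node as a side effect of the spawn, which is exactly the daemon the shrink operation would later need to terminate.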
My question is whether there is a safe way to stop an orted daemon without
affecting the remaining processes of my dynamic application.
According to Ralph's answer, this is currently impossible.
Thanks for your help.
On Mon, Mar 9, 2009 at 1:56 PM, Reuti <reuti_at_[hidden]> wrote:
> On 09.03.2009 at 13:28, Marcia Cristina Cera wrote:
>> May I signal one orted daemon to finish its execution on-the-fly?
>> Context: I intend to use Open MPI in a dynamic resource environment, as I
>> did with LAM/MPI with the help of the lamgrow and lamshrink commands.
>> To perform grow operations (increasing the number of nodes/resources
>> on-the-fly), Open MPI enables incremental resource utilization. All nodes
>> that can be used are listed in the hostfile (given as an mpirun
>> parameter), and as they come into use through MPI_Comm_spawn, one orted
>> daemon is created on each node. According to some first tests, this
>> feature is enough to satisfy our goals.
>> On the other hand, to perform shrink operations we need to release nodes
>> so they can eventually be used by other applications/jobs. In other
>> words, we must terminate all application processes and also the orted
>> daemon. On the application side the solution is easy, since we can notify
>> the processes (by a message or signal) to safely finish their execution
>> and call MPI_Finalize. On the orted side, we must finish its execution on
>> the target node and also update its status to 'INVALID'.
>> How can I do this? Is there a specific signal or procedure for it?
> how are you running your applications usually? It looks like you are
> running all your jobs without any queuing system. If you had one, you
> could drain the nodes you want to exclude (for a graceful shutdown of the
> already running jobs) and stop scheduling further jobs to those nodes. Or
> just kill all jobs running on the node to be excluded - of course, you
> might then lose the computing time already spent on that node.
> -- Reuti
>> Thank you in advance!
>> users mailing list