
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] on-the-fly nodes liberation
From: Reuti (reuti_at_[hidden])
Date: 2009-03-09 08:56:03


Hi,

On 09.03.2009 at 13:28, Marcia Cristina Cera wrote:

> May I signal one orted daemon to finish its execution on-the-fly?
>
> Context: I intend to use Open MPI in a dynamic resource environment,
> as I did with LAM/MPI with the help of the lamgrow and lamshrink commands.
>
> To perform grow operations (increasing the number of nodes/resources
> on-the-fly), Open MPI allows incremental resource utilization. All
> nodes that can be used are listed in the hostfile (passed as an
> mpirun parameter), and as they are first used through
> MPI_Comm_spawn, one orted daemon is created on each node. According
> to some initial tests, this feature is enough to satisfy our goals.
>
> On the other hand, to perform shrink operations, we need to
> release nodes so that they can eventually be used by other
> applications/jobs. In other words, we must terminate all application
> processes and also the orted daemon. On the application side, the
> solution is easy, since we can notify the processes (by a message or
> a signal) to safely finish their execution and call MPI_Finalize. On
> the orted side, we must finish its execution on the target node and
> also update its status to 'INVALID'.
> How can I do this? Is there a specific signal or procedure for it?
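
The application-side part described above (notify the processes, let them finish safely, then finalize) could be sketched roughly as follows. This is only an illustration under assumptions: SIGUSR1 as the notification signal, a POSIX system, and the names ShutdownFlag, run_until_signalled, do_work_step and finalize are all hypothetical. It does not touch orted at all.

```python
import os
import signal


class ShutdownFlag:
    """Records that a shutdown was requested; the handler does nothing else."""

    def __init__(self):
        self.requested = False

    def handler(self, signum, frame):
        # Keep the handler trivial: the main loop decides when it is safe to stop.
        self.requested = True


def run_until_signalled(do_work_step, finalize, sig=signal.SIGUSR1):
    """Call do_work_step() until `sig` arrives, then call finalize() once.

    do_work_step and finalize are hypothetical placeholders supplied by the
    caller. Returns the number of completed work steps.
    """
    flag = ShutdownFlag()
    signal.signal(sig, flag.handler)
    steps = 0
    while not flag.requested:
        do_work_step()   # one unit of work that is safe to stop after
        steps += 1
    finalize()           # in an MPI program this is where MPI_Finalize belongs
    return steps


if __name__ == "__main__":
    done = []

    def step():
        done.append("work")
        if len(done) == 3:
            # Stand-in for the external notification sent to the process.
            os.kill(os.getpid(), signal.SIGUSR1)

    run_until_signalled(step, lambda: done.append("finalized"))
    print(done)
```

The point of the flag is that the process stops only between work steps, so it never calls the finalize step in the middle of a computation or a communication.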

How do you usually run your applications? It looks like you
are running all the jobs without any queuing system. If you had one,
you would drain the nodes to be excluded (if you want a graceful
shutdown of the jobs already running) and not schedule any further
jobs to them. Or you could just kill all jobs running on the node
to be excluded; of course, you might lose the computing time
already spent on that node.
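
To make "draining" concrete, here is a toy model of the idea. It is not the interface of any real queuing system; NodePool and its methods are made up for illustration only.

```python
class NodePool:
    """Toy model of draining nodes; not the API of any real queuing system."""

    def __init__(self, nodes):
        self.running = {node: set() for node in nodes}  # node -> running job ids
        self.draining = set()

    def drain(self, node):
        """Stop placing new jobs on `node`; running jobs are left alone."""
        self.draining.add(node)

    def schedule(self, job):
        """Place `job` on the least loaded node that is not draining."""
        candidates = [n for n in self.running if n not in self.draining]
        if not candidates:
            return None
        node = min(candidates, key=lambda n: len(self.running[n]))
        self.running[node].add(job)
        return node

    def finish(self, node, job):
        """Record that `job` has ended on `node`."""
        self.running[node].discard(job)

    def can_release(self, node):
        """A draining node can be handed over once its last job has ended."""
        return node in self.draining and not self.running[node]
```

Killing the jobs instead corresponds to emptying the node's job set right away rather than waiting for finish() to be called for each job.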

-- Reuti

> Thank you in advance!
> márcia.