On 03.04.2011 at 22:57, Ralph Castain wrote:
> On Apr 3, 2011, at 2:00 PM, Laurence Marks wrote:
>>>> I am not using that computer. A scenario that I have come across is
>>>> that when an msub job is killed because it has exceeded its walltime,
>>>> mpi tasks spawned by ssh may not be terminated because (so I am told)
>>>> Torque does not know about them.
>>> Not true with OMPI. Torque will kill mpirun, which will in turn cause all MPI procs to die. Yes, it's true that Torque won't know about the MPI procs itself. However, OMPI is designed such that termination of mpirun by the resource manager will cause all apps to die.
>> How does Torque on NodeA know that an mpi launched on NodeB by ssh
>> should be killed?
> Torque works at the job level. So if you get an interactive Torque session, Torque can only kill your session - which means it automatically kills everything started within that session, regardless of where it resides.
> Perhaps you don't fully understand how Torque works? As a brief recap, Torque allocates the requested number of nodes. On one of the nodes, it starts a "sister mom" that is responsible for that job. It also wires Torque daemons on each of the other nodes to the "sister mom" to create, in effect, a virtual machine.
> When the Torque session is completed, the "sister mom" notifies all the other Torque daemons in the VM that the session shall be terminated. At that time, all local procs belonging to that session are terminated. It doesn't matter how those procs got there - by ssh, mpirun, whatever. They -all- are killed.
Is this a new feature? On the Torque clusters I have seen, cron jobs run on all nodes to remove processes that were not started through Torque's TM interface, e.g. because they were started by ssh.
If I understand you correctly, you are saying that even with an ssh to a node you will still get correct accounting.
> What Torque cannot do is kill the actual mpi processes started by mpirun. See below.
>> OMPI is designed (from what I can see) for all
>> mpirun to be started from the same node, not distributed mpi launched
>> independently from multiple nodes.
> Remember, mpirun launches its own set of daemons on each node. Each daemon then locally spawns its set of mpi processes. So mpirun knows where everything is and can kill it.
> To further ensure cleanup, each daemon monitors mpirun's existence. So Torque only knows about mpirun, and Torque kills mpirun when (e.g.) walltime is reached. OMPI's daemons see that mpirun has died and terminate their local processes prior to terminating themselves.
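To make the cleanup chain described above concrete, here is a minimal local sketch in Python. It is only an analogy and not Open MPI's actual implementation: the daemon script, the pipe used as a liveness signal, and the name `worker_survives_launcher_exit` are all assumptions made for illustration. The calling process plays the role of mpirun, one child plays a per-node daemon, and a `sleep` plays an MPI process. It assumes a POSIX system.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for a per-node daemon: it starts one local worker,
# then blocks on stdin. EOF on stdin means the launcher has gone away.
DAEMON_SRC = r"""
import subprocess, sys
worker = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(600)"])
print(worker.pid, flush=True)   # report the worker's pid to the launcher
sys.stdin.read()                # returns only once the launcher's pipe end closes
worker.terminate()              # terminate local processes first ...
worker.wait()                   # ... and reap them before the daemon itself exits
"""


def worker_survives_launcher_exit():
    """Return True if the worker is still alive after the launcher is gone."""
    daemon = subprocess.Popen(
        [sys.executable, "-c", DAEMON_SRC],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    worker_pid = int(daemon.stdout.readline())

    # Simulate the launcher being killed: when a process dies, the kernel
    # closes its file descriptors, so the daemon sees EOF on its stdin.
    daemon.stdin.close()
    daemon.wait(timeout=10)
    daemon.stdout.close()

    try:
        os.kill(worker_pid, 0)  # signal 0 only checks whether the pid exists
        return True
    except ProcessLookupError:
        return False


if __name__ == "__main__":
    print("worker survived:", worker_survives_launcher_exit())
```

The point of the sketch is that the resource manager only needs to know about the top-level launcher. Each daemon notices on its own that the launcher has disappeared and cleans up its local processes before exiting.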
I thought Open MPI was tightly integrated with Torque through the TM interface? In that case Torque provides correct accounting and can also kill all started orteds, since it knows about them.
> Torque cannot directly kill the mpi processes because it has no knowledge of their existence and relationship to the job session. Instead, since Torque knows about the ssh that started mpirun (since you executed it interactively), it kills the ssh - which causes mpirun to die, which then causes the mpi apps to die.
>> I am not certain that killing the
>> ssh on NodeA will in fact terminate a mpi launched on NodeB (i.e. by
>> ssh NodeB mpirun AAA...) with OMPI.
> It most certainly will! That mpirun on nodeB is executing under the ssh from nodeA, so when that ssh session is killed, it automatically kills everything run underneath it. And when mpirun dies, so does the job it was running, as per above.
> You can prove this to yourself rather easily. Just ssh to a remote node and execute any command that lingers for a while - say, something simple like "sleep". Then kill the ssh and do a "ps" on the remote node. I guarantee that the command will have died.