On Apr 3, 2011, at 2:00 PM, Laurence Marks wrote:


I am not using that computer. A scenario that I have come across is
that when a msub job is killed because it has exceeded its walltime,
MPI tasks spawned by ssh may not be terminated because (so I am told)
Torque does not know about them.

Not true with OMPI. Torque will kill mpirun, which will in turn cause all MPI procs to die. Yes, it's true that Torque won't know about the MPI procs itself. However, OMPI is designed such that termination of mpirun by the resource manager will cause all apps to die.

How does Torque on NodeA know that an MPI job launched on NodeB by ssh
should be killed?

Torque works at the job level. So if you get an interactive Torque session, Torque can only kill your session - which means it automatically kills everything started within that session, regardless of where it resides.

Perhaps you don't fully understand how Torque works? As a brief recap, Torque allocates the requested number of nodes. On one of the nodes, its daemon (pbs_mom) acts as the "mother superior" that is responsible for that job. Torque also wires the daemons on each of the other nodes (the "sister moms") to the mother superior to create, in effect, a virtual machine.

When the Torque session is completed, the mother superior notifies all the other Torque daemons in the VM that the session shall be terminated. At that time, all local procs belonging to that session are terminated. It doesn't matter how those procs got there - by ssh, mpirun, whatever. They -all- are killed.
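
As a rough local analogy for "kill the session and everything started inside it dies", here is a minimal Python sketch using a POSIX session and process group. This is not Torque's code; the function name kill_session_demo and the use of "sleep" as a stand-in for job tasks are assumptions made for illustration only.

```python
import os
import signal
import subprocess
import time


def kill_session_demo() -> bytes:
    """Kill a whole session at once and confirm nothing in it survives."""
    # Start a shell as the leader of a brand-new session. It launches two
    # background "sleep" processes, standing in for tasks started in a job.
    leader = subprocess.Popen(
        ["sh", "-c", "sleep 300 & sleep 300 & wait"],
        stdout=subprocess.PIPE,
        start_new_session=True,
    )
    time.sleep(0.5)  # give the shell a moment to start its children

    # The session leader's pid is also its process group id, so one call
    # signals the shell and everything it started.
    os.killpg(leader.pid, signal.SIGKILL)

    # Every process in the group inherited the write end of the pipe, so
    # read() only returns (with EOF) once all of them are gone.
    leftover = leader.stdout.read()
    leader.wait()
    return leftover


if __name__ == "__main__":
    print("output after kill:", kill_session_demo())
```

The caller never learns the pids of the individual "sleep" processes; it only knows the session, which is the point being made about Torque working at the job level.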

What Torque cannot do is kill the actual mpi processes started by mpirun. See below.

OMPI is designed (from what I can see) for all
mpirun to be started from the same node, not distributed mpi launched
independently from multiple nodes.

Remember, mpirun launches its own set of daemons on each node. Each daemon then locally spawns its set of mpi processes. So mpirun knows where everything is and can kill it.

To further ensure cleanup, each daemon monitors mpirun's existence. So Torque only knows about mpirun, and Torque kills mpirun when (e.g.) walltime is reached. OMPI's daemons see that mpirun has died and terminate their local processes prior to terminating themselves.
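
The "daemon watches its launcher and cleans up when the launcher dies" idea can be sketched in a few lines of Python. This is only an illustration under assumptions: it is not how OMPI's daemons are implemented, the names DAEMON_SRC and run_demo are hypothetical, and the launcher's death is simulated by closing the pipe that connects it to the daemon (which is what the OS does when a process dies).

```python
import subprocess
import sys

# Hypothetical stand-in for a per-node daemon: start one local worker, then
# block until the pipe from the launcher closes, then clean up and exit.
DAEMON_SRC = r"""
import subprocess, sys
worker = subprocess.Popen(["sleep", "300"])   # the local "MPI process"
print("worker started", flush=True)
sys.stdin.read()                              # returns at EOF: launcher is gone
worker.terminate()                            # sends SIGTERM to the worker
worker.wait()
print("worker terminated by signal", -worker.returncode, flush=True)
"""


def run_demo() -> str:
    """Play the launcher: start the daemon, then 'die' by closing the pipe."""
    daemon = subprocess.Popen(
        [sys.executable, "-c", DAEMON_SRC],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    daemon.stdout.readline()   # wait until the daemon reports its worker
    daemon.stdin.close()       # simulate the launcher being killed
    report = daemon.stdout.read()
    daemon.wait()
    return report.strip()


if __name__ == "__main__":
    print(run_demo())
```

Here the launcher never signals the worker directly. It simply goes away, and the daemon notices and terminates what it started locally, which mirrors the chain described above: the resource manager kills mpirun, and the daemons do the rest.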

Torque cannot directly kill the mpi processes because it has no knowledge of their existence and relationship to the job session. Instead, since Torque knows about the ssh that started mpirun (since you executed it interactively), it kills the ssh - which causes mpirun to die, which then causes the mpi apps to die.


I am not certain that killing the
ssh on NodeA will in fact terminate an MPI job launched on NodeB (i.e. by
ssh NodeB mpirun AAA...) with OMPI.


It most certainly will! That mpirun on nodeB is executing under the ssh from nodeA, so when that ssh session is killed, it automatically kills everything run underneath it. And when mpirun dies, so does the job it was running, as per above.

You can prove this to yourself rather easily. Just ssh to a remote node and execute any command that lingers for a while - say something simple like "sleep". Then kill the ssh and do a "ps" on the remote node. I guarantee that the command will have died.