On Dec 11, 2008, at 2:55 PM, Terry Dontje wrote:
> Well under SGE it allows you to have SGE send mpirun SIGUSR1 so many
> minutes before sending the Suspend signal.
My point is that the right approach might be to work in the context of
Josh's checkpoint/restart (CR) work -- he's already got hooks for "do this
right before pausing for checkpoint" / "do this right after resuming", etc.
Sure, we're not checkpointing, but several of the characteristics of
this action are pretty similar to what is required for
checkpointing/restarting. So it might be good to use that framework for it...?
--
Jeff Squyres
Cisco Systems