Hello!
I am working on some simulations where I have to perform periodic
kill-restart and checkpointing on an MPI application.
As a checkpoint can take place immediately after a restart, I need some way to
know whether the ompi-restart of the application is complete.
If I do not ensure that restart of all application processes is complete,
ompi-checkpoint fails after throwing a slew of errors.
Can someone please suggest an idea for having some kind of notification
indicating that restarts have completed (in the sense that checkpointing can safely proceed)?
Thank you,
Kishor