Open MPI User's Mailing List Archives

Subject: [OMPI users] How to restart a job twice
From: Tamer (tamer_at_[hidden])
Date: 2008-04-18 01:14:03

Dear all, I installed the developer's version (r14519) and was able to
get it running. I successfully checkpointed a parallel job and
restarted it. My question is: how can I checkpoint the restarted job?
Once the original job has been terminated and later restarted, no
mpirun process exists anymore (checked with ps -efa | grep mpirun), so
I do not know which PID to pass to ompi-checkpoint for the restarted
job. Any help would be greatly appreciated.