Hi,
I have installed openmpi-1.3a1r18651 and tried to checkpoint an MPI application.
raj_at_portal018:~/examples> mpirun -np 1 -am ft-enable-cr ./myapp.sh &
raj_at_portal018:~/examples> ompi-checkpoint --term 30416
However, when I try to restart from the checkpoint file, I get the following message:
raj_at_portal018:~> ompi-restart -v -machinefile portal018 ompi_global_snapshot_30416.ckpt
[portal018:20178] Checking for the existence of (/home/raj/ompi_global_snapshot_30416.ckpt)
[portal018:20178] Restarting from file (ompi_global_snapshot_30416.ckpt)
[portal018:20178] Exec in self
--------------------------------------------------------------------------
mpirun could not find anything to do.
It is possible that you forgot to specify how many processes to run
via the "-np" argument.
--------------------------------------------------------------------------
Any help would be greatly appreciated.
Regards,
Raj