Hi there,
I have some questions regarding checkpoint/restart:
1. Until recently I thought that ompi-checkpoint and ompi-restart were used to checkpoint a single process inside an MPI application. I have now reread this and realized that what they actually do is checkpoint the mpirun process. Does this mean that if I run my application with multiple processes on multiple nodes in my network, the checkpoint file will contain the states of all the processes of my MPI application?
2. Can I restart the application on a different node?
Thanks a lot,
Andreea