Open MPI User's Mailing List Archives

Subject: [OMPI users] Torque+BCLR+OpenMPI
From: Anton Starikov (ant.starikov_at_[hidden])
Date: 2010-02-11 02:54:35

I'm trying to implement checkpointing on our cluster, and I have an obvious question.

I guess this has been implemented many times by other users, so I would appreciate it if someone could share their experience with me.

For serial and multithreaded jobs it is fairly clear what to do. But what about parallel (MPI) jobs?

We have "fat" 16-core nodes, so users run both OpenMP and MPI programs.

Should I just perform some checks in my checkpointing script and call ompi-checkpoint if the tests show that the job is an MPI job?
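To make the question concrete, here is a minimal sketch of the kind of dispatch I have in mind. It only decides which command to run and does not run anything. The function name, the set of launcher names, and the idea of being handed the launcher path and PID are all placeholders of mine, not something Torque provides; I am also assuming that BLCR's cr_checkpoint with --tree is the right call for the non-MPI case.

```python
import os

# Launcher names I would treat as "this is an Open MPI job".
# This list is an assumption; adjust it for the local installation.
MPI_LAUNCHERS = {"mpirun", "mpiexec", "orterun"}


def checkpoint_command(launcher_path, pid):
    """Return the argv list that would checkpoint the job's top-level process.

    launcher_path: executable of the job's top-level process (hypothetical input)
    pid:           PID of that process (hypothetical input)
    """
    name = os.path.basename(launcher_path)
    if name in MPI_LAUNCHERS:
        # MPI job: ompi-checkpoint is given the PID of mpirun.
        return ["ompi-checkpoint", str(pid)]
    # Serial or OpenMP job: checkpoint with BLCR directly.
    # --tree asks cr_checkpoint to include the process and its descendants.
    return ["cr_checkpoint", "--tree", str(pid)]
```

The checkpointing script would then run the returned command, for example with subprocess.call(checkpoint_command(path, pid)).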

What is the "usual" way?