
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Torque+BLCR+OpenMPI
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2010-02-25 13:37:19


Anton,

I don't know whether there is a usual or typical way of initiating a checkpoint across the various resource managers. I know that the BLCR folks (I believe Eric Roman is heading this effort; CC'ed) have been investigating tighter integration of Open MPI, BLCR, and Torque. He might be able to give you more guidance on this topic.

-- Josh

On Feb 10, 2010, at 11:54 PM, Anton Starikov wrote:

> Hi!
> I'm trying to implement checkpointing on our cluster, and I have an obvious question.
>
> I guess this has been implemented many times by other users, so I would like someone to share their experience with me.
>
> With serial/multithreaded jobs it is fairly clear. But what about parallel jobs?
>
> We have "fat" 16-core nodes, so users run both OpenMP and MPI programs.
>
> Should I just perform some checks in my checkpointing script and call ompi-checkpoint if those checks show that the job is an MPI job?
>
> What is the "usual" way?
>
> Best,
>
> Anton
>
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
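A minimal Python sketch of the approach Anton describes (inspect the job, then call ompi-checkpoint only for MPI jobs) might look like the following. It is not taken from this thread: `Proc` and `checkpoint_command` are hypothetical names, and the sketch assumes the checkpoint script already knows the job's processes (for example from `ps`) and the PID of the job's top-level process. Falling back to BLCR's `cr_checkpoint --tree` for serial/OpenMP jobs is likewise an assumed choice, not something the thread prescribes.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Names under which the Open MPI launcher may appear in a process listing.
# Assumption: mpirun/mpiexec/orterun show up under the name they were invoked with.
MPI_LAUNCHERS = {"mpirun", "mpiexec", "orterun"}


@dataclass
class Proc:
    pid: int
    name: str  # executable name, e.g. as reported by `ps -o comm=`


def checkpoint_command(job_procs: Sequence[Proc], root_pid: int) -> List[str]:
    """Pick a checkpoint command line for one batch job (hypothetical helper).

    If an Open MPI launcher is among the job's processes, checkpoint the whole
    parallel job via ompi-checkpoint, which takes the PID of mpirun.
    Otherwise treat the job as serial/multithreaded and checkpoint its
    process tree directly with BLCR's cr_checkpoint (assumed fallback).
    """
    for proc in job_procs:
        if proc.name in MPI_LAUNCHERS:
            return ["ompi-checkpoint", str(proc.pid)]
    return ["cr_checkpoint", "--tree", str(root_pid)]


if __name__ == "__main__":
    # Hypothetical process lists for two jobs sharing one 16-core node.
    mpi_job = [Proc(4100, "bash"), Proc(4101, "mpirun"), Proc(4102, "a.out")]
    omp_job = [Proc(5200, "bash"), Proc(5201, "omp_solver")]
    print(checkpoint_command(mpi_job, root_pid=4100))
    print(checkpoint_command(omp_job, root_pid=5200))
    # A real script would then run the chosen command, e.g. with
    # subprocess.run(cmd, check=True), and report failures to the batch system.
```

Keep in mind that ompi-checkpoint can only checkpoint jobs that were launched with checkpoint/restart support enabled (e.g. `mpirun -am ft-enable-cr`), and that cr_checkpoint needs the target to be running under BLCR (e.g. started via `cr_run`); check both against the versions installed on your cluster.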