
Open MPI User's Mailing List Archives


Subject: [OMPI users] Restarting from a checkpoint (OMPI 1.3)
From: Gregor Dschung (gregor.dschung_at_[hidden])
Date: 2009-01-20 06:07:12


I'm trying the newly released Open MPI 1.3 together with BLCR to
provide the checkpoint/restart feature.

Open MPI was configured with:

  ./configure --prefix=/usr/local --with-ft=cr \
      --enable-ft-thread --enable-mpi-threads --with-blcr=/

An MPI job on a single machine (several processes) is checkpointed and
restarted without any problems.

Checkpointing an MPI job across two hosts (Ethernet, TCP) also
completes without warnings or errors (the home directory and the
directory containing the MPI application are shared via NFS). The
restart works too, but all processes are started only on the host where
I run the ompi-restart command. Even if I add the -hostfile argument to
ompi-restart, only that one host is used.
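For reference, the two-host scenario can be sketched as the command sequence below. This is a minimal sketch under stated assumptions, not the exact commands from this post: the host names node01/node02, the hostfile name myhosts, and the program ./my-app are hypothetical. `-am ft-enable-cr` is the AMCA parameter file that the Open MPI 1.3 fault-tolerance documentation uses to enable checkpoint/restart. The snapshot name assumes the `ompi_global_snapshot_<PID of mpirun>.ckpt` pattern that ompi-checkpoint reports.

```shell
#!/bin/sh
# Hypothetical hosts and application; adjust to the real cluster.
cat > myhosts <<'EOF'
node01 slots=2
node02 slots=2
EOF

# Launch across both hosts with checkpoint/restart enabled
# (assumes the ft-enable-cr AMCA file shipped with Open MPI 1.3).
mpirun -np 4 -hostfile myhosts -am ft-enable-cr ./my-app &
MPIRUN_PID=$!

# Let the job start, then checkpoint it via the PID of mpirun.
sleep 10
ompi-checkpoint "$MPIRUN_PID"

# Stop the original job before restarting from the snapshot.
kill "$MPIRUN_PID"
wait "$MPIRUN_PID" 2>/dev/null

# Restart from the global snapshot with the same hostfile.
# Reported symptom: all ranks are still started on the local host only.
ompi-restart -hostfile myhosts "ompi_global_snapshot_${MPIRUN_PID}.ckpt"
```

This needs a working two-host Open MPI/BLCR setup to run, so it is meant as a reproduction recipe rather than a standalone script.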

Does anybody have a hint?