The ompi-checkpoint command connects to the Global Coordinator in
the SnapC 'full' component. The Global Coordinator lives in the HNP
(mpirun/orterun), as determined by the 'full' component. As a result,
to start a checkpoint, ompi-checkpoint must connect to the HNP.
Users typically run ompi-checkpoint from the same machine where they
started mpirun, so it made the most sense to have these two connect
to each other, especially if we ask the user to provide the PID of
the mpirun process to checkpoint.
That being said, with the proper changes to 'full' (or with a new
SnapC component), ompi-checkpoint could issue the checkpoint request
to any process in the MPI job (orterun, orted, or an application
process) and have the right thing happen.
I have received one request for this functionality, but have not yet
had time to dig into it.
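To make that idea concrete, here is a rough Python sketch of how a
request handed to any process could be relayed to the process hosting
the Global Coordinator. It is only an illustration under my own
assumptions: the Process class, its parent link, and
request_checkpoint() are hypothetical names and are not part of ORTE,
SnapC, or the 'full' component.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Process:
    """A process in the job. Hypothetical model, not an ORTE type."""
    name: str
    # Next hop toward the HNP; None means this process is the HNP itself.
    parent: Optional["Process"] = None
    log: List[str] = field(default_factory=list)

    @property
    def is_hnp(self) -> bool:
        return self.parent is None

    def request_checkpoint(self, path: Optional[List[str]] = None) -> List[str]:
        """Relay a checkpoint request hop by hop until it reaches the HNP.

        Returns the list of process names the request passed through.
        """
        path = (path or []) + [self.name]
        if self.is_hnp:
            # In this sketch the Global Coordinator lives in the HNP,
            # so the checkpoint is started here.
            self.log.append("checkpoint started via " + " -> ".join(path))
            return path
        # Not the HNP: forward the request one hop closer to it.
        return self.parent.request_checkpoint(path)


if __name__ == "__main__":
    hnp = Process("orterun")
    daemon = Process("orted", parent=hnp)
    app = Process("app-rank-0", parent=daemon)
    print(app.request_checkpoint())
    print(hnp.log)
```

In this model the entry point no longer matters: the request always
ends up at the HNP, which is the behavior a modified 'full' or a new
SnapC component would have to provide.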
Does that help?
On Jan 31, 2008, at 9:51 AM, Leonardo Fialho wrote:
> Hi all (and Josh),
> Why does ompi-checkpoint have to contact the HNP specifically? If I
> use another process to start the snapshot coordinator, it apparently
> works fine, no?
> PS: I prefer to send this message to the list, to keep it in the
> archive for future reference.
> Leonardo Fialho
> Computer Architecture and Operating Systems Department - CAOS
> Universidad Autonoma de Barcelona - UAB
> ETSE, Edificio Q, QC/3088
> Phone: +34-93-581-2888
> Fax: +34-93-581-2478