Though I do not test this scenario (using hostfiles) very often, it
used to work. The ompi-restart command takes a --hostfile (or
--machinefile) argument that is passed directly to the mpirun command. I
wonder if something broke recently with this handoff. I can certainly
checkpoint with one set of nodes/allocation and restart with another,
but most/all of my testing occurs in a SLURM environment, so there is
no need for an explicit hostfile.
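
For reference, here is a minimal sketch of the restart invocation described
above, written as a small Python helper that only assembles the command line.
The helper name, the snapshot handle, and the fallback file name new_hosts.txt
are placeholders chosen for illustration; PBS_NODEFILE is the node list that
PBS exports inside a job. It assumes ompi-restart is on the PATH and does not
launch anything itself.

```python
import os
import shlex


def build_restart_command(snapshot_handle, hostfile=None):
    """Return the argv list for restarting from a global snapshot.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = ["ompi-restart"]
    if hostfile:
        # ompi-restart hands --hostfile through to the mpirun it launches.
        cmd += ["--hostfile", hostfile]
    # The global snapshot reference goes last, after any options.
    cmd.append(snapshot_handle)
    return cmd


# Inside a PBS job, PBS_NODEFILE points at the nodes of the new allocation.
# "new_hosts.txt" and the snapshot handle below are placeholder values.
hostfile = os.environ.get("PBS_NODEFILE", "new_hosts.txt")
cmd = build_restart_command("ompi_global_snapshot_1234.ckpt", hostfile)
print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run(cmd, check=True) would perform
the actual restart; whether that succeeds on a different set of nodes is
exactly the behavior in question here.
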
I'll take a look to see if I can reproduce this, but it probably will
not be until next week.
On Dec 2, 2009, at 9:54 AM, Jonathan Ferland wrote:
> I am trying to use BLCR checkpointing with Open MPI. I am currently able to
> run my application using some hostfile, checkpoint the run, and then
> restart the application using the same hostfile. The thing I would
> like to do is to restart the application with a different hostfile.
> But this leads to a segfault using 1.3.3.
> Is it possible to restart the application using a different hostfile?
> (We are using PBS to create the hostfile, so each new restart might
> be on different nodes.) If so, how can we do that? If not, do you plan
> to include this in a future release?