
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] ompi-restart using different nodes
From: Jonathan Ferland (jonathan.ferland_at_[hidden])
Date: 2009-12-02 12:01:56

Hi Josh,

In case it helps, I am running 1.3.3, compiled as follows:
 ../configure --enable-ft-thread --with-ft=cr --enable-mpi-threads
--with-blcr=... --with-blcr-libdir=... --disable-openib-rdmacm --prefix=...

I run my application like this:
mpirun -am ft-enable-cr --hostfile host -np 2 ./a.out

where host contains:

With this setup, checkpoint and restart work:
ompi-restart -hostfile host ompi_global_snapshot_....ckpt

but if I then change host to (just swapping the nodes):

then it crashes...
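A minimal sketch of the workflow being asked about, assuming a PBS job whose `$PBS_NODEFILE` lists one hostname per allocated slot: regenerate the hostfile for the current allocation, then pass it to `ompi-restart -hostfile` as above. The function names `make_hostfile` and `restart_on_current_nodes` and the file name `restart_hostfile` are hypothetical. This only illustrates the intended usage; it does not work around the 1.3.3 crash described here.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Open MPI): rebuild a hostfile for the
# nodes of the current PBS job, then restart from a global snapshot on them.

# Collapse a node file with one hostname per slot into Open MPI hostfile
# lines of the form "hostname slots=N".
make_hostfile() {
    sort "$1" | uniq -c | awk '{ print $2 " slots=" $1 }'
}

# $1: global snapshot directory (e.g. ompi_global_snapshot_....ckpt)
# $2: optional node file; defaults to $PBS_NODEFILE (assumed set by PBS)
restart_on_current_nodes() {
    snapshot=$1
    nodefile=${2:-${PBS_NODEFILE:?PBS_NODEFILE is not set}}
    make_hostfile "$nodefile" > restart_hostfile
    ompi-restart -hostfile restart_hostfile "$snapshot"
}
```

Inside a new PBS job, one would call `restart_on_current_nodes ompi_global_snapshot_....ckpt`, so each restart picks up whatever nodes PBS assigned to that job.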


Josh Hursey wrote:
> Though I do not test this scenario (using hostfiles) very often, it
> used to work. The ompi-restart command takes a --hostfile (or
> --machinefile) argument that is passed directly to the mpirun command.
> I wonder if something broke recently with this handoff. I can
> certainly checkpoint with one set of nodes/allocation and restart with
> another, but most/all of my testing occurs in a SLURM environment, so
> there is no need for an explicit hostfile.
> I'll take a look to see if I can reproduce, but probably will not be
> until next week.
> -- Josh
> On Dec 2, 2009, at 9:54 AM, Jonathan Ferland wrote:
>> Hi,
>> I am trying to use BLCR checkpointing in MPI. I am currently able to
>> run my application using a hostfile, checkpoint the run, and then
>> restart the application using the same hostfile. What I would
>> like to do is restart the application with a different hostfile,
>> but this leads to a segfault with 1.3.3.
>> Is it possible to restart the application using a different hostfile?
>> (We are using PBS to create the hostfile, so each new restart might
>> be on different nodes.) If so, how can we do that? If not, do you plan
>> to include this in a future release?
>> Thanks
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]