Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] ompi-restart using different nodes
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2009-12-02 11:52:32


Though I do not test this scenario (using hostfiles) very often, it
used to work. The ompi-restart command takes a --hostfile (or
--machinefile) argument that is passed directly to the mpirun command. I
wonder if something broke recently with this handoff. I can certainly
checkpoint with one set of nodes/allocation and restart with another,
but most/all of my testing occurs in a SLURM environment, so no need
for an explicit hostfile.
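
As a rough illustration of the handoff described above, here is a minimal Python sketch that assembles an ompi-restart command line with a hostfile different from the one used at checkpoint time. The helper name build_restart_command and the snapshot handle ompi_global_snapshot_1234.ckpt are hypothetical placeholders. Taking the hostfile from $PBS_NODEFILE is an assumption based on the PBS setup mentioned in the question. The sketch only prints the command; it does not run it.

```python
import os
import shlex


def build_restart_command(snapshot_handle, hostfile=None):
    """Return the ompi-restart argument list, optionally with a new hostfile."""
    cmd = ["ompi-restart"]
    if hostfile:
        # --hostfile (or --machinefile) is passed through to mpirun on restart.
        cmd += ["--hostfile", hostfile]
    cmd.append(snapshot_handle)
    return cmd


if __name__ == "__main__":
    # Assumption: under PBS the node list of the current job is in $PBS_NODEFILE.
    hostfile = os.environ.get("PBS_NODEFILE")
    # Hypothetical snapshot handle; use the one reported at checkpoint time.
    cmd = build_restart_command("ompi_global_snapshot_1234.ckpt", hostfile)
    print(shlex.join(cmd))
```

If no hostfile is given, the sketch falls back to a plain restart, which matches the case where the resource manager (for example SLURM) supplies the allocation.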

I'll take a look to see if I can reproduce this, but it probably will
not be until next week.

-- Josh

On Dec 2, 2009, at 9:54 AM, Jonathan Ferland wrote:

> Hi,
>
> I am trying to use BLCR checkpointing with Open MPI. I am currently
> able to run my application with a hostfile, checkpoint the run, and
> then restart the application using the same hostfile. What I would
> like to do is restart the application with a different hostfile,
> but this leads to a segfault with 1.3.3.
>
> Is it possible to restart the application using a different hostfile?
> We are using PBS to create the hostfile, so each new restart might
> be on different nodes. If so, how can we do that? If not, do you plan
> to include this in a future release?
>
> thanks