Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] ompi-restart using different nodes
From: Jonathan Ferland (jonathan.ferland_at_[hidden])
Date: 2009-12-08 13:39:43


I ran the same test with 1.3.4 and still see the same issue. I also
tried the tm interface instead of specifying a hostfile, with the same
result.
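
For reference, the restart invocation under discussion can be sketched
as below. This is only an illustration of the intended usage, not a fix:
the helper name restart_cmd, the snapshot name, and the fallback
hostfile name are placeholders that do not come from this thread. The
script prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: restart a checkpointed Open MPI job on the nodes of the
# current PBS allocation. All file names below are placeholders.
#
# The original run and checkpoint would have looked something like:
#   mpirun -am ft-enable-cr --hostfile hosts.old -np 4 ./app
#   ompi-checkpoint <PID of mpirun>

# Build the restart command line for a given hostfile and global snapshot.
# (restart_cmd is a hypothetical helper, not part of Open MPI.)
restart_cmd() {
    printf 'ompi-restart --hostfile %s %s\n' "$1" "$2"
}

# PBS writes the allocated nodes to the file named by $PBS_NODEFILE;
# fall back to a placeholder name when run outside a PBS job.
HOSTFILE="${PBS_NODEFILE:-hosts.new}"
SNAPSHOT="${SNAPSHOT:-ompi_global_snapshot_1234.ckpt}"

# Dry run: show the command instead of executing it.
restart_cmd "$HOSTFILE" "$SNAPSHOT"
```

Inside a PBS job the hostfile is taken from the new allocation, so each
restart would pick up whatever nodes were assigned to it.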

thanks,

Jonathan

Josh Hursey wrote:
> Though I do not test this scenario (using hostfiles) very often, it
> used to work. The ompi-restart command takes a --hostfile (or
> --machinefile) argument that is passed directly to the mpirun command.
> I wonder if something broke recently with this handoff. I can
> certainly checkpoint with one set of nodes/allocation and restart with
> another, but most/all of my testing occurs in a SLURM environment, so
> no need for an explicit hostfile.
>
> I'll take a look to see if I can reproduce, but probably will not be
> until next week.
>
> -- Josh
>
> On Dec 2, 2009, at 9:54 AM, Jonathan Ferland wrote:
>
>> Hi,
>>
>> I am trying to use BLCR checkpointing in Open MPI. I can currently
>> run my application with a hostfile, checkpoint the run, and then
>> restart the application with the same hostfile. What I would
>> like to do is restart the application with a different hostfile,
>> but this leads to a segfault with 1.3.3.
>>
>> Is it possible to restart the application using a different hostfile?
>> (We use PBS to create the hostfile, so each new restart might
>> be on different nodes.) If so, how can we do that? If not, do you
>> plan to include this in a future release?
>>
>> thanks
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
--------------------------------------------------------------
Jonathan Ferland, scientific computing analyst
RQCHP (Réseau québécois de calcul de haute performance)
office S-252, Roger-Gaudry building, Université de Montréal
telephone : 514 343-6111 ext. 8852
fax       : 514 343-2155
--------------------------------------------------------------