Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] ompi-restart failed && ompi-migrate
From: kidd (q19860103_at_[hidden])
Date: 2012-04-17 11:12:59


Hello,

Thank you for your reply, but I still can't run ompi-restart across multiple nodes. I checked my nodes (Ubuntu 11.04 and Open MPI 1.5.5), and prelink is not installed on any of them. Are there other reasons why ompi-restart could fail?

P.S. If ompi-restart across multiple nodes can be made to work, can a process be restarted on a new node rather than on its original node? For example: checkpoint on (node1, node2), then restart on (node1, node3).

________________________________
From: Josh Hursey <jjhursey_at_[hidden]>
To: Open MPI Users <users_at_[hidden]>
Sent: 2012/4/11 (Wed) 8:36 PM
Subject: Re: [OMPI users] ompi-restart failed && ompi-migrate

The 1.5 series does not support process migration, so there is no ompi-migrate option there. This was only contributed to the trunk (1.7 series). However, changes to the runtime environment over the past few months have broken this functionality. It is currently unclear when this will be repaired. We hope to have it fixed and functional again before the first release of the 1.7 series.

As far as your problem with ompi-restart, have you checked the prelink option on all of your nodes, per:
  https://upc-bugs.lbl.gov/blcr/doc/html/FAQ.html#prelink

-- Josh

On Tue, Apr 10, 2012 at 11:14 PM, kidd <q19860103_at_[hidden]> wrote:
> Hello!
> I have some problems.
> This is my environment:
>    BLCR = 0.8.4, Open MPI = 1.5.5, OS = Ubuntu 11.04
>    I have 2 nodes: cuda05 (master, it has the NFS file system) and
>    cuda07 (slave, mounts the master)
>
>    I have also set in ~/.openmpi/mca-params.conf:
>      crs_base_snapshot_dir=/root/kidd_openMPI/Tmp
>      snapc_base_global_snapshot_dir=/root/kidd_openMPI/checkpoints
>
>   My configure command:
> ./configure --prefix=/root/kidd_openMPI --with-ft=cr --enable-ft-thread
>  --with-blcr=/usr/local/BLCR  --with-blcr-libdir=/usr/local/BLCR/lib
>  --enable-mpirun-prefix-by-default
>  --enable-static --enable-shared  --enable-opal-multi-threads;
>
> Problem 1: ompi-restart on multiple nodes
>   command 01: mpirun -hostfile Hosts -am ft-enable-cr -x LD_LIBRARY_PATH -np 2 ./TEST
>   command 02: ompi-restart ompi_global_snapshot_2892.ckpt
>       -> I can checkpoint 2 processes on multiple nodes, but when I
>          restart, they restart only on the master node.
>
>   command 03: ompi-restart -hostfile Hosts ompi_global_snapshot_2892.ckpt
>       -> Error message (below). I have made sure that BLCR is OK.
> ####################################################################
>
> --------------------------------------------------------------------------
>     root_at_cuda05:~/kidd_openMPI/checkpoints# ompi-restart -hostfile Hosts ompi_global_snapshot_2892.ckpt/
>
> --------------------------------------------------------------------------
>    Error: BLCR was not able to restart the process because exec failed.
>           Check the installation of BLCR on all of the machines in your
>           system. The following information may be of help:
>  Return Code : -1
>  BLCR Restart Command : cr_restart
>  Restart Command Line : cr_restart /root/kidd_openMPI/checkpoints/ompi_global_snapshot_2892.ckpt/0/opal_snapshot_1.ckpt/ompi_blcr_context.2704
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Error: Unable to obtain the proper restart command to restart from the
>        checkpoint file (opal_snapshot_1.ckpt). Returned -1.
>        Check the installation of the blcr checkpoint/restart service
>        on all of the machines in your system.
> ################################################################
> Problem 2: I can't find ompi-migrate. How do I use ompi-migrate?
>
>   Please help me, thanks.
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey
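
The error text above asks for the BLCR installation to be checked on all machines. One possible way to do that, sketched here and not taken from the thread, is to ask each host in the hostfile whether a non-interactive shell can find cr_restart. Assumptions: passwordless ssh to every host, a hostfile with one host per line (optionally followed by fields such as slots=N), and the helper names list_hosts, check_host, check_all, and the REMOTE_SH variable, which are all hypothetical.

```shell
# Sketch only: report whether cr_restart is on the PATH of each host.
# Assumes passwordless ssh; all function names here are illustrative.

# Print the host names from a hostfile, skipping blank lines and # comments.
list_hosts() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1"
}

# Ask one host whether a non-interactive shell can resolve cr_restart.
# REMOTE_SH defaults to ssh and can be overridden, e.g. for a dry run.
check_host() {
    host=$1
    if ${REMOTE_SH:-ssh} "$host" 'command -v cr_restart' >/dev/null 2>&1; then
        echo "$host: cr_restart found"
    else
        echo "$host: cr_restart NOT found"
    fi
}

# Check every host named in the hostfile given as the first argument.
check_all() {
    for host in $(list_hosts "$1"); do
        check_host "$host"
    done
}
```

With the functions loaded, `check_all Hosts` would print one line per node (cuda05 and cuda07 in the setup described above). A "NOT found" result only shows that the remote non-interactive PATH lacks cr_restart; it does not by itself confirm the cause of the exec failure.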