Open MPI User's Mailing List Archives

Subject: [OMPI users] FW: FT problem
From: basma a.azeem (basmaabdelazeem_at_[hidden])
Date: 2013-09-21 10:51:50


Any suggestions?


From: basmaabdelazeem_at_[hidden]
To: users_at_[hidden]
Subject: FT problem
Date: Wed, 18 Sep 2013 16:42:29 +0200

I am using openmpi-1.6.1.
I need to try checkpoint/restart (self, blcr).
After installing Open MPI, I have the following in my installation folder:

bin\ompi-checkpoint
bin\ompi-restart
lib\openmpi\mca_crs_self.la
lib\openmpi\mca_crs_self.so
lib\openmpi\mca_crs_blcr.la
lib\openmpi\mca_crs_blcr.so

However, this is what I get:

ompi_info | grep FT

  FT Checkpoint support: yes (checkpoint thread: yes)

ompi_info | grep crs

                MCA crs: none (MCA v2.0, API v2.0, Component v1.6.1)


When I try to take a checkpoint, it fails:

basma_at_basma-Satellite-A500:~$ /OpenMP/openmpi-1.6.1/builddir/bin/mpirun -np 3 -am ft-enable-cr /home/basma/NPB3.3/NPB3.3/NPB3.3-OMP/bin/lu.A


 NAS Parallel Benchmarks (NPB3.3-OMP) - LU Benchmark

 Size: 64x 64x 64
 Iterations: 250
 Number of available threads: 4

 NAS Parallel Benchmarks (NPB3.3-OMP) - LU Benchmark

 Size: 64x 64x 64
 Iterations: 250
 Number of available threads: 4

 NAS Parallel Benchmarks (NPB3.3-OMP) - LU Benchmark

 Size: 64x 64x 64
 Iterations: 250
 Number of available threads: 4

 Time step 1
 Time step 1
 Time step 1
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 2917 on node basma-Satellite-A500 exited on signal 10 (User defined signal 1).
--------------------------------------------------------------------------
basma_at_basma-Satellite-A500:~$

This happened when I ran the following command from a second shell:
basma_at_basma-Satellite-A500:~$ /OpenMP/openmpi-1.6.1/builddir/bin/ompi-checkpoint 2916

What did I do wrong?

Thank you.