
Open MPI User's Mailing List Archives


This web mail archive is frozen: no new mails have been added to it since July 2016.

Subject: Re: [OMPI users] FT problem
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-09-21 20:52:00


Not really. The person who wrote that code for his PhD thesis has since become a professor and rarely has time to respond on the mailing list or to maintain the code, so I'm afraid we no longer have anyone who knows much about it.

I plan to rework the checkpoint support in the coming months, but I can't say when that will happen.

On Sep 21, 2013, at 7:51 AM, basma a.azeem <basmaabdelazeem_at_[hidden]> wrote:

> Any suggestions?
>
>
> From: basmaabdelazeem_at_[hidden]
> To: users_at_[hidden]
> Subject: FT problem
> Date: Wed, 18 Sep 2013 16:42:29 +0200
>
> I am using openmpi-1.6.1.
> I need to try checkpoint/restart (self, BLCR).
> After I installed Open MPI, I had the following in my installation folder:
>
> bin/ompi-checkpoint
> bin/ompi-restart
>
> lib/openmpi/mca_crs_self.la
> lib/openmpi/mca_crs_self.so
> lib/openmpi/mca_crs_blcr.la
> lib/openmpi/mca_crs_blcr.so
>
> Although I have:
>
> ompi_info | grep FT
> FT Checkpoint support: yes (checkpoint thread: yes)
>
> ompi_info | grep crs
> MCA crs: none (MCA v2.0, API v2.0, Component v1.6.1)
>
> when I try to take a checkpoint, it fails:
>
> basma_at_basma-Satellite-A500:~$ /OpenMP/openmpi-1.6.1/builddir/bin/mpirun -np 3 -am ft-enable-cr /home/basma/NPB3.3/NPB3.3/NPB3.3-OMP/bin/lu.A
>
>
> NAS Parallel Benchmarks (NPB3.3-OMP) - LU Benchmark
>
> Size: 64x 64x 64
> Iterations: 250
> Number of available threads: 4
>
> NAS Parallel Benchmarks (NPB3.3-OMP) - LU Benchmark
>
> Size: 64x 64x 64
> Iterations: 250
> Number of available threads: 4
>
> NAS Parallel Benchmarks (NPB3.3-OMP) - LU Benchmark
>
> Size: 64x 64x 64
> Iterations: 250
> Number of available threads: 4
>
> Time step 1
> Time step 1
> Time step 1
> --------------------------------------------------------------------------
> mpirun noticed that process rank 0 with PID 2917 on node basma-Satellite-A500 exited on signal 10 (User defined signal 1).
> --------------------------------------------------------------------------
> basma_at_basma-Satellite-A500:~$
>
> This happened when I ran the following command from a second shell:
> basma_at_basma-Satellite-A500:~$ /OpenMP/openmpi-1.6.1/builddir/bin/ompi-checkpoint 2916
>
> What did I do wrong?
>
> Thank you