Open MPI User's Mailing List Archives

Subject: [OMPI users] Question about checkpoint/restart protocol
From: Mohamed Adel (Mohamed.Adel_at_[hidden])
Date: 2009-11-04 03:51:11


Dear OMPI users,

I'm a new Open MPI user. I configured openmpi-1.3.3 with the following options, then compiled and installed it successfully:

  ./configure --prefix=/home/mab/openmpi-1.3.3 --with-sge --enable-ft-thread --with-ft=cr --enable-mpi-threads --enable-static --disable-shared --with-blcr=/home/mab/blcr-0.8.2/

Now I'm trying to use the checkpoint/restart protocol. I run a program with the following command:

  mpirun -n 2 -am ft-enable-cr -H localhost prime/checkpoint-restart-test

but I receive the following error:

*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[madel:28896] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_cr_init() failed failed
  --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
[madel:28896] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 77
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------

I can't find the files (mca_crs_blcr.so, mca_crs_blcr.la) mentioned in this post: http://www.open-mpi.org/community/lists/users/2009/09/10641.php. Could you please help me with this error?

Thanks in advance,
Mohamed Adel