Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Question about checkpoint/restart protocol
From: Sergio Díaz (sdiaz_at_[hidden])
Date: 2009-11-05 04:38:45


Hi,

Did you load the BLCR modules before compiling OpenMPI?
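One rough way to check this, as a sketch only: it assumes "modules" here means the BLCR kernel modules (names starting with blcr, such as blcr and blcr_imports), and that ompi_info lists checkpoint components on lines of the form "MCA crs: blcr (...)". The helper names has_blcr_module and has_crs_blcr are made up for illustration.

```shell
#!/bin/sh
# Sketch: report whether BLCR kernel modules look loaded and whether this
# Open MPI build lists a BLCR checkpoint/restart (crs) component.
# Assumptions: module names start with "blcr"; ompi_info prints
# "MCA crs: blcr (...)" when the component was built.

# Reads lsmod-style text on stdin; succeeds if a module named blcr* is listed.
has_blcr_module() {
    awk '$1 ~ /^blcr/ { found = 1 } END { exit !found }'
}

# Reads ompi_info-style text on stdin; succeeds if a crs component
# named blcr is listed.
has_crs_blcr() {
    grep -Eq 'MCA crs: *blcr'
}

# Only query the tools that are actually installed on this machine.
if command -v lsmod >/dev/null 2>&1; then
    if lsmod | has_blcr_module; then
        echo "BLCR kernel modules: loaded"
    else
        echo "BLCR kernel modules: not loaded"
    fi
fi

if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_crs_blcr; then
        echo "Open MPI crs component blcr: present"
    else
        echo "Open MPI crs component blcr: not found"
    fi
fi
```

If the second check reports the component as missing, the configure log from the build is probably the next place to look.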

Regards,
Sergio

Mohamed Adel wrote:
> Dear OMPI users,
>
> I'm a new OpenMPI user. I configured openmpi-1.3.3 with these options: "./configure --prefix=/home/mab/openmpi-1.3.3 --with-sge --enable-ft-thread --with-ft=cr --enable-mpi-threads --enable-static --disable-shared --with-blcr=/home/mab/blcr-0.8.2/", then compiled and installed it successfully.
> Now I'm trying to use the checkpoint/restart protocol. I run a program with the options "mpirun -n 2 -am ft-enable-cr -H localhost prime/checkpoint-restart-test" but I receive the following error:
>
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> [madel:28896] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> --------------------------------------------------------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> opal_cr_init() failed failed
> --> Returned value -1 instead of OPAL_SUCCESS
> --------------------------------------------------------------------------
> [madel:28896] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 77
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> ompi_mpi_init: orte_init failed
> --> Returned "Error" (-1) instead of "Success" (0)
> --------------------------------------------------------------------------
>
> I can't find the files mentioned in this post "http://www.open-mpi.org/community/lists/users/2009/09/10641.php" (mca_crs_blcr.so, mca_crs_blcr.la). Could you please help me with that error?
>
> Thanks in advance
> Mohamed Adel

-- 
Sergio Díaz Montes
Centro de Supercomputacion de Galicia
Avda. de Vigo. s/n (Campus Sur) 15706 Santiago de Compostela (Spain)
Tel: +34 981 56 98 10 ; Fax: +34 981 59 46 16
email: sdiaz_at_[hidden] ; http://www.cesga.es/
------------------------------------------------