Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] cluster checkpoint error
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-01-17 11:10:44


With what version of OMPI?

On Jan 17, 2014, at 3:23 AM, basma a.azeem <basmaabdelazeem_at_[hidden]> wrote:

>
>
> I am trying to run BLCR with Open MPI on a cluster of 4 nodes.
> BLCR version: 0.8.5
> When I run the command:
>
> mpirun -np 4 -am ft-enable-cr -hostfile hosts /home/ubuntu//N/NPB3.3-MPI/bin/bt.B.4
>
>
> I got this error:
>
> -------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> opal_cr_init() failed failed
> --> Returned value -1 instead of OPAL_SUCCESS
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> opal_cr_init() failed failed
> --> Returned value -1 instead of OPAL_SUCCESS
> --------------------------------------------------------------------------
> [node002:02170] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file ../../../orte/runtime/orte_init.c at line 77
> [node001:02438] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file ../../../orte/runtime/orte_init.c at line 77
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> ompi_mpi_init: orte_init failed
> --> Returned "Error" (-1) instead of "Success" (0)
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> ompi_mpi_init: orte_init failed
> --> Returned "Error" (-1) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> [node002:2170] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> [node001:2438] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> --------------------------------------------------------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> opal_cr_init() failed failed
> --> Returned value -1 instead of OPAL_SUCCESS
> --------------------------------------------------------------------------
> [node003:02173] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file ../../../orte/runtime/orte_init.c at line 77
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> ompi_mpi_init: orte_init failed
> --> Returned "Error" (-1) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> [node003:2173] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
>
> ----------------------