
Subject: [OMPI users] MPI_Init failing in singleton
From: Grzegorz Maj (maju3_at_[hidden])
Date: 2010-07-07 09:44:26


Hi,
I was trying to run some MPI processes as singletons. On some of the
machines they crash in MPI_Init. I use exactly the same binaries of my
application and the same installation of Open MPI 1.4.2 on two
machines; it works on one of them and fails on the other. The test
program is a simple application calling only MPI_Init and
MPI_Finalize; a sketch of it is shown below, followed by the command
and its output.
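
A minimal sketch of such a test (assuming it really does nothing
beyond initializing and finalizing MPI) is:

  #include <mpi.h>

  int main(int argc, char *argv[])
  {
      /* MPI_Init is the call that fails in the singleton run. */
      MPI_Init(&argc, &argv);

      /* No other MPI calls in between; just shut the runtime down. */
      MPI_Finalize();
      return 0;
  }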

LD_LIBRARY_PATH=/home/gmaj/openmpi/lib ./test
[host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
../../../../../orte/mca/ess/hnp/ess_hnp_module.c at line 161
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_plm_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
../../orte/runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
../../orte/orted/orted_main.c at line 323
[host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon
on the local node in file
../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line 381
[host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon
on the local node in file
../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line 143
[host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon
on the local node in file ../../orte/runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Unable to start a daemon on the local node (-128)
      instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Unable to start a daemon on the local node" (-128)
instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[host01:21865] Abort before MPI_INIT completed successfully; not able
to guarantee that all other processes were killed!

Any ideas on this?

Thanks,
Grzegorz Maj