
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] MPI_Init failing in singleton
From: Grzegorz Maj (maju3_at_[hidden])
Date: 2010-07-07 12:12:15


The problem was that orted couldn't find either ssh or rsh on that machine.
I added my installation to PATH, and it works now.
So, one question: I will definitely not use MPI_Comm_spawn or anything
related to it. Do I need ssh at all? If not, is there any way to tell
orted that it shouldn't look for ssh because it won't need it?

Regards,
Grzegorz Maj

2010/7/7 Ralph Castain <rhc_at_[hidden]>:
> Check your PATH and LD_LIBRARY_PATH: it looks like you are picking up a stale orted binary and/or stale libraries (perhaps the default OMPI instead of 1.4.2) on the machine where it fails.
>
> On Jul 7, 2010, at 7:44 AM, Grzegorz Maj wrote:
>
>> Hi,
>> I was trying to run some MPI processes as singletons. On some of the
>> machines they crash in MPI_Init. I use exactly the same binaries of my
>> application and the same installation of Open MPI 1.4.2 on two machines;
>> it works on one of them and fails on the other. This is the
>> command and its output (test is a simple application that only calls
>> MPI_Init and MPI_Finalize):
>>
>> LD_LIBRARY_PATH=/home/gmaj/openmpi/lib ./test
>> [host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
>> ../../../../../orte/mca/ess/hnp/ess_hnp_module.c at line 161
>> --------------------------------------------------------------------------
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort.  There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems.  This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>>  orte_plm_base_select failed
>>  --> Returned value Not found (-13) instead of ORTE_SUCCESS
>> --------------------------------------------------------------------------
>> [host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
>> ../../orte/runtime/orte_init.c at line 132
>> --------------------------------------------------------------------------
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort.  There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems.  This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>>  orte_ess_set_name failed
>>  --> Returned value Not found (-13) instead of ORTE_SUCCESS
>> --------------------------------------------------------------------------
>> [host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
>> ../../orte/orted/orted_main.c at line 323
>> [host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a
>> daemon on the local node in file
>> ../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line
>> 381
>> [host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a
>> daemon on the local node in file
>> ../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line
>> 143
>> [host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a
>> daemon on the local node in file ../../orte/runtime/orte_init.c at
>> line 132
>> --------------------------------------------------------------------------
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort.  There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems.  This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>>  orte_ess_set_name failed
>>  --> Returned value Unable to start a daemon on the local node (-128)
>> instead of ORTE_SUCCESS
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort.  There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems.  This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>>  ompi_mpi_init: orte_init failed
>>  --> Returned "Unable to start a daemon on the local node" (-128)
>> instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** before MPI was initialized
>> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>> [host01:21865] Abort before MPI_INIT completed successfully; not able
>> to guarantee that all other processes were killed!
>>
>>
>> Any ideas on this?
>>
>> Thanks,
>> Grzegorz Maj
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users