Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] MPI_Init failing in singleton
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-07-07 13:13:44


On Jul 7, 2010, at 10:12 AM, Grzegorz Maj wrote:

> The problem was that orted couldn't find ssh nor rsh on that machine.
> I've added my installation to PATH and it now works.
> So one question: I will definitely not use MPI_Comm_spawn or any
> related functionality. Do I need this ssh? If not, is there any way to
> tell orted that it shouldn't be looking for ssh, since it won't need it?

That's an interesting question; I've never faced that situation before. At the moment, the answer is "no". However, I could conjure up a patch that lets the orted skip selecting a plm module...
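Until such a patch exists, the workaround described earlier in the thread is to make ssh (or rsh) visible on PATH before the singleton starts, since the orted's default rsh/ssh launch (plm) module searches PATH even when nothing will ever be spawned. A minimal sketch (the /usr/bin location is an assumption, not from the thread):

```shell
# The orted spawned by a singleton MPI process selects a process-launch
# (plm) module at startup; the default rsh/ssh module needs to find ssh
# or rsh on PATH even if MPI_Comm_spawn is never called.
# /usr/bin is an assumed location -- adjust for your system.
export PATH=/usr/bin:$PATH
command -v ssh || command -v rsh || echo "neither ssh nor rsh found on PATH"
```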

>
> Regards,
> Grzegorz Maj
>
> 2010/7/7 Ralph Castain <rhc_at_[hidden]>:
>> Check your PATH and LD_LIBRARY_PATH; it looks like you are picking up a stale binary for orted and/or stale libraries (perhaps getting the default OMPI installation instead of 1.4.2) on the machine where it fails.
>>
>> On Jul 7, 2010, at 7:44 AM, Grzegorz Maj wrote:
>>
>>> Hi,
>>> I was trying to run some MPI processes as singletons. On some of the
>>> machines they crash in MPI_Init. I use exactly the same binaries of my
>>> application and the same installation of Open MPI 1.4.2 on two machines,
>>> and it works on one of them and fails on the other. This is the
>>> command and its output (test is a simple application calling only
>>> MPI_Init and MPI_Finalize):
>>>
>>> LD_LIBRARY_PATH=/home/gmaj/openmpi/lib ./test
>>> [host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
>>> ../../../../../orte/mca/ess/hnp/ess_hnp_module.c at line 161
>>> --------------------------------------------------------------------------
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>> orte_plm_base_select failed
>>> --> Returned value Not found (-13) instead of ORTE_SUCCESS
>>> --------------------------------------------------------------------------
>>> [host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
>>> ../../orte/runtime/orte_init.c at line 132
>>> --------------------------------------------------------------------------
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>> orte_ess_set_name failed
>>> --> Returned value Not found (-13) instead of ORTE_SUCCESS
>>> --------------------------------------------------------------------------
>>> [host01:21866] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
>>> ../../orte/orted/orted_main.c at line 323
>>> [host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a
>>> daemon on the local node in file
>>> ../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line
>>> 381
>>> [host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a
>>> daemon on the local node in file
>>> ../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line
>>> 143
>>> [host01:21865] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a
>>> daemon on the local node in file ../../orte/runtime/orte_init.c at
>>> line 132
>>> --------------------------------------------------------------------------
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>> orte_ess_set_name failed
>>> --> Returned value Unable to start a daemon on the local node (-128)
>>> instead of ORTE_SUCCESS
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or environment
>>> problems. This failure appears to be an internal failure; here's some
>>> additional information (which may only be relevant to an Open MPI
>>> developer):
>>>
>>> ompi_mpi_init: orte_init failed
>>> --> Returned "Unable to start a daemon on the local node" (-128)
>>> instead of "Success" (0)
>>> --------------------------------------------------------------------------
>>> *** An error occurred in MPI_Init
>>> *** before MPI was initialized
>>> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>>> [host01:21865] Abort before MPI_INIT completed successfully; not able
>>> to guarantee that all other processes were killed!
>>>
>>>
>>> Any ideas on this?
>>>
>>> Thanks,
>>> Grzegorz Maj
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>>
>>
>
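The stale-binary check suggested in the thread can be sketched as below. The /home/gmaj/openmpi prefix is taken from the posted command; everything else (the bin subdirectory, the working directory containing ./test) is an assumption:

```shell
# Verify that the orted binary and MPI libraries resolved at run time
# come from the intended Open MPI 1.4.2 installation, not a stale
# default install. The prefix is from the command posted in the thread.
export PATH=/home/gmaj/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/home/gmaj/openmpi/lib:$LD_LIBRARY_PATH
which orted || echo "orted not found on PATH"
# Confirm which libmpi the test binary would actually load:
ldd ./test 2>/dev/null | grep -i mpi || echo "run this from the directory containing ./test"
```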