
Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
From: Richard Bardwell (richard_at_[hidden])
Date: 2012-02-14 05:40:49


Jeff,

I wiped out all versions of openmpi on all the nodes, including the distro-installed version.
I reinstalled version 1.4.4 on all nodes.
I now get the error that libopen-rte.so.0 cannot be found when running mpiexec across
different nodes, even though LD_LIBRARY_PATH on all nodes points to /usr/local/lib,
where the file exists. Any ideas?
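
In case it helps the diagnosis: is something like this the right way to check what
a non-interactive shell on the remote node actually sees? I gather the environment
a daemon gets over ssh can differ from a login shell's. (<other-node> below is just
a placeholder for the second machine.)

  ssh <other-node> 'echo $LD_LIBRARY_PATH'
  ssh <other-node> 'ls -l /usr/local/lib/libopen-rte.so.0'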

Many Thanks

Richard

----- Original Message -----
From: "Jeff Squyres" <jsquyres_at_[hidden]>
To: "Open MPI Users" <users_at_[hidden]>
Sent: Monday, February 13, 2012 6:28 PM
Subject: Re: [OMPI users] MPI orte_init fails on remote nodes

> You might want to fully uninstall the distro-installed version of Open MPI on all the nodes (e.g., Red Hat may have installed a
> different version of Open MPI, and that version is being found in your $PATH before your custom-installed version).
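>
> The unknown-option error with "--daemonize" is typically what you see when the
> orted launched on a remote node comes from a different install than the mpiexec
> that started it. As a quick sanity check on each node (mpiexec and ompi_info
> should report the same version everywhere; <remote-node> is a placeholder):
>
>   which mpiexec ompi_info
>   ompi_info | grep "Open MPI:"
>   ssh <remote-node> 'which orted'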
>
>
> On Feb 13, 2012, at 12:12 PM, Richard Bardwell wrote:
>
>> OK, 1.4.4 is happily installed on both machines. But I now get a really
>> weird error when running on the 2 nodes. I get
>> Error: unknown option "--daemonize"
>> even though I am just running with -np 2 -hostfile test.hst
>>
>> The program runs fine on 2 cores if running locally on each node.
>>
>> Any ideas?
>>
>> Thanks
>>
>> Richard
>> ----- Original Message -----
>> From: "Gustavo Correa" <gus_at_[hidden]>
>> To: "Open MPI Users" <users_at_[hidden]>
>> Sent: Monday, February 13, 2012 4:22 PM
>> Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
>>
>>
>>> On Feb 13, 2012, at 11:02 AM, Richard Bardwell wrote:
>>>> Ralph
>>>> I had done a 'make clean' in the 1.2.8 directory, if that is what you meant?
>>>> Or do I need to do something else ?
>>>> I appreciate your help on this by the way ;-)
>>> Hi Richard
>>> You can install in a different directory, totally separate from 1.2.8.
>>> Create a new work directory [which is not the final installation directory, just a build area, say /tmp/openmpi/1.4.4/work].
>>> Launch the OpenMPI 1.4.4 configure script from this new work directory, with --prefix pointing to your desired installation
>>> directory [e.g. /home/richard/openmpi/1.4.4/].
>>> I am assuming this is NFS mounted on the nodes [if you have a cluster].
>>> [Check all options with 'configure --help'.]
>>> Then do make, make install.
>>> Finally, set your PATH and LD_LIBRARY_PATH to point to the new installation directory,
>>> to prevent mixing with the old 1.2.8; a sketch of the whole sequence is below.
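>>>
>>> Something like this, as a minimal sketch [the /path/to/openmpi-1.4.4 source
>>> directory is a placeholder for wherever you unpacked the tarball; the other
>>> paths are the examples from above]:
>>>
>>>   mkdir -p /tmp/openmpi/1.4.4/work
>>>   cd /tmp/openmpi/1.4.4/work
>>>   /path/to/openmpi-1.4.4/configure --prefix=/home/richard/openmpi/1.4.4
>>>   make
>>>   make install
>>>
>>>   # on every node (e.g. in your shell startup file):
>>>   export PATH=/home/richard/openmpi/1.4.4/bin:$PATH
>>>   export LD_LIBRARY_PATH=/home/richard/openmpi/1.4.4/lib:$LD_LIBRARY_PATH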
>>> I have a number of OpenMPI versions here, compiled with various compilers,
>>> and they coexist well this way.
>>> I hope this helps,
>>> Gus Correa
>>>> ----- Original Message -----
>>>> From: Ralph Castain
>>>> To: Open MPI Users
>>>> Sent: Monday, February 13, 2012 3:41 PM
>>>> Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
>>>> You need to clean out the old attempt - that is a stale file
>>>> Sent from my iPad
>>>> On Feb 13, 2012, at 7:36 AM, "Richard Bardwell" <richard_at_[hidden]> wrote:
>>>>> OK, I installed 1.4.4, rebuilt the executable and, guess what, I now get some weird errors, as below:
>>>>> mca: base: component_find: unable to open /usr/local/lib/openmpi/mca_ras_dash_host
>>>>> along with similar messages for a few other component files,
>>>>> even though the .so/.la files are all there!
>>>>> ----- Original Message -----
>>>>> From: Ralph Castain
>>>>> To: Open MPI Users
>>>>> Sent: Monday, February 13, 2012 2:59 PM
>>>>> Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
>>>>> Good heavens - where did you find something that old? Can you use a more recent version?
>>>>> Sent from my iPad
>>>>>
>>>>>> Gentlemen
>>>>>> I am struggling to get MPI working when the hostfile contains different nodes.
>>>>>> I get the error below. Any ideas? I can ssh without a password between the two
>>>>>> nodes. I am running Open MPI 1.2.8 on both machines.
>>>>>> Any help most appreciated!
>>>>>> MPITEST/v8_mpi_test> mpiexec -n 2 --debug-daemons -hostfile test.hst /home/sharc/MPITEST/v8_mpi_test/mpitest
>>>>>> Daemon [0,0,1] checking in as pid 10490 on host 192.0.2.67
>>>>>> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 182
>>>>>> --------------------------------------------------------------------------
>>>>>> It looks like orte_init failed for some reason; your parallel process is
>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>> fail during orte_init; some of which are due to configuration or
>>>>>> environment problems. This failure appears to be an internal failure;
>>>>>> here's some additional information (which may only be relevant to an
>>>>>> Open MPI developer):
>>>>>> orte_rml_base_select failed
>>>>>> --> Returned value -13 instead of ORTE_SUCCESS
>>>>>> --------------------------------------------------------------------------
>>>>>> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_system_init.c at line 42
>>>>>> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 52
>>>>>> Open RTE was unable to initialize properly. The error occured while
>>>>>> attempting to orte_init(). Returned value -13 instead of ORTE_SUCCESS.
>>>>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
>>>>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received kill_local_procs
>>>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
>>>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1158
>>>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
>>>>>> [linux-tmpw:10489] ERROR: A daemon on node 192.0.2.68 failed to start as expected.
>>>>>> [linux-tmpw:10489] ERROR: There may be more information available from
>>>>>> [linux-tmpw:10489] ERROR: the remote shell (see above).
>>>>>> [linux-tmpw:10489] ERROR: The daemon exited unexpectedly with status 243.
>>>>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
>>>>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received exit
>>>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
>>>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1190
>>>>>> --------------------------------------------------------------------------
>>>>>> mpiexec was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.
>>>>>> --------------------------------------------------------------------------
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>