Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
From: Richard Bardwell (richard_at_[hidden])
Date: 2012-02-13 12:12:46


OK, 1.4.4 is now installed on both machines, but I get a really weird error
when running across the 2 nodes:
Error: unknown option "--daemonize"
even though I am only running with -np 2 -hostfile test.hst

The program runs fine on 2 cores if running locally on each node.
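
One quick check, assuming (this thread does not confirm it) that the unknown-option error comes from the remote node picking up an orted from a different Open MPI installation than the local mpirun: print which orted each node's non-interactive shell finds. The function name and the node names in the usage note are hypothetical placeholders; the real node names are whatever test.hst lists.

```shell
#!/bin/sh
# Sketch only: report where each node's non-interactive shell finds orted.
# Node names are passed as arguments.
show_orted_paths() {
    for node in "$@"; do
        # 'command -v' is POSIX; it prints nothing when orted is not on PATH.
        path=$(ssh "$node" 'command -v orted' 2>/dev/null)
        printf '%s: %s\n' "$node" "${path:-orted not found on PATH}"
    done
}
```

Called as `show_orted_paths node1 node2`, this prints one line per node. If the paths differ, or one of them points at the old installation, the PATH seen by non-interactive ssh logins is the first thing to look at.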

Any ideas?

Thanks

Richard
----- Original Message -----
From: "Gustavo Correa" <gus_at_[hidden]>
To: "Open MPI Users" <users_at_[hidden]>
Sent: Monday, February 13, 2012 4:22 PM
Subject: Re: [OMPI users] MPI orte_init fails on remote nodes

>
> On Feb 13, 2012, at 11:02 AM, Richard Bardwell wrote:
>
>> Ralph
>>
>> I had done a make clean in the 1.2.8 directory, if that is what you meant.
>> Or do I need to do something else?
>>
>> I appreciate your help on this by the way ;-)
>
> Hi Richard
>
> You can install in a different directory, totally separate from 1.2.8.
>
> Create a new work directory [this is only for building, not the final
> installation directory; say /tmp/openmpi/1.4.4/work].
> Run the OpenMPI 1.4.4 configure script from this work directory,
> with --prefix pointing to your desired installation directory
> [e.g. /home/richard/openmpi/1.4.4/].
> I am assuming this is NFS mounted on the nodes [if you have a cluster].
> [Check all options with 'configure --help'.]
> Then do 'make' and 'make install'.
> Finally, set your PATH and LD_LIBRARY_PATH to point to the new installation
> directory [its bin and lib subdirectories, respectively],
> to prevent mixing with the old 1.2.8.
>
> I have a number of OpenMPI versions here, compiled with various compilers,
> and they coexist well this way.
>
> I hope this helps,
> Gus Correa
>
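
Gus's steps above can be sketched as a small shell script. This is a minimal sketch, not a tested recipe: the work and installation directories are the examples from his message, while SRC_DIR (where the 1.4.4 tarball was unpacked) and the two function names are assumptions made here for illustration.

```shell
#!/bin/sh
# SRC_DIR is a hypothetical location for the unpacked 1.4.4 sources.
SRC_DIR="${SRC_DIR:-$HOME/src/openmpi-1.4.4}"
# Build-only work directory and final install prefix (examples from the message).
WORK_DIR="${WORK_DIR:-/tmp/openmpi/1.4.4/work}"
PREFIX="${PREFIX:-$HOME/openmpi/1.4.4}"

# Configure, build and install from the separate work directory.
# Runs in a subshell so the caller's current directory is left unchanged.
build_openmpi() {
    (
        mkdir -p "$WORK_DIR" && cd "$WORK_DIR" || exit 1
        "$SRC_DIR/configure" --prefix="$PREFIX" && make && make install
    )
}

# Put the new installation first on PATH and LD_LIBRARY_PATH so the
# old 1.2.8 binaries and libraries are not picked up by accident.
use_openmpi() {
    PATH="$PREFIX/bin:$PATH"
    LD_LIBRARY_PATH="$PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
}
```

Sourcing the file defines both functions without running anything. Run `build_openmpi` once, then call `use_openmpi` in each shell that should use the new installation; for multi-node runs the same two variables would also need to be set for non-interactive logins on the remote nodes.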
>>
>>
>> ----- Original Message -----
>> From: Ralph Castain
>> To: Open MPI Users
>> Sent: Monday, February 13, 2012 3:41 PM
>> Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
>>
>> You need to clean out the old attempt - that is a stale file
>>
>> On Feb 13, 2012, at 7:36 AM, "Richard Bardwell" <richard_at_[hidden]> wrote:
>>
>>> OK, I installed 1.4.4 and rebuilt the executable, and guess what ... I now get some weird errors such as:
>>> mca: base: component_find: unable to open /usr/local/lib/openmpi/mca_ras_dash_host
>>> along with the same error for a few other files,
>>> even though the .so / .la files are all there!
>>> ----- Original Message -----
>>> From: Ralph Castain
>>> To: Open MPI Users
>>> Sent: Monday, February 13, 2012 2:59 PM
>>> Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
>>>
>>> Good heavens - where did you find something that old? Can you use a more recent version?
>>>
>>>> Gentlemen
>>>>
>>>> I am struggling to get MPI working when the hostfile contains different nodes.
>>>> I get the error below. Any ideas? I can ssh without a password between the two
>>>> nodes. I am running Open MPI 1.2.8 on both machines.
>>>>
>>>> Any help most appreciated!
>>>>
>>>>
>>>> MPITEST/v8_mpi_test> mpiexec -n 2 --debug-daemons -hostfile test.hst /home/sharc/MPITEST/v8_mpi_test/mpitest
>>>> Daemon [0,0,1] checking in as pid 10490 on host 192.0.2.67
>>>> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 182
>>>> --------------------------------------------------------------------------
>>>> It looks like orte_init failed for some reason; your parallel process is
>>>> likely to abort. There are many reasons that a parallel process can
>>>> fail during orte_init; some of which are due to configuration or
>>>> environment problems. This failure appears to be an internal failure;
>>>> here's some additional information (which may only be relevant to an
>>>> Open MPI developer):
>>>>
>>>> orte_rml_base_select failed
>>>> --> Returned value -13 instead of ORTE_SUCCESS
>>>> --------------------------------------------------------------------------
>>>> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_system_init.c at line 42
>>>> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 52
>>>> Open RTE was unable to initialize properly. The error occured while
>>>> attempting to orte_init(). Returned value -13 instead of ORTE_SUCCESS.
>>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
>>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received kill_local_procs
>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1158
>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
>>>> [linux-tmpw:10489] ERROR: A daemon on node 192.0.2.68 failed to start as expected.
>>>> [linux-tmpw:10489] ERROR: There may be more information available from
>>>> [linux-tmpw:10489] ERROR: the remote shell (see above).
>>>> [linux-tmpw:10489] ERROR: The daemon exited unexpectedly with status 243.
>>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
>>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received exit
>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
>>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1190
>>>> --------------------------------------------------------------------------
>>>> mpiexec was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.
>>>> --------------------------------------------------------------------------
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>
>>
>
>