Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
From: Ralph Castain (rhc.openmpi_at_[hidden])
Date: 2012-02-13 09:59:25


Good heavens - where did you find something that old? Can you use a more recent version?
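
Before upgrading, it can be worth confirming what each node actually reports. The sketch below compares the version strings returned by `mpirun --version` on each host. It assumes the first line of that output looks like `mpirun (Open MPI) 1.2.8`; the helper names and the sample outputs are hypothetical, and collecting the real output (for example over ssh) is left to the reader.

```python
import re

# Assumed output shape: "mpirun (Open MPI) 1.2.8". Adjust if your build prints
# something different.
VERSION_LINE = re.compile(r"\(Open MPI\)\s+(\d+(?:\.\d+)*)")


def parse_ompi_version(output):
    """Return the version as a tuple of ints, or None if no version is found."""
    match = VERSION_LINE.search(output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def versions_match(outputs_by_host):
    """Return (all_hosts_agree, parsed_versions) for per-host version output."""
    versions = {host: parse_ompi_version(out) for host, out in outputs_by_host.items()}
    distinct = set(versions.values())
    agree = len(distinct) == 1 and None not in distinct
    return agree, versions


# Hypothetical sample data; replace with the real output from each node.
outputs = {
    "192.0.2.67": "mpirun (Open MPI) 1.2.8\n",
    "192.0.2.68": "mpirun (Open MPI) 1.2.8\n",
}
agree, versions = versions_match(outputs)
print("versions agree:", agree, versions)
```

A host whose output cannot be parsed (for instance because `mpirun` is not on the non-interactive ssh PATH) shows up as `None`, which is itself a useful hint.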

On Feb 13, 2012, at 4:45 AM, "Richard Bardwell" <richard_at_[hidden]> wrote:

> Gentlemen,
>
> I am struggling to get MPI working when the hostfile contains different nodes.
> I get the error below. Any ideas? I can ssh without a password between the two
> nodes. I am running Open MPI 1.2.8 on both machines.
>
> Any help most appreciated!
>
> MPITEST/v8_mpi_test> mpiexec -n 2 --debug-daemons -hostfile test.hst /home/sharc/MPITEST/v8_mpi_test/mpitest
> Daemon [0,0,1] checking in as pid 10490 on host 192.0.2.67
> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 182
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> orte_rml_base_select failed
> --> Returned value -13 instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_system_init.c at line 42
> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 52
> Open RTE was unable to initialize properly. The error occured while
> attempting to orte_init(). Returned value -13 instead of ORTE_SUCCESS.
> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received kill_local_procs
> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1158
> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
> [linux-tmpw:10489] ERROR: A daemon on node 192.0.2.68 failed to start as expected.
> [linux-tmpw:10489] ERROR: There may be more information available from
> [linux-tmpw:10489] ERROR: the remote shell (see above).
> [linux-tmpw:10489] ERROR: The daemon exited unexpectedly with status 243.
> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received exit
> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1190
> --------------------------------------------------------------------------
> mpiexec was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.
> --------------------------------------------------------------------------
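
When reading a trace like the one quoted above, it can help to group the `ORTE_ERROR_LOG` entries by the host that printed them. In this log the "Not found" errors all come from `linux-z0je`, while the "Timeout" errors come from `linux-tmpw`. The following is a minimal sketch of that grouping; the regular expression is inferred only from the lines quoted in this message, and the function name is hypothetical rather than part of Open MPI.

```python
import re
from collections import defaultdict

# Matches lines shaped like the ones quoted above, e.g.
# [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 52
ERROR_LINE = re.compile(
    r"^\[(?P<host>[^:\]]+):(?P<pid>\d+)\] "
    r"\[(?P<name>[^\]]+)\] "
    r"ORTE_ERROR_LOG: (?P<error>.+?) in file (?P<file>\S+) at line (?P<line>\d+)$"
)


def summarize_orte_errors(text):
    """Group ORTE_ERROR_LOG entries by reporting host (hypothetical helper)."""
    by_host = defaultdict(list)
    for raw in text.splitlines():
        # Drop email quote markers ("> ") and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        match = ERROR_LINE.match(line)
        if match:
            by_host[match["host"]].append(
                (match["error"], match["file"], int(match["line"]))
            )
    return dict(by_host)


sample = """\
> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 182
> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received exit
"""

for host, errors in summarize_orte_errors(sample).items():
    print(host, errors)
```

Lines that are not `ORTE_ERROR_LOG` entries, such as the `orted_recv_pls` messages, are simply skipped.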