Open MPI User's Mailing List Archives

Subject: [OMPI users] MPI orte_init fails on remote nodes
From: Richard Bardwell (richard_at_[hidden])
Date: 2012-02-13 07:45:45


Gentlemen,

I am struggling to get MPI working when the hostfile contains different nodes. I get the error below. Any ideas? I can ssh without a password between the two nodes. I am running Open MPI 1.2.8 on both machines.

Any help is most appreciated!



MPITEST/v8_mpi_test> mpiexec -n 2 --debug-daemons -hostfile test.hst /home/sharc/MPITEST/v8_mpi_test/mpitest
Daemon [0,0,1] checking in as pid 10490 on host 192.0.2.67
[linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 182
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

orte_rml_base_select failed
--> Returned value -13 instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_system_init.c at line 42
[linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 52
Open RTE was unable to initialize properly. The error occured while
attempting to orte_init(). Returned value -13 instead of ORTE_SUCCESS.
[linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
[linux-tmpw:10490] [0,0,1] orted_recv_pls: received kill_local_procs
[linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1158
[linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[linux-tmpw:10489] ERROR: A daemon on node 192.0.2.68 failed to start as expected.
[linux-tmpw:10489] ERROR: There may be more information available from
[linux-tmpw:10489] ERROR: the remote shell (see above).
[linux-tmpw:10489] ERROR: The daemon exited unexpectedly with status 243.
[linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
[linux-tmpw:10490] [0,0,1] orted_recv_pls: received exit
[linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1190
--------------------------------------------------------------------------
mpiexec was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.
--------------------------------------------------------------------------