You need to clean out the old installation - that file is a stale leftover from it.
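
A minimal sketch of one way to spot leftover plugin files before cleaning up. The helper name `list_stale_components` and the marker file are hypothetical, not part of Open MPI; the `mca_*` pattern and the `/usr/local/lib/openmpi` directory are taken from the error message quoted below. It assumes the new `make install` wrote fresh timestamps and that the marker was touched just before that install.

```shell
# Hypothetical helper (not an Open MPI tool): list plugin files that were
# NOT refreshed by the latest install, i.e. files not newer than a marker.
# Usage: list_stale_components PLUGIN_DIR MARKER_FILE
list_stale_components() {
    plugin_dir=$1
    marker=$2
    # mca_* is the component naming pattern seen in the error message.
    # "! -newer" keeps files whose mtime is not later than the marker's.
    find "$plugin_dir" -type f -name 'mca_*' ! -newer "$marker" | sort
}
```

For example, `list_stale_components /usr/local/lib/openmpi /tmp/pre-install.marker` (marker path is an assumption) prints the candidates. If anything shows up, moving the whole plugin directory aside and re-running the install is likely simpler than deleting files one by one.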

Sent from my iPad

On Feb 13, 2012, at 7:36 AM, "Richard Bardwell" <richard@sharc.co.uk> wrote:

OK, I installed 1.4.4, rebuilt the executable, and guess what... I now get some weird errors like the one below:
mca: base: component_find: unable to open /usr/local/lib/openmpi/mca_ras_dash_host
along with a few other files, even though the .so / .la files are all there!
----- Original Message -----
From: Ralph Castain
To: Open MPI Users
Sent: Monday, February 13, 2012 2:59 PM
Subject: Re: [OMPI users] MPI orte_init fails on remote nodes

Good heavens - where did you find something that old? Can you use a more recent version?

Sent from my iPad

Gentlemen,

I am struggling to get MPI working when the hostfile contains different nodes. I get the error below. Any ideas? I can ssh without a password between the two nodes. I am running Open MPI 1.2.8 on both machines.

Any help most appreciated!

MPITEST/v8_mpi_test> mpiexec -n 2 --debug-daemons -hostfile test.hst /home/sharc/MPITEST/v8_mpi_test/mpitest
Daemon [0,0,1] checking in as pid 10490 on host 192.0.2.67
[linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 182
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

orte_rml_base_select failed
--> Returned value -13 instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_system_init.c at line 42
[linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 52
Open RTE was unable to initialize properly. The error occured while
attempting to orte_init(). Returned value -13 instead of ORTE_SUCCESS.
[linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
[linux-tmpw:10490] [0,0,1] orted_recv_pls: received kill_local_procs
[linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1158
[linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[linux-tmpw:10489] ERROR: A daemon on node 192.0.2.68 failed to start as expected.
[linux-tmpw:10489] ERROR: There may be more information available from
[linux-tmpw:10489] ERROR: the remote shell (see above).
[linux-tmpw:10489] ERROR: The daemon exited unexpectedly with status 243.
[linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
[linux-tmpw:10490] [0,0,1] orted_recv_pls: received exit
[linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1190
--------------------------------------------------------------------------
mpiexec was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.
--------------------------------------------------------------------------

_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

