Open MPI User's Mailing List Archives

From: Adams, Samuel D Contr AFRL/HEDR (Samuel.Adams_at_[hidden])
Date: 2007-07-27 11:38:32


I deleted all of the entries from the known_hosts file, but that didn't
seem to help. I can run jobs just fine without torque on multiple
nodes. I can also ssh to all nodes without a password, so I am not
sure what the problem is.

...

Okay, I found the problem. The keys that I had in known_hosts were only
for the short hostnames, e.g. prodnode2, whereas the hostnames that
torque was using were fully qualified, e.g. prodnode2.brooks.af.mil, and
no keys existed for the fully qualified names.
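
For anyone hitting the same thing, here is a rough sketch of how one might check which of the names a scheduler hands out are missing from known_hosts. It is only an illustration: the function name missing_hosts, the sample file contents, the placeholder key strings, and the combined prodnode3 entry are all made up for the example. It also assumes plain-text (unhashed) entries and ignores wildcard patterns and [host]:port forms.

```python
def missing_hosts(known_hosts_text, hostnames):
    """Return the hostnames with no plain-text entry in known_hosts.

    Hypothetical helper: hashed entries and wildcard patterns are not matched.
    """
    known = set()
    for line in known_hosts_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Skip an optional marker such as @cert-authority or @revoked.
        if fields[0].startswith("@"):
            fields = fields[1:]
        if not fields:
            continue
        # Hashed host fields start with "|" and cannot be compared as text.
        if fields[0].startswith("|"):
            continue
        # The first field is a comma-separated list of host names.
        for name in fields[0].split(","):
            known.add(name.lower())
    return [h for h in hostnames if h.lower() not in known]


# Made-up sample: prodnode2 is listed only under its short name.
sample = (
    "prodnode2 ssh-rsa AAAAplaceholder1\n"
    "prodnode3,prodnode3.brooks.af.mil ssh-rsa AAAAplaceholder2\n"
)

print(missing_hosts(sample, ["prodnode2", "prodnode2.brooks.af.mil"]))
```

With the sample above, only the fully qualified prodnode2 name is reported as missing, which matches the situation described: the short name had a key, the name torque used did not.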

Thanks for the help.

Sam Adams
General Dynamics Information Technology
Phone: 210.536.5945

-----Original Message-----
From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
Behalf Of George Bosilca
Sent: Friday, July 27, 2007 10:13 AM
To: Open MPI Users
Subject: Re: [OMPI users] torque and openmpi

The key is in the first line of the provided output. One of the
connections failed because of a wrong ssh key. Clean your
.ssh/known_hosts and the problem will vanish.

   Thanks,
     george.

On Jul 27, 2007, at 11:01 AM, Adams, Samuel D Contr AFRL/HEDR wrote:

> When I run jobs with torque, I get this error message. Any ideas?
>
> [sam_at_prodnode1 all]$ cat script.sh.err
> Host key verification failed.
> [prodnode3.brooks.af.mil:03321] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
> [prodnode3.brooks.af.mil:03321] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1164
> [prodnode3.brooks.af.mil:03321] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
> [prodnode3.brooks.af.mil:03321] ERROR: A daemon on node prodnode2.brooks.af.mil failed to start as expected.
> [prodnode3.brooks.af.mil:03321] ERROR: There may be more information available from
> [prodnode3.brooks.af.mil:03321] ERROR: the remote shell (see above).
> [prodnode3.brooks.af.mil:03321] ERROR: The daemon exited unexpectedly with status 255.
> [prodnode3.brooks.af.mil:03321] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
> [prodnode3.brooks.af.mil:03321] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1196
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons for this job.
> Returned value Timeout instead of ORTE_SUCCESS.
>
> --------------------------------------------------------------------------
>
> Sam Adams
> General Dynamics Information Technology
> Phone: 210.536.5945
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
