Open MPI User's Mailing List Archives

From: Galen Shipman (gshipman_at_[hidden])
Date: 2007-07-27 14:48:18


On Jul 27, 2007, at 12:23 PM, Adams, Samuel D Contr AFRL/HEDR wrote:

> I set up ompi before I configured Torque. Do I need to recompile ompi
> with appropriate torque configure options to get better integration?
>

If libtorque wasn't present on the machine at configure time, then yes,
you need to re-run configure with TM support and rebuild:

./configure --with-tm=<path>
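After rebuilding, one way to confirm that TM support made it into the build is to look for "tm" components in the ompi_info output. The snippet below is only a sketch: has_tm_support is a hypothetical helper name, and the "MCA <framework>: tm" line format is an assumption based on how ompi_info lists components in builds from this era, so adjust the pattern if your output looks different.

```shell
# Hypothetical helper: succeeds (exit 0) if the ompi_info output passed
# as $1 lists a "tm" MCA component, e.g. "MCA ras: tm (MCA v1.0, ...)".
# The line format is an assumption, not something guaranteed by Open MPI.
has_tm_support() {
    printf '%s\n' "$1" | grep -Eq 'MCA [a-z]+: tm '
}

# Only run the check if ompi_info is actually on the PATH.
if command -v ompi_info >/dev/null 2>&1; then
    if has_tm_support "$(ompi_info)"; then
        echo "TM components found; mpirun should be able to launch via Torque"
    else
        echo "no TM components; rebuild with ./configure --with-tm=<path>"
    fi
fi
```

If no tm components show up, mpirun will most likely fall back to the rsh/ssh launcher, which is where the host key errors quoted below came from.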

> Sam Adams
> General Dynamics Information Technology
> Phone: 210.536.5945
>
> -----Original Message-----
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_open-mpi.org] On Behalf Of Jeff Squyres
> Sent: Friday, July 27, 2007 12:14 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] torque and openmpi
>
> Are you not using the built-in OMPI support for Torque? The ssh keys
> should be irrelevant if using the TM API in Torque (i.e., OMPI won't
> be using ssh to launch remote processes; we use the internal TM API
> in Torque).
>
>
> On Jul 27, 2007, at 11:38 AM, Adams, Samuel D Contr AFRL/HEDR wrote:
>
>> I deleted all of the entries out of the known_hosts file, but that
>> didn't seem to help. I can run jobs just fine without torque on
>> multiple nodes. I can also ssh to all nodes without using passwords,
>> so I am not sure what the deal is.
>>
>> ...
>>
>> Okay, I found the problem. The keys that I had in known_hosts were
>> only for the short hostname, e.g. prodnode2, whereas the hostnames
>> that torque was using were fully qualified, e.g.
>> prodnode2.brooks.af.mil, and no keys existed for the fully
>> qualified names.
>>
>> Thanks for the help.
>>
>> Sam Adams
>> General Dynamics Information Technology
>> Phone: 210.536.5945
>>
>> -----Original Message-----
>> From: users-bounces_at_[hidden] [mailto:users-bounces_at_open-mpi.org] On Behalf Of George Bosilca
>> Sent: Friday, July 27, 2007 10:13 AM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] torque and openmpi
>>
>> The key is in the first line of the provided output. One of the
>> connections failed because of a wrong ssh key. Clean your
>> .ssh/known_hosts and the problem will vanish.
>>
>> Thanks,
>> george.
>>
>> On Jul 27, 2007, at 11:01 AM, Adams, Samuel D Contr AFRL/HEDR wrote:
>>
>>> When I run jobs with torque, I get this error message. Any ideas?
>>>
>>> [sam_at_prodnode1 all]$ cat script.sh.err
>>> Host key verification failed.
>>> [prodnode3.brooks.af.mil:03321] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
>>> [prodnode3.brooks.af.mil:03321] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1164
>>> [prodnode3.brooks.af.mil:03321] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
>>> [prodnode3.brooks.af.mil:03321] ERROR: A daemon on node prodnode2.brooks.af.mil failed to start as expected.
>>> [prodnode3.brooks.af.mil:03321] ERROR: There may be more information available from
>>> [prodnode3.brooks.af.mil:03321] ERROR: the remote shell (see above).
>>> [prodnode3.brooks.af.mil:03321] ERROR: The daemon exited unexpectedly with status 255.
>>> [prodnode3.brooks.af.mil:03321] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
>>> [prodnode3.brooks.af.mil:03321] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1196
>>> --------------------------------------------------------------------------
>>> mpirun was unable to cleanly terminate the daemons for this job.
>>> Returned value Timeout instead of ORTE_SUCCESS.
>>>
>>> --------------------------------------------------------------------------
>>>
>>> Sam Adams
>>> General Dynamics Information Technology
>>> Phone: 210.536.5945
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
> --
> Jeff Squyres
> Cisco Systems