Open MPI User's Mailing List Archives

Subject: [OMPI users] OpenMPI with Sun Gridengine: Host key verification failed.
From: Tobias Müller (Tobias_Mueller_at_[hidden])
Date: 2010-02-26 09:01:34


Hi everybody!

I hope this list is the right place for my problem concerning OpenMPI
with Sun Gridengine. I'm running OpenMPI with gridengine support:

MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.7)
MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.7)

on 4 Debian Lenny systems with Sun Gridengine 6.2. I've written a small
test program that only prints the hostname each MPI process is running
on, and I start it with a simple script submitted via qsub:

#!/bin/bash
#$ -V
### number of processors and parallel environment
#$ -pe sol 32
### Job name
#$ -N "mpi_test"
### Start from current working directory
#$ -cwd
#$ -l arch=lx26-amd64
/usr/bin/mpirun.openmpi --mca pls_gridengine_verbose 1 -v ~/grid/mpi_test/main

Gridengine starts the job, but it fails with "Host key verification
failed." in the log files:

local configuration sol2.XXX not defined - using global configuration
Starting server daemon at host "sol2.XXX"
Starting server daemon at host "sol3.XXX"
Starting server daemon at host "sol4.XXX"
Starting server daemon at host "sol1.XXX"
Server daemon successfully started with task id "1.sol2"
Server daemon successfully started with task id "1.sol4"
Server daemon successfully started with task id "1.sol1"
Server daemon successfully started with task id "1.sol3"
Establishing /usr/bin/ssh session to host sol2.XXX ...
Host key verification failed.
/usr/bin/ssh exited with exit code 255
reading exit code from shepherd ... 129
[sol2:22892] ERROR: A daemon on node sol2.XXX failed to start as expected.
[sol2:22892] ERROR: There may be more information available from
[sol2:22892] ERROR: the 'qstat -t' command on the Grid Engine tasks.
[sol2:22892] ERROR: If the problem persists, please restart the
[sol2:22892] ERROR: Grid Engine PE job
[sol2:22892] ERROR: The daemon exited unexpectedly with status 129.
...

The host keys for all 4 solX hosts are in the known_hosts file of the
user submitting the job and in the known_hosts file of root.
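
For reference, the known_hosts check can be sketched roughly as follows.
This is only a minimal sketch: the function name missing_host_keys is made
up for illustration, and it assumes the OpenSSH ssh-keygen tool is
available. It prints every host name for which ssh-keygen -F finds no
entry in the given file. Since the log shows ssh being called with the
fully qualified name (sol2.XXX), it is probably worth testing both the
short and the fully qualified form of each host.

```shell
#!/bin/bash
# Hypothetical helper (not part of Open MPI or Gridengine):
# print each host name that has no entry in a known_hosts file.
# Usage: missing_host_keys KNOWN_HOSTS_FILE HOST...
missing_host_keys() {
    local known_hosts=$1
    shift
    local host
    for host in "$@"; do
        # ssh-keygen -F looks the host up in the file (hashed entries
        # included) and prints the matching lines; empty output means
        # no entry was found for exactly this name.
        if [ -z "$(ssh-keygen -F "$host" -f "$known_hosts" 2>/dev/null)" ]; then
            echo "$host"
        fi
    done
}
```

Called as, for example, missing_host_keys ~/.ssh/known_hosts sol1 sol1.XXX
(with XXX standing for the real domain), it prints nothing when every
listed name has an entry. Running it as the submitting user on each
execution host would show whether the file that ssh actually reads there
contains the names ssh is given.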

Any hints why this could go wrong?

Regards
  Tobias