Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI with Sun Gridengine: Host key verification failed.
From: Reuti (reuti_at_[hidden])
Date: 2010-02-26 09:26:04


Hi,

On 26.02.2010 at 15:01, Tobias Müller wrote:

> I hope this list is the right place for my problem concerning OpenMPI
> with Sun Gridengine. I'm running OpenMPI with gridengine support:
>
> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.7)
> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.7)
>
> on 4 Debian Lenny systems with Sun Gridengine 6.2. I've written a small

Which update version of SGE 6.2 are you using?

> test program that only displays the hostname each MPI process is
> running on, and I start it via a simple script submitted with qsub:
>
> #!/bin/bash
> #$ -V
> ### number of processors and parallel environment
> #$ -pe sol 32
> ### Job name
> #$ -N "mpi_test"
> ### Start from current working directory
> #$ -cwd
> #$ -l arch=lx26-amd64
> /usr/bin/mpirun.openmpi --mca pls_gridengine_verbose 1 -v ~/grid/mpi_test/main
>
> Grid Engine starts the jobs, but they fail with "Host key verification
> failed." in the log files:
>
> local configuration sol2.XXX not defined - using global configuration
> Starting server daemon at host "sol2.XXX"
> Starting server daemon at host "sol3.XXX"
> Starting server daemon at host "sol4.XXX"
> Starting server daemon at host "sol1.XXX"
> Server daemon successfully started with task id "1.sol2"
> Server daemon successfully started with task id "1.sol4"
> Server daemon successfully started with task id "1.sol1"
> Server daemon successfully started with task id "1.sol3"
> Establishing /usr/bin/ssh session to host sol2.XXX ...
> Host key verification failed.
> /usr/bin/ssh exited with exit code 255
> reading exit code from shepherd ... 129
> [sol2:22892] ERROR: A daemon on node sol2.XXX failed to start as expected.
> [sol2:22892] ERROR: There may be more information available from
> [sol2:22892] ERROR: the 'qstat -t' command on the Grid Engine tasks.
> [sol2:22892] ERROR: If the problem persists, please restart the
> [sol2:22892] ERROR: Grid Engine PE job
> [sol2:22892] ERROR: The daemon exited unexpectedly with status 129.
> ...
>
> The host keys for all 4 solX hosts are in the known_hosts file of both
> the user submitting the job and root.

You set up SGE to use SSH as the remote startup method, and apart from
this it works for a plain qrsh and for qrsh with a command? Can you try
the -builtin- method as an alternative?
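The remote startup method is set by the rsh_*, rlogin_* and qlogin_* entries of the cluster configuration. The log above says no local configuration exists for sol2.XXX, so the global one (shown by `qconf -sconf`) applies. Below is a minimal sketch of a hypothetical helper, `report_startup_method`, that reads `qconf -sconf`-style text on stdin and reports whether each of these entries uses the builtin method or an external command such as /usr/bin/ssh. The key names and the "builtin" value are assumed to match SGE 6.2. Check them against your own `qconf -sconf` output.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): report the remote startup method
# for each rsh/rlogin/qlogin *_command and *_daemon entry.
# Assumed usage on a cluster: qconf -sconf | report_startup_method
report_startup_method() {
  awk '
    $1 ~ /^(rsh|rlogin|qlogin)_(command|daemon)$/ {
      key = $1
      # Drop the key; the rest of the line is the configured value.
      $1 = ""
      sub(/^ +/, "")
      if ($0 == "builtin")
        print key ": builtin"
      else
        print key ": external (" $0 ")"
    }
  '
}
```

If the report shows ssh, switching to -builtin- would mean setting all six entries to `builtin` with `qconf -mconf`. The same test of qrsh, with and without a command (e.g. `qrsh hostname`), should then be repeated before rerunning the MPI job.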

-- Reuti

> Any hints why this could go wrong?
>
> Regards
> Tobias
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users