On 26.02.2010 at 15:01, Tobias Müller wrote:
> I hope this list is the right place for my problem concerning OpenMPI
> with Sun Gridengine. I'm running OpenMPI with gridengine support:
> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.7)
> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.7)
> on 4 Debian Lenny systems with Sun Gridengine 6.2. I've written a small
Which update version of SGE?
> test program which only displays the hostname each MPI process is
> running on, and I start it via a simple script submitted with qsub:
> #$ -V
> ### number of processors and parallel environment
> #$ -pe sol 32
> ### Job name
> #$ -N "mpi_test"
> ### Start from current working directory
> #$ -cwd
> #$ -l arch=lx26-amd64
> /usr/bin/mpirun.openmpi --mca pls_gridengine_verbose 1 -v ~/grid/
> The gridengine starts the jobs, but fails with "Host key verification
> failed." in the logfiles:
> local configuration sol2.XXX not defined - using global configuration
> Starting server daemon at host "sol2.XXX"
> Starting server daemon at host "sol3.XXX"
> Starting server daemon at host "sol4.XXX"
> Starting server daemon at host "sol1.XXX"
> Server daemon successfully started with task id "1.sol2"
> Server daemon successfully started with task id "1.sol4"
> Server daemon successfully started with task id "1.sol1"
> Server daemon successfully started with task id "1.sol3"
> Establishing /usr/bin/ssh session to host sol2.XXX ...
> Host key verification failed.
> /usr/bin/ssh exited with exit code 255
> reading exit code from shepherd ... 129
> [sol2:22892] ERROR: A daemon on node sol2.XXX failed to start as
> [sol2:22892] ERROR: There may be more information available from
> [sol2:22892] ERROR: the 'qstat -t' command on the Grid Engine tasks.
> [sol2:22892] ERROR: If the problem persists, please restart the
> [sol2:22892] ERROR: Grid Engine PE job
> [sol2:22892] ERROR: The daemon exited unexpectedly with status 129.
> The host keys for all 4 solX hosts are in the known_hosts file of the
> user submitting the job and in the known_hosts file of root.
Did you set up SGE to use SSH as the remote startup method, and does it
otherwise work for a plain qrsh and for qrsh with a command? Can you try
the "builtin" method as an alternative?
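
To see which startup method is configured right now, something along these lines may help. It is only a sketch: `show_startup_methods` is a helper name I made up for this mail, and it does nothing more than filter the qlogin/rlogin/rsh entries out of the configuration text it gets on stdin. As your log says there is no local configuration for the host, the global one from `qconf -sconf` is what counts.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): print only the remote startup
# entries (qlogin/rlogin/rsh command and daemon) from SGE configuration
# text read on stdin.
show_startup_methods() {
    grep -E '^(qlogin|rlogin|rsh)_(command|daemon)[[:space:]]'
}

# Usage on the cluster (needs the SGE binaries in PATH):
#   qconf -sconf | show_startup_methods
# Then test the startup method outside of Open MPI, where <host> is a
# placeholder for one of your exec hosts:
#   qrsh -l hostname=<host> hostname
```

If the entries point to /usr/bin/ssh and the plain qrsh test already fails with the same host key error, the problem is in the SGE/SSH setup and not in Open MPI. To try the alternative, set the rsh_command and rsh_daemon entries (and the rlogin/qlogin ones, if you use them) to builtin with `qconf -mconf`.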
> Any hints why this could go wrong?