On 26.02.2010 at 15:01, Tobias Müller wrote:
> I hope this list is the right place for my problem concerning OpenMPI
> with Sun Gridengine. I'm running OpenMPI with gridengine support:
> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.7)
> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.7)
> on 4 Debian Lenny systems with Sun Gridengine 6.2. I've written a small
Which update version of SGE?
> test program which only displays the hostname each MPI process is
> running on, and I start it via a simple script submitted with qsub:
> #$ -V
> ### number of processors and parallel environment
> #$ -pe sol 32
> ### Job name
> #$ -N "mpi_test"
> ### Start from current working directory
> #$ -cwd
> #$ -l arch=lx26-amd64
> /usr/bin/mpirun.openmpi --mca pls_gridengine_verbose 1 -v ~/grid/
> The gridengine starts the jobs, but they fail with "Host key verification
> failed." in the logfiles:
> local configuration sol2.XXX not defined - using global configuration
> Starting server daemon at host "sol2.XXX"
> Starting server daemon at host "sol3.XXX"
> Starting server daemon at host "sol4.XXX"
> Starting server daemon at host "sol1.XXX"
> Server daemon successfully started with task id "1.sol2"
> Server daemon successfully started with task id "1.sol4"
> Server daemon successfully started with task id "1.sol1"
> Server daemon successfully started with task id "1.sol3"
> Establishing /usr/bin/ssh session to host sol2.XXX ...
> Host key verification failed.
> /usr/bin/ssh exited with exit code 255
> reading exit code from shepherd ... 129
> [sol2:22892] ERROR: A daemon on node sol2.XXX failed to start as
> [sol2:22892] ERROR: There may be more information available from
> [sol2:22892] ERROR: the 'qstat -t' command on the Grid Engine tasks.
> [sol2:22892] ERROR: If the problem persists, please restart the
> [sol2:22892] ERROR: Grid Engine PE job
> [sol2:22892] ERROR: The daemon exited unexpectedly with status 129.
> The host keys for all 4 solX hosts are in the known_hosts file of the
> user submitting the job and in the known_hosts file of root.
You set up SGE to use SSH as the remote startup method, and it works
otherwise for qrsh and for qrsh with a command? Can you try the
-builtin- method as an alternative?
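The checks suggested above could be sketched roughly as follows. This is only a minimal sketch, not a confirmed diagnosis: the helper names remote_startup_settings and missing_host_keys are hypothetical, and the idea that a host key might be stored under a different name form (short name vs. FQDN) than the one SGE passes to ssh is an assumption that nothing in this thread confirms. It relies on `qconf -sconf` printing the cluster configuration and on `ssh-keygen -F` searching a known_hosts file.

```shell
#!/bin/sh
# Hypothetical helpers for narrowing down "Host key verification failed."

# Read a cluster configuration on stdin (e.g. from `qconf -sconf`) and keep
# only the entries that control remote startup (rsh/rlogin/qlogin).
remote_startup_settings() {
    grep -E '^[[:space:]]*(rsh|rlogin|qlogin)_(command|daemon)[[:space:]]'
}

# Print every host name for which ssh-keygen finds no entry in the given
# known_hosts file. Usage: missing_host_keys FILE HOST...
missing_host_keys() {
    known_hosts=$1
    shift
    for host in "$@"; do
        # Test for empty output rather than the exit status, since the
        # exit status of `ssh-keygen -F` may differ between OpenSSH versions.
        if [ -z "$(ssh-keygen -F "$host" -f "$known_hosts" 2>/dev/null)" ]; then
            echo "$host"
        fi
    done
}

# Example use on the submit host (host names as in the log above):
#   qconf -sconf | remote_startup_settings
#   missing_host_keys ~/.ssh/known_hosts sol1 sol1.XXX sol2 sol2.XXX
#   qrsh -l hostname=sol2.XXX hostname
```

If the first command shows /usr/bin/ssh for rsh_command, switching that entry (and rsh_daemon) to builtin via `qconf -mconf` would be the alternative asked about above. The last command tests qrsh with a command outside of Open MPI.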
> Any hints why this could go wrong?