I am having trouble with the Grid Engine integration of Open MPI. When
I run a job with only 4 processes, it runs fine. With more processes,
mpirun sometimes fails to connect to the remote nodes: its qrsh calls fail.
I'll attach a job script and the error output. As you can see from the
'for' loop, I can connect to all nodes just fine; it is the qrsh
executed by mpirun that fails. Qrsh is configured to run ssh with
Kerberos authentication (ssh -tt -o GSSAPIDelegateCredentials=no).
I am using Open MPI 1.2.2 and SGE 6.0u9 on RHEL5. Any idea where the
problem could be?
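
For readers without the attachment, here is a minimal sketch of the kind of per-host check such a 'for' loop performs. It is not the attached job script. It assumes the host list comes from SGE's $PE_HOSTFILE (one line per host: name, slots, queue, processor range). The helper names pe_hosts and check_hosts and the CONNECT variable are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not the attached job script.

# Print each unique host named in an SGE PE hostfile.
# Assumed line format: <host> <slots> <queue> <processor range>
pe_hosts() {
    awk '{ print $1 }' "$1" | sort -u
}

# Run "hostname" on every host in the hostfile and report the result.
# CONNECT (hypothetical) selects the remote command; it defaults to
# "qrsh -inherit" and can be set to e.g. "ssh" to compare both paths.
check_hosts() {
    hostfile=$1
    failed=0
    for host in $(pe_hosts "$hostfile"); do
        if ${CONNECT:-qrsh -inherit} "$host" hostname >/dev/null 2>&1; then
            echo "ok $host"
        else
            echo "FAILED $host"
            failed=1
        fi
    done
    return $failed
}

# Inside a parallel job one might call:
#   check_hosts "$PE_HOSTFILE"
```

Running the check once with the default and once with CONNECT=ssh would show whether the failures are specific to the qrsh path or also affect plain ssh.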
Regards, Götz Waschk