I'm no SGE expert, but I do note that your original error indicates that mpirun was unable to find a launcher for your environment. When running under SGE, mpirun looks for certain environment variables indicative of SGE. If it finds them, it then looks for the "qrsh" command. If "qrsh" isn't found, or isn't executable by the user, you will fail with that error.

Given that you have the envars, is "qrsh" in the path where mpirun is executing? If not, that would explain why you can run outside of SGE (where mpirun defaults to ssh) but not inside it.
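As a quick check from inside a job (a sketch; SGE_ROOT, ARC, PE_HOSTFILE, and JOB_ID are the usual SGE markers, but your setup may differ), something like this would show whether the markers and qrsh are visible where mpirun runs:

```shell
# Diagnostic sketch: print the SGE environment markers and report
# whether qrsh is on PATH (assumes the standard SGE variable names).
for v in SGE_ROOT ARC PE_HOSTFILE JOB_ID; do
  if [ -n "$(printenv "$v")" ]; then
    echo "$v is set: $(printenv "$v")"
  else
    echo "$v is NOT set"
  fi
done

if command -v qrsh >/dev/null 2>&1; then
  echo "qrsh found at: $(command -v qrsh)"
else
  echo "qrsh NOT in PATH"
fi
```

Running this under qrsh/qsub versus an interactive ssh login should make any difference in environment obvious.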

On Apr 16, 2011, at 5:21 PM, Derrick LIN wrote:

> Well, does `mpiexec` point to the correct one?

I don't really get this. I installed one and only one Open MPI on the node. There shouldn't be another 'mpiexec' on the system.

It's worth mentioning that every node is deployed from a master image, so everything is identical except the IP and DNS name.

> I thought you compiled it on your own with --with-sge. What about: 

pwbcad@sgeqexec01:~$ ompi_info | grep grid
                 MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.4.1)

Is there anywhere I can find a more meaningful Open MPI log?
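Rather than a log file, mpirun's MCA verbosity parameters can show which launcher is being selected at run time. A sketch, assuming Open MPI 1.4.x parameter names (`./my_prog` is a placeholder for your application):

```shell
# Ask the plm (process launch) framework to report what it is doing;
# plm_base_verbose is a standard Open MPI MCA verbosity parameter.
mpirun --mca plm_base_verbose 10 -np 2 ./my_prog

# Show which rsh/qrsh agent the rsh launcher would use (1.4.x syntax):
ompi_info --param plm rsh
```

The verbose output should say whether the gridengine/qrsh path is being taken or whether mpirun falls back to ssh.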

I will try to install openmpi 1.4.3 and see if that works.

I want to confirm one more thing: does SGE's master host need to have OpenMPI installed? Is it relevant?

Many thanks, Reuti

users mailing list