Rolf Vandevaart wrote:
> Ray Muno wrote:
>> Ray Muno wrote:
>>> We are running a cluster using Rocks 5.0 and OpenMPI 1.2 (primarily).
>>> Scheduling is done through SGE. MPI communication is over InfiniBand.
>> We also have OpenMPI 1.3 installed and receive similar errors.
> This does sound like a problem with SGE. By default, we use qrsh to
> start the jobs on all the remote nodes. I believe that is the command
> that is failing. There are two things you can try to get more info
> depending on the version of Open MPI. With version 1.2, you can try
> this to get more information.
> --mca pls_gridengine_verbose 1
This did not look like it gave me any more info.
> With Open MPI 1.3.2 and later, the verbose flag will not help.
> Instead, you can disable the use of qrsh and use rsh/ssh to
> start the remote jobs.
> --mca plm_rsh_disable_qrsh 1
That gives me
PMGR_COLLECTIVE ERROR: unitialized MPI task: Missing required
environment variable: MPIRUN_RANK
PMGR_COLLECTIVE ERROR: PMGR_COLLECTIVE ERROR: unitialized MPI task:
Missing required environment variable: MPIRUN_RANK
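
For reference, here is a rough sketch of how the two suggested options would sit on a full mpirun command line. The binary name ./my_mpi_app and the process count of 4 are placeholders for illustration only, not the actual job; only the two --mca options come from the suggestions quoted above. The script just builds and prints the command lines rather than launching anything.

```shell
#!/bin/sh
# Placeholder values (assumptions, not taken from the real job).
APP=./my_mpi_app
NP=4

# Open MPI 1.2: ask the gridengine launcher for verbose output.
CMD_12="mpirun -np $NP --mca pls_gridengine_verbose 1 $APP"

# Open MPI 1.3.2 and later: skip qrsh and start remote processes via rsh/ssh.
CMD_13="mpirun -np $NP --mca plm_rsh_disable_qrsh 1 $APP"

# Print the command lines instead of running them.
echo "$CMD_12"
echo "$CMD_13"
```

Inside an SGE job script, either line would replace the usual mpirun invocation, with the version of mpirun on the PATH matching the Open MPI release the option belongs to.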
University of Minnesota