Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI and SGE
From: Ray Muno (muno_at_[hidden])
Date: 2009-06-23 14:11:05


Rolf Vandevaart wrote:

>>
>> PMGR_COLLECTIVE ERROR: unitialized MPI task: Missing required
>> environment variable: MPIRUN_RANK
>> PMGR_COLLECTIVE ERROR: PMGR_COLLECTIVE ERROR: unitialized MPI task:
>> Missing required environment variable: MPIRUN_RANK
>>
> I do not recognize these errors as part of Open MPI. A google search
> showed they might be coming from MVAPICH. Is there a chance we are
> using Open MPI to launch the jobs (via Open MPI mpirun) but we are
> actually launching an application that is linked to MVAPICH?
>
>
You are correct. I was trying to run the MVAPICH-compiled test program.

With an Open MPI-compiled test, I do get an extra line of output with the
verbose flag. The program just hangs at that point.

[muno@compute-6-30 ~]$ which mpirun
/share/apps/opt/openmpi_pgi/bin/mpirun

[muno@compute-6-30 ~]$ ldd a.out
        libmpi_f90.so.0 => /share/apps/opt/openmpi_pgi/lib/libmpi_f90.so.0 (0x00002aaaaaaad000)
        libmpi_f77.so.0 => /share/apps/opt/openmpi_pgi/lib/libmpi_f77.so.0 (0x00002aaaaacb0000)
        libmpi.so.0 => /share/apps/opt/openmpi_pgi/lib/libmpi.so.0 (0x00002aaaaaee0000)
...
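
The check above (compare where mpirun lives with where the MPI libraries in
the ldd output resolve) can be scripted. This is only a rough sketch: the
function names linked_mpi_libs and matches_mpirun are made up for
illustration, and it assumes the libraries sit under the same install prefix
as bin/mpirun, which is true for the paths shown here but not guaranteed in
general.

```python
import os
import re

# ldd entries look like "libmpi.so.0 => /path/libmpi.so.0 (0x...)".
# \s+ also matches newlines, so entries wrapped by a mail client still parse.
LDD_ENTRY = re.compile(r"(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)")


def linked_mpi_libs(ldd_output):
    """Map each library whose name contains 'mpi' to its resolved path."""
    return {
        name: path
        for name, path in LDD_ENTRY.findall(ldd_output)
        if "mpi" in name.lower()
    }


def matches_mpirun(libs, mpirun_path):
    """True if every MPI library lives under the prefix that holds bin/mpirun.

    Assumption: the install layout is <prefix>/bin/mpirun and <prefix>/lib/...
    """
    prefix = os.path.dirname(os.path.dirname(mpirun_path))
    return bool(libs) and all(p.startswith(prefix + "/") for p in libs.values())


if __name__ == "__main__":
    sample = (
        "libmpi.so.0 => /share/apps/opt/openmpi_pgi/lib/libmpi.so.0 "
        "(0x00002aaaaaee0000)\n"
    )
    libs = linked_mpi_libs(sample)
    print(libs, matches_mpirun(libs, "/share/apps/opt/openmpi_pgi/bin/mpirun"))
```

A mismatch (for example, a binary linked against an MVAPICH library while
mpirun comes from the Open MPI tree) would make matches_mpirun return False.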

 mpirun -np $NSLOTS -mca pls_gridengine_verbose 1 a.out
Starting server daemon at host "compute-6-25.local"
Starting server daemon at host "compute-1-1.local"
Server daemon successfully started with task id "1.compute-6-25"
error: commlib error: access denied (client IP resolved to host name "". This is not identical to clients host name "")
error: executing task of job 12144 failed: failed sending task to execd@compute-1-1.local: can't find connection
[compute-6-25.local:10810] ERROR: A daemon on node compute-1-1.local failed to start as expected.
[compute-6-25.local:10810] ERROR: There may be more information available from
[compute-6-25.local:10810] ERROR: the 'qstat -t' command on the Grid Engine tasks.
[compute-6-25.local:10810] ERROR: If the problem persists, please restart the
[compute-6-25.local:10810] ERROR: Grid Engine PE job
[compute-6-25.local:10810] ERROR: The daemon exited unexpectedly with status 1.
Establishing /usr/bin/ssh session to host compute-6-25.local ...
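
The commlib message complains that the client IP resolved to a host name that
does not match the client's own name, so one thing that might be worth
checking is whether forward and reverse lookups agree on the nodes involved.
This is a sketch of such a check, not a diagnosis of this particular failure:
check_host is a hypothetical helper, and it uses the system resolver through
Python's socket module, which may not match exactly how Grid Engine resolves
names.

```python
import socket


def check_host(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return (ip, names, consistent) for one host name.

    names lists what the reverse lookup reports for the IP; consistent is
    True only when the original name is among them (case-insensitive).
    The resolver functions are parameters so they can be swapped in tests.
    """
    try:
        ip = forward(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None, [], False
    try:
        primary, aliases, _ = reverse(ip)
    except OSError:  # socket.herror is a subclass of OSError
        return ip, [], False
    names = [primary] + list(aliases)
    consistent = name.lower() in (n.lower() for n in names)
    return ip, names, consistent


if __name__ == "__main__":
    # Host names taken from the mpirun output above.
    for host in ("compute-6-25.local", "compute-1-1.local"):
        print(host, check_host(host))
```

Running it on each node named in the output would show whether any of them
sees a different name for the same address.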

-- 
 Ray Muno