Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Gridengine + Open MPI
From: Pak Lui (Pak.Lui_at_[hidden])
Date: 2008-07-06 14:08:34


Romaric David wrote:
> Hello,
>
> I'm trying to use Open MPI with Sun Grid Engine 6.1.
>
> With Open MPI 1.2.6 or 1.2.7, Open MPI processes are perfectly started
> or killed by Sun Grid Engine.
> Suspend does not work (looks like a known issue,
> http://www.open-mpi.org/community/lists/users/2007/03/2790.php).
> Has this issue finally been solved?

It was fixed at one point in the trunk before v1.3 went official, but
while rolling the code from the gridengine PLM into the rsh PLM, this
feature was left out because there were some lingering issues that I
had not resolved, and I lost track of it. Sorry about that, and thanks for
bringing it up. I will need to look at the issue again and reopen this
ticket against v1.3:

https://svn.open-mpi.org/trac/ompi/ticket/1099

>
> I then tried to use Open MPI 1.3.x. When adding the --with-sge option at
> compile time, the SGE pls does not get built, only the SGE ras components.
> Thus Open MPI jobs cannot start in Gridengine. Is it intentional that the
> pls SGE components are not built?

For v1.3, you are right that the --with-sge build flag is required to
build Open MPI with SGE support, and that only the gridengine RAS will be
built.

The v1.3 series will include the rsh PLM, which will serve as the SGE
parallel job launcher as well as the rsh/ssh launcher.

Since the two PLMs overlap heavily in functionality, it made sense to
merge the gridengine PLM into the rsh PLM for ease of maintenance and
troubleshooting. Setting the MCA parameter at run time, e.g. "--mca
plm_rsh_disable_qrsh 1", should let the user disable the SGE launcher and
use the rsh/ssh method instead, even inside an SGE environment.
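
To make the switch concrete, here is a minimal sketch that only builds and
prints the two mpirun command lines rather than running them. The helper
name build_mpirun_cmd, the value 1 given to the parameter, the process
count and ./a.out are all placeholders of mine, not something taken from a
real job script.

```shell
#!/bin/sh
# Print (do not execute) an mpirun command line.
# $1 = "rsh"  -> ask the rsh PLM to skip qrsh and use plain rsh/ssh
# $1 = "qrsh" -> leave the default behaviour untouched
# Remaining arguments are passed through to mpirun as-is.
build_mpirun_cmd() {
    launcher=$1
    shift
    if [ "$launcher" = "rsh" ]; then
        # Assumption: the parameter is a boolean, enabled here with 1.
        echo "mpirun --mca plm_rsh_disable_qrsh 1 $*"
    else
        echo "mpirun $*"
    fi
}

build_mpirun_cmd rsh -np 4 ./a.out
build_mpirun_cmd qrsh -np 4 ./a.out
```

Printing the command first makes it easy to paste into a job script and
compare both variants under the same parallel environment.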

So even though it is the rsh PLM that starts the parallel job under SGE,
the rsh PLM can detect whether the Open MPI job was started under an SGE
Parallel Environment (by checking some SGE environment variables) and, if
so, use the "qrsh -inherit" command to launch the parallel job the same way
as before. You can verify this by adding an MCA setting such as "--mca
plm_base_verbose 10" to your mpirun command and looking for the launch
commands that mpirun uses.
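
As a rough sketch of that kind of check, the snippet below tests a handful
of SGE environment variables from a shell script. The variable set
(SGE_ROOT, ARC, PE_HOSTFILE, JOB_ID) is my assumption about what is
relevant, and the function name in_sge_pe is made up for illustration; the
actual test lives inside the rsh PLM and may differ.

```shell
#!/bin/sh
# Succeed if the environment looks like an SGE parallel job.
# Assumption: SGE_ROOT, ARC, PE_HOSTFILE and JOB_ID are the variables
# that matter; the real check in the rsh PLM may use a different set.
in_sge_pe() {
    [ -n "${SGE_ROOT:-}" ] && [ -n "${ARC:-}" ] &&
    [ -n "${PE_HOSTFILE:-}" ] && [ -n "${JOB_ID:-}" ]
}

if in_sge_pe; then
    echo "SGE parallel environment detected: expect qrsh launches"
else
    echo "no SGE parallel environment: expect plain rsh/ssh launches"
fi

# To see what mpirun actually does, run it with verbose PLM output, e.g.
# (process count and ./a.out are placeholders):
#   mpirun --mca plm_base_verbose 10 -np 4 ./a.out
```

Running this at the top of a job script gives a quick hint about which
launcher to expect before reading the verbose mpirun output.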

>
> Regards,
> Romaric

-- 
- Pak Lui
pak.lui_at_[hidden]