
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Problems with gridengine integration on RHEL 6
From: Brian McNally (bmcnally_at_[hidden])
Date: 2012-02-15 20:00:11


Hi Dave,

I looked through the INSTALL, VERSION, NEWS, and README files in the
1.5.4 Open MPI tarball but didn't see what you were referring to. Are
you suggesting that I launch mpirun like this?

    mpirun -mca plm ^rshd ...
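For reference, Open MPI normally accepts an MCA parameter such as the plm = ^rshd workaround in three places: on the mpirun command line, in an OMPI_MCA_<name> environment variable, or in an MCA parameter file. The "^" prefix means "use every plm component except the ones listed". The sketch below shows all three. The set_mca_param helper and the ./my_app program name are hypothetical, and the file locations are the usual defaults, which may differ on a particular install.

```shell
#!/bin/sh
# Sketch: three ways to exclude the plm "rshd" component in Open MPI.
# "^rshd" means: consider every plm component except rshd.

# 1) Per run, on the command line (./my_app is a placeholder program):
#      mpirun -mca plm ^rshd ./my_app

# 2) Per shell session, via the OMPI_MCA_<param> environment variable:
export OMPI_MCA_plm='^rshd'

# 3) Persistently, in an MCA parameter file such as
#    $HOME/.openmpi/mca-params.conf (per user) or
#    <prefix>/etc/openmpi-mca-params.conf (system wide).
#    set_mca_param is a hypothetical helper: it replaces any existing
#    line for the parameter and appends "name = value".
set_mca_param() {
    file=$1 name=$2 value=$3
    mkdir -p "$(dirname "$file")"
    touch "$file"
    # Keep every line except an existing setting for this parameter.
    grep -v "^[[:space:]]*$name[[:space:]]*=" "$file" > "$file.tmp" || true
    printf '%s = %s\n' "$name" "$value" >> "$file.tmp"
    mv "$file.tmp" "$file"
}

# Example (per-user file):
#   set_mca_param "$HOME/.openmpi/mca-params.conf" plm '^rshd'
```

Calling the helper twice leaves a single plm line in the file, so it is safe to rerun from a provisioning script.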

What I meant by "the same parallel environment setup" was that the PE in
SGE was defined the same way:

$ qconf -sp orte
pe_name orte
slots 9999
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $round_robin
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary FALSE
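The two PE settings that matter most for tight integration here are control_slaves TRUE, which lets Grid Engine start the remote orted processes through qrsh -inherit, and job_is_first_task FALSE. As far as I know, these match what the Open MPI FAQ recommends for SGE. Below is a minimal sketch of a check on qconf -sp output, assuming the one "key value" pair per line layout shown above. check_pe is a hypothetical helper, not part of SGE or Open MPI.

```shell
#!/bin/sh
# Sketch: read a PE definition (as printed by "qconf -sp <pe>") on stdin
# and verify the two settings tight integration relies on.
# check_pe is a hypothetical helper; it exits 0 and prints "ok" on success.
check_pe() {
    awk '
        $1 == "control_slaves"    { cs = $2 }
        $1 == "job_is_first_task" { jf = $2 }
        END {
            status = 0
            if (cs != "TRUE") {
                print "control_slaves is " (cs == "" ? "unset" : cs) ", expected TRUE"
                status = 1
            }
            if (jf != "FALSE") {
                print "job_is_first_task is " (jf == "" ? "unset" : jf) ", expected FALSE"
                status = 1
            }
            if (status == 0) print "ok"
            exit status
        }'
}

# Usage on a live cluster:
#   qconf -sp orte | check_pe
```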

Even though I have RHEL 5 and RHEL 6 nodes in the same cluster, they
never run the same MPI job; it's always either all RHEL 5 nodes or all
RHEL 6 nodes, so Open MPI 1.4 and 1.5 are never mixed within a job.
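Because the RHEL 5 and RHEL 6 nodes run different Open MPI builds (1.4.3 and 1.5.3), one way to compare them is to check which components ompi_info reports on one node of each type: whether the gridengine RAS is present, and whether a plm rshd component exists. The sketch below assumes ompi_info prints lines of the form "MCA <framework>: <component> (...)". has_component is a hypothetical helper that reads that output on stdin.

```shell
#!/bin/sh
# Sketch: succeed if the ompi_info output on stdin lists a given component.
# has_component is a hypothetical helper.
#   $1 = MCA framework (e.g. ras, plm)
#   $2 = component name (e.g. gridengine, rshd)
has_component() {
    # Match "MCA <framework>: <component>" followed by a space or end of line,
    # so that "rsh" does not match "rshd" and vice versa.
    grep -Eq "MCA[[:space:]]+$1:[[:space:]]+$2([[:space:]]|\$)"
}

# Usage, run once on a RHEL 5 node and once on a RHEL 6 node:
#   ompi_info | has_component ras gridengine && echo "gridengine RAS present"
#   ompi_info | has_component plm rshd && echo "plm rshd present; consider plm = ^rshd"
```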

--
Brian McNally
On 02/15/2012 04:08 PM, Reuti wrote:
> On 16.02.2012 at 00:41, Dave Love wrote:
>
>> Brian McNally<bmcnally_at_[hidden]>  writes:
>>
>>> Hello Open MPI community,
>>>
>>> I'm running the openmpi 1.5.3 package as provided by Red Hat Enterprise
>>> Linux 6, along with SGE 6.2u3. I've discovered that under RHEL 5 orted
>>> gets spawned via qrsh and under RHEL 6 orted gets spawned via
>>> SSH. This is happening in the same cluster environment with the same
>>> parallel environment setup. I want orted to get spawned via qrsh
>>> because we impose memory limits if a job is spawned through SSH.
>>
>> [I'd have thought you'd want qrsh to get tight integration regardless.]
>>
>>> I cannot determine WHY the behavior is different from RHEL 5 to RHEL
>>> 6. In the former I'm using the openmpi 1.4.3 package, in the latter
>>> I'm using openmpi 1.5.3. Both are supposedly built to support the
>>> gridengine ras.
>>
>> See the release notes for 1.5.4.  The workaround I was given is
>>   plm = ^rshd
>
> Aha, thx for reminding me of it - so it's still broken.
>
> -- Reuti
>
>
>> Does "the same parallel environment setup" mean mixing 1.4 and 1.5?  I
>> thought they weren't binary compatible.
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>