Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [sge::tight-integration] slot scheduling and resources handling
From: Reuti (reuti_at_[hidden])
Date: 2010-05-25 05:32:44


Hi,

On 25.05.2010, at 09:14, Eloi Gaudry wrote:

> I do not reset any environment variable during job submission or job handling.
> Is there a simple way to check that Open MPI is working as expected with SGE
> tight integration (e.g. displaying environment variables, setting options on the
> command line, etc.)?

a) put a command:

env

in the jobscript and check the output for $JOB_ID and various $SGE_* variables.
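
For example, a minimal jobscript sketch along these lines (PE name, slot count
and mpirun path are taken from your mails below; ./your_app is just a
placeholder) would show whether the job sees the SGE environment at all:

#!/bin/sh
#$ -pe round_robin 8
#$ -cwd
# print the SGE-related environment the job actually sees
env | grep -E '^(JOB_ID|NSLOTS|PE|SGE_)'
# the hostfile SGE prepared for the parallel environment
cat $PE_HOSTFILE
/opt/openmpi-1.3.3/bin/mpirun -np $NSLOTS ./your_app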

b) to confirm the misbehavior: are the tasks on the slave nodes children of sge_shepherd, or of a system sshd/rshd?
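
For example, while the job is running you could check on one of the slave nodes
(GNU ps syntax assumed; the exact process names may differ slightly):

ps -ef --forest | grep -E 'sge_execd|sge_shepherd|orted|sshd|rshd'

With a working Tight Integration the orted and application processes should sit
below sge_shepherd; if they hang below an sshd/rshd instead, the start-up
bypassed SGE.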

-- Reuti

> Regards,
> Eloi
>
>
> On Friday 21 May 2010 17:35:24 Reuti wrote:
>> Hi,
>>
>> On 21.05.2010, at 17:19, Eloi Gaudry wrote:
>>> Hi Reuti,
>>>
>>> Yes, the Open MPI binaries used were built with the --with-sge option
>>> passed to configure, and we only use those binaries on our
>>> cluster.
>>>
>>> [eg_at_moe:~]$ /opt/openmpi-1.3.3/bin/ompi_info
>>>
>>> MCA ras: gridengine (MCA v2.0, API v2.0, Component
>>> v1.3.3)
>>
>> ok. As your goal is a Tight Integration and you set "control_slaves TRUE"
>> in your PE, SGE wouldn't allow `qrsh -inherit ...` to nodes
>> which are not in the list of granted nodes. So it looks like your job is
>> running outside of this Tight Integration with its own `rsh` or `ssh`.
>>
>> Do you reset $JOB_ID or other environment variables in your jobscript,
>> which could trigger Open MPI to assume that it's not running inside SGE?
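
(If I remember correctly - please double-check against the 1.3.3 sources - Open
MPI only switches to `qrsh -inherit` when it finds all of SGE_ROOT, ARC,
PE_HOSTFILE and JOB_ID in the environment, so a quick test inside the jobscript
would be:

env | grep -E '^(SGE_ROOT|ARC|PE_HOSTFILE|JOB_ID)='

If any of these is missing, Open MPI falls back to its own rsh/ssh start-up.)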
>>
>> -- Reuti
>>
>>> On Friday 21 May 2010 16:01:54 Reuti wrote:
>>>> Hi,
>>>>
>>>> On 21.05.2010, at 14:11, Eloi Gaudry wrote:
>>>>> Hi there,
>>>>>
>>>>> I'm observing something strange on our cluster managed by SGE 6.2u4 when
>>>>> launching a parallel computation on several nodes, using the OpenMPI/SGE
>>>>> tight-integration mode (OpenMPI-1.3.3). It seems that the SGE-allocated
>>>>> slots are not used by OpenMPI, as if OpenMPI was doing its own
>>>>> round-robin allocation based on the allocated node hostnames.
>>>>
>>>> you compiled Open MPI with --with-sge (and recompiled your
>>>> applications)? You are using the correct mpiexec?
>>>>
>>>> -- Reuti
>>>>
>>>>> Here is what I'm doing:
>>>>> - launch a parallel computation involving 8 processors, each of them
>>>>> using 14GB of memory. I'm using a qsub command where I request the
>>>>> memory_free resource and use tight integration with Open MPI
>>>>> - 3 servers are available:
>>>>> . barney with 4 cores (4 slots) and 32GB
>>>>> . carl with 4 cores (4 slots) and 32GB
>>>>> . charlie with 8 cores (8 slots) and 64GB
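
(Just to be sure I understand the setup, I assume the submission looked roughly
like the following - memory_free and the PE name are taken from your
description, job.sh is a placeholder:

qsub -pe round_robin 8 -l memory_free=14G ./job.sh

Please correct me if the actual command differs.)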
>>>>>
>>>>> Here is the output of the allocated nodes (OpenMPI output):
>>>>> ====================== ALLOCATED NODES ======================
>>>>>
>>>>> Data for node: Name: charlie Launch id: -1 Arch: ffc91200 State: 2
>>>>>
>>>>> Daemon: [[44332,0],0] Daemon launched: True
>>>>> Num slots: 4 Slots in use: 0
>>>>> Num slots allocated: 4 Max slots: 0
>>>>> Username on node: NULL
>>>>> Num procs: 0 Next node_rank: 0
>>>>>
>>>>> Data for node: Name: carl.fft Launch id: -1 Arch: 0 State: 2
>>>>>
>>>>> Daemon: Not defined Daemon launched: False
>>>>> Num slots: 2 Slots in use: 0
>>>>> Num slots allocated: 2 Max slots: 0
>>>>> Username on node: NULL
>>>>> Num procs: 0 Next node_rank: 0
>>>>>
>>>>> Data for node: Name: barney.fft Launch id: -1 Arch: 0 State: 2
>>>>>
>>>>> Daemon: Not defined Daemon launched: False
>>>>> Num slots: 2 Slots in use: 0
>>>>> Num slots allocated: 2 Max slots: 0
>>>>> Username on node: NULL
>>>>> Num procs: 0 Next node_rank: 0
>>>>>
>>>>> =================================================================
>>>>>
>>>>> Here is what I see when my computation is running on the cluster:
>>>>> # rank pid hostname
>>>>>
>>>>> 0 28112 charlie
>>>>> 1 11417 carl
>>>>> 2 11808 barney
>>>>> 3 28113 charlie
>>>>> 4 11418 carl
>>>>> 5 11809 barney
>>>>> 6 28114 charlie
>>>>> 7 11419 carl
>>>>>
>>>>> Note that the parallel environment used under SGE is defined as:
>>>>> [eg_at_moe:~]$ qconf -sp round_robin
>>>>> pe_name round_robin
>>>>> slots 32
>>>>> user_lists NONE
>>>>> xuser_lists NONE
>>>>> start_proc_args /bin/true
>>>>> stop_proc_args /bin/true
>>>>> allocation_rule $round_robin
>>>>> control_slaves TRUE
>>>>> job_is_first_task FALSE
>>>>> urgency_slots min
>>>>> accounting_summary FALSE
>>>>>
>>>>> I'm wondering why OpenMPI didn't use the allocated nodes chosen by SGE
>>>>> (cf. the "ALLOCATED NODES" report) but instead allocated the processes of
>>>>> the parallel computation one at a time, using a round-robin method.
>>>>>
>>>>> Note that I'm using the '--bynode' option on the orterun command line.
>>>>> If the behavior I'm observing is simply the consequence of using this
>>>>> option, please let me know. This would then mean that one needs to state
>>>>> that SGE tight integration has a lower priority for orterun's behavior
>>>>> than the different command-line options.
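
(Just a side note on this point: as far as I know --bynode only changes how the
ranks are distributed over the nodes that were granted, i.e. round-robin by
node instead of filling one node after the other; it does not change the
allocation itself. Comparing the two mappings, e.g.

orterun -np $NSLOTS ./your_app
orterun -np $NSLOTS --bynode ./your_app

with ./your_app again being a placeholder, should show the difference.)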
>>>>>
>>>>> Any help would be appreciated,
>>>>> Thanks,
>>>>> Eloi
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
>
>
> Eloi Gaudry
>
> Free Field Technologies
> Axis Park Louvain-la-Neuve
> Rue Emile Francqui, 1
> B-1435 Mont-Saint Guibert
> BELGIUM
>
> Company Phone: +32 10 487 959
> Company Fax: +32 10 454 626