Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [sge::tight-integration] slot scheduling and resources handling
From: Reuti (reuti_at_[hidden])
Date: 2010-05-21 11:35:24


Hi,

On 21.05.2010, at 17:19, Eloi Gaudry wrote:

> Hi Reuti,
>
> Yes, the Open MPI binaries used were built with --with-sge passed to
> configure, and we only use those binaries on our cluster.
>
> [eg_at_moe:~]$ /opt/openmpi-1.3.3/bin/ompi_info

> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3.3)

ok. As you have a Tight Integration as the goal and set "control_slaves TRUE" in your PE, SGE wouldn't allow `qrsh -inherit ...` to reach nodes which are not in the list of granted nodes. So it looks like your job is running outside of this Tight Integration, with its own `rsh` or `ssh`.
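
As a quick cross-check you can compare what SGE granted against what mpiexec actually uses: inside the job, $PE_HOSTFILE points to the list of granted hosts and slots. A minimal sketch (the queue name and exact column layout are from memory and may differ on your installation):

   cat $PE_HOSTFILE
   # one line per granted host: hostname, slots, queue instance, range, e.g.:
   # charlie.fft 4 all.q@charlie.fft UNDEFINED
   # carl.fft 2 all.q@carl.fft UNDEFINED
   # barney.fft 2 all.q@barney.fft UNDEFINED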

Do you reset $JOB_ID or other environment variables in your jobscript, which could trigger Open MPI to assume that it's not running inside SGE?
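
A minimal check would be to dump these variables in the jobscript right before mpiexec is called; SGE_ROOT, JOB_ID, ARC and PE_HOSTFILE are the usual candidates (the exact set Open MPI tests may vary by version):

   env | egrep '^(SGE_ROOT|JOB_ID|ARC|PE_HOSTFILE)='

If any of them got unset, Open MPI falls back to its own rsh/ssh startup.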

-- Reuti

>
>
> On Friday 21 May 2010 16:01:54 Reuti wrote:
>> Hi,
>>
>> On 21.05.2010, at 14:11, Eloi Gaudry wrote:
>>> Hi there,
>>>
>>> I'm observing something strange on our cluster managed by SGE 6.2u4 when
>>> launching a parallel computation on several nodes, using the OpenMPI/SGE
>>> tight-integration mode (OpenMPI-1.3.3). It seems that the SGE-allocated
>>> slots are not used by OpenMPI, as if OpenMPI were doing its own
>>> round-robin allocation based on the allocated node hostnames.
>>
>> you compiled Open MPI with --with-sge (and recompiled your applications)?
>> And are you using the correct mpiexec?
>>
>> -- Reuti
>>
>>> Here is what I'm doing:
>>> - launch a parallel computation involving 8 processors, each of them
>>> using 14GB of memory. I'm using a qsub command where I request the
>>> memory_free resource and use tight integration with openmpi (see the
>>> example submission after this list)
>>> - 3 servers are available:
>>> . barney with 4 cores (4 slots) and 32GB
>>> . carl with 4 cores (4 slots) and 32GB
>>> . charlie with 8 cores (8 slots) and 64GB
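>>>
>>> For reference, the submission looks roughly like this (the script name
>>> and exact option spelling are placeholders):
>>>
>>>   qsub -pe round_robin 8 -l memory_free=14G ./run_computation.sh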
>>>
>>> Here is the output of the allocated nodes (OpenMPI output):
>>> ====================== ALLOCATED NODES ======================
>>>
>>> Data for node: Name: charlie Launch id: -1 Arch: ffc91200 State: 2
>>>
>>> Daemon: [[44332,0],0] Daemon launched: True
>>> Num slots: 4 Slots in use: 0
>>> Num slots allocated: 4 Max slots: 0
>>> Username on node: NULL
>>> Num procs: 0 Next node_rank: 0
>>>
>>> Data for node: Name: carl.fft Launch id: -1 Arch: 0 State: 2
>>>
>>> Daemon: Not defined Daemon launched: False
>>> Num slots: 2 Slots in use: 0
>>> Num slots allocated: 2 Max slots: 0
>>> Username on node: NULL
>>> Num procs: 0 Next node_rank: 0
>>>
>>> Data for node: Name: barney.fft Launch id: -1 Arch: 0 State: 2
>>>
>>> Daemon: Not defined Daemon launched: False
>>> Num slots: 2 Slots in use: 0
>>> Num slots allocated: 2 Max slots: 0
>>> Username on node: NULL
>>> Num procs: 0 Next node_rank: 0
>>>
>>> =================================================================
>>>
>>> Here is what I see when my computation is running on the cluster:
>>> # rank pid hostname
>>>
>>> 0 28112 charlie
>>> 1 11417 carl
>>> 2 11808 barney
>>> 3 28113 charlie
>>> 4 11418 carl
>>> 5 11809 barney
>>> 6 28114 charlie
>>> 7 11419 carl
>>>
>>> Note that the parallel environment used under SGE is defined as:
>>> [eg_at_moe:~]$ qconf -sp round_robin
>>> pe_name round_robin
>>> slots 32
>>> user_lists NONE
>>> xuser_lists NONE
>>> start_proc_args /bin/true
>>> stop_proc_args /bin/true
>>> allocation_rule $round_robin
>>> control_slaves TRUE
>>> job_is_first_task FALSE
>>> urgency_slots min
>>> accounting_summary FALSE
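>>>
>>> As an aside, allocation_rule $round_robin asks SGE to spread the granted
>>> slots across hosts one at a time; to pack them onto as few hosts as
>>> possible one would switch the PE to $fill_up instead, e.g.:
>>>
>>>   qconf -mp round_robin   # opens the PE definition in an editor;
>>>                           # set: allocation_rule $fill_up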
>>>
>>> I'm wondering why OpenMPI didn't use the allocated nodes chosen by SGE
>>> (cf. the "ALLOCATED NODES" report) but instead placed the processes of
>>> the parallel computation one at a time, using a round-robin method.
>>>
>>> Note that I'm using the '--bynode' option on the orterun command line. If
>>> the behavior I'm observing is simply the consequence of using this
>>> option, please let me know. This would then mean that SGE tight
>>> integration has a lower priority for orterun behavior than its
>>> command-line options.
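>>>
>>> To illustrate, with the 4/2/2 slot allocation shown above the two
>>> mappings would differ like this (the application name is a placeholder):
>>>
>>>   mpiexec -np 8 ./solver          # default "by slot": ranks 0-3 on
>>>                                   # charlie, 4-5 on carl, 6-7 on barney
>>>   mpiexec --bynode -np 8 ./solver # "by node": ranks cycle over charlie,
>>>                                   # carl, barney, ... which is exactly
>>>                                   # the rank/pid/hostname table above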
>>>
>>> Any help would be appreciated,
>>> Thanks,
>>> Eloi
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
>
>
> Eloi Gaudry
>
> Free Field Technologies
> Axis Park Louvain-la-Neuve
> Rue Emile Francqui, 1
> B-1435 Mont-Saint Guibert
> BELGIUM
>
> Company Phone: +32 10 487 959
> Company Fax: +32 10 454 626