Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [sge::tight-integration] slot scheduling and resources handling
From: Reuti (reuti_at_[hidden])
Date: 2010-05-25 05:32:44


Hi,

On 25.05.2010, at 09:14, Eloi Gaudry wrote:

> I do not reset any environment variables during job submission or job handling.
> Is there a simple way to check that Open MPI is working as expected with SGE
> tight integration (e.g. by displaying environment variables, setting options on
> the command line, etc.)?

a) put a command:

env

in the jobscript and check the output for $JOB_ID and various $SGE_* variables.
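
For example, a minimal jobscript sketch (the PE name round_robin and the mpiexec path are taken from your mails; the job name and the grep pattern are just suggestions):

#!/bin/sh
#$ -N sge_env_check
#$ -cwd
#$ -pe round_robin 8
# With a working Tight Integration, SGE exports JOB_ID, NSLOTS,
# PE_HOSTFILE and a number of SGE_* variables into the job environment.
env | grep -E '^(JOB_ID|NSLOTS|PE_HOSTFILE|SGE_)' | sort
# The slots granted per node are listed in the PE hostfile:
cat "$PE_HOSTFILE"
# Open MPI should then pick up exactly this allocation:
/opt/openmpi-1.3.3/bin/mpiexec -np $NSLOTS hostname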

b) to confirm the misbehavior: are the tasks on the slave nodes children of sge_shepherd, or of a system sshd/rshd?
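
For example, on one of the slave nodes while the job is running (a sketch; the exact process names can differ a little between installations):

# list pid, parent pid and command line for the relevant daemons
ps -e -o pid,ppid,args | grep -E 'sge_shepherd|sge_execd|orted|sshd|rshd'

With a Tight Integration the orted daemons (and below them your MPI ranks) should be children of an sge_shepherd started by sge_execd; if they hang below an sshd/rshd instead, the job was started outside of SGE's control.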

-- Reuti

> Regards,
> Eloi
>
>
> On Friday 21 May 2010 17:35:24 Reuti wrote:
>> Hi,
>>
>> On 21.05.2010, at 17:19, Eloi Gaudry wrote:
>>> Hi Reuti,
>>>
>>> Yes, the Open MPI binaries used were built with the --with-sge option
>>> during configure, and we only use those binaries on our cluster.
>>>
>>> [eg_at_moe:~]$ /opt/openmpi-1.3.3/bin/ompi_info
>>>
>>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3.3)
>>
>> ok. As your goal is a Tight Integration and you set "control_slaves TRUE"
>> in your PE, SGE wouldn't allow `qrsh -inherit ...` to nodes
>> which are not in the list of granted nodes. So it looks like your job is
>> running outside of this Tight Integration, with its own `rsh` or `ssh`.
>>
>> Do you reset $JOB_ID or other environment variables in your jobscript,
>> which could trigger Open MPI to assume that it's not running inside SGE?
>>
>> -- Reuti
>>
>>> On Friday 21 May 2010 16:01:54 Reuti wrote:
>>>> Hi,
>>>>
>>>> On 21.05.2010, at 14:11, Eloi Gaudry wrote:
>>>>> Hi there,
>>>>>
>>>>> I'm observing something strange on our cluster managed by SGE 6.2u4 when
>>>>> launching a parallel computation on several nodes, using the OpenMPI/SGE
>>>>> tight-integration mode (OpenMPI-1.3.3). It seems that the SGE-allocated
>>>>> slots are not used by OpenMPI, as if OpenMPI were doing its own
>>>>> round-robin allocation based on the allocated node hostnames.
>>>>
>>>> you compiled Open MPI with --with-sge (and recompiled your
>>>> applications)? You are using the correct mpiexec?
>>>>
>>>> -- Reuti
>>>>
>>>>> Here is what I'm doing:
>>>>> - launch a parallel computation involving 8 processors, using 14GB of
>>>>> memory for each of them. I'm using a qsub command where I request the
>>>>> memory_free resource and use tight integration with openmpi
>>>>> - 3 servers are available:
>>>>> . barney with 4 cores (4 slots) and 32GB
>>>>> . carl with 4 cores (4 slots) and 32GB
>>>>> . charlie with 8 cores (8 slots) and 64GB
>>>>>
>>>>> Here is the output of the allocated nodes (OpenMPI output):
>>>>> ====================== ALLOCATED NODES ======================
>>>>>
>>>>> Data for node: Name: charlie Launch id: -1 Arch: ffc91200 State: 2
>>>>>
>>>>> Daemon: [[44332,0],0] Daemon launched: True
>>>>> Num slots: 4 Slots in use: 0
>>>>> Num slots allocated: 4 Max slots: 0
>>>>> Username on node: NULL
>>>>> Num procs: 0 Next node_rank: 0
>>>>>
>>>>> Data for node: Name: carl.fft Launch id: -1 Arch: 0 State: 2
>>>>>
>>>>> Daemon: Not defined Daemon launched: False
>>>>> Num slots: 2 Slots in use: 0
>>>>> Num slots allocated: 2 Max slots: 0
>>>>> Username on node: NULL
>>>>> Num procs: 0 Next node_rank: 0
>>>>>
>>>>> Data for node: Name: barney.fft Launch id: -1 Arch: 0 State: 2
>>>>>
>>>>> Daemon: Not defined Daemon launched: False
>>>>> Num slots: 2 Slots in use: 0
>>>>> Num slots allocated: 2 Max slots: 0
>>>>> Username on node: NULL
>>>>> Num procs: 0 Next node_rank: 0
>>>>>
>>>>> =================================================================
>>>>>
>>>>> Here is what I see when my computation is running on the cluster:
>>>>> # rank pid hostname
>>>>>
>>>>> 0 28112 charlie
>>>>> 1 11417 carl
>>>>> 2 11808 barney
>>>>> 3 28113 charlie
>>>>> 4 11418 carl
>>>>> 5 11809 barney
>>>>> 6 28114 charlie
>>>>> 7 11419 carl
>>>>>
>>>>> Note that the parallel environment used under SGE is defined as:
>>>>> [eg_at_moe:~]$ qconf -sp round_robin
>>>>> pe_name round_robin
>>>>> slots 32
>>>>> user_lists NONE
>>>>> xuser_lists NONE
>>>>> start_proc_args /bin/true
>>>>> stop_proc_args /bin/true
>>>>> allocation_rule $round_robin
>>>>> control_slaves TRUE
>>>>> job_is_first_task FALSE
>>>>> urgency_slots min
>>>>> accounting_summary FALSE
>>>>>
>>>>> I'm wondering why OpenMPI didn't honor the slot allocation chosen by SGE
>>>>> (cf. the "ALLOCATED NODES" report) but instead placed each process of the
>>>>> parallel computation in turn across the nodes, using a round-robin method.
>>>>>
>>>>> Note that I'm using the '--bynode' option in the orterun command line.
>>>>> If the behavior I'm observing is simply the consequence of using this
>>>>> option, please let me know. This would then mean that SGE tight
>>>>> integration has a lower priority for orterun's behavior than the
>>>>> command-line options.
>>>>>
>>>>> Any help would be appreciated,
>>>>> Thanks,
>>>>> Eloi
>>>>
>
> --
>
>
> Eloi Gaudry
>
> Free Field Technologies
> Axis Park Louvain-la-Neuve
> Rue Emile Francqui, 1
> B-1435 Mont-Saint Guibert
> BELGIUM
>
> Company Phone: +32 10 487 959
> Company Fax: +32 10 454 626