Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] sge tight integration leads to bad allocation
From: Reuti (reuti_at_[hidden])
Date: 2012-04-03 11:12:40


On 03.04.2012 at 16:59, Eloi Gaudry wrote:

> Hi Reuti,
>
> I configured Open MPI to support SGE tight integration and used the PE defined below for submitting the job:
>
> [16:36][eg@moe:~]$ qconf -sp fill_up
> pe_name fill_up
> slots 80
> user_lists NONE
> xuser_lists NONE
> start_proc_args /bin/true
> stop_proc_args /bin/true
> allocation_rule $fill_up

With this definition it should fill up a host completely before moving to the next one.
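
For illustration (a purely hypothetical submission, host and queue names invented), a 4-slot request such as

  qsub -pe fill_up 4 job_script.sh

should, on a cluster where one host still has 4 free slots, produce a $PE_HOSTFILE with a single line along the lines of

  nodeA.fft 4 all.q@nodeA.fft UNDEFINED

(host, granted slots, queue instance, processor range - if I recall the column order correctly), whereas $round_robin would spread the four slots one per host.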

> control_slaves TRUE
> job_is_first_task FALSE
> urgency_slots min
> accounting_summary FALSE
>
> Here are the allocation info retrieved from `qstat -g t` for the related job:

For me the output of `qstat -g t` shows MASTER and SLAVE entries but no variables. Is there any wrapper defined for `qstat` to reformat the output (or a ~/.sge_qstat defined)?
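
For a tightly integrated parallel job I would expect a plain `qstat -g t` to print one row per task instead of resource lines, roughly like this (column layout quoted from memory, and the exact rows depend on job_is_first_task):

  job-ID prior   name       user state submit/start at     queue               master
  1296   0.54786 semi_direc jj   r     04/03/2012 16:43:49 smp8.q@charlie.fft  MASTER
  1296   0.54786 semi_direc jj   r     04/03/2012 16:43:49 smp8.q@charlie.fft  SLAVE
  1296   0.54786 semi_direc jj   r     04/03/2012 16:43:49 smp8.q@charlie.fft  SLAVE
  1296   0.54786 semi_direc jj   r     04/03/2012 16:43:49 smp4.q@barney.fft   SLAVE
  1296   0.54786 semi_direc jj   r     04/03/2012 16:43:49 smp4.q@carl.fft     SLAVE

i.e. one MASTER entry on the head node plus one SLAVE entry per granted slot, but no hc:/hl: resource lines.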

And why is "num_proc=0" shown everywhere - was it redefined? Usually it's a load sensor set to the number of cores found in the machine and shouldn't be touched by hand, as that turns it into a consumable complex.
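
A quick way to check this (standard `qconf` calls, output abbreviated and quoted from memory):

  qconf -sc | grep num_proc
  num_proc   p   INT   ==   YES   NO   0   0

The consumable column should read NO, and `qconf -se charlie` should not list num_proc under complex_values. The hc: prefix in your qstat output is what one would normally see for a host consumable, which is why I suspect it was redefined.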

-- Reuti

> ---------------------------------------------------------------------------------
> smp4.q@barney.fft BIP 0/1/4 0.70 lx-amd64
> hc:num_proc=0
> hl:mem_free=31.215G
> hl:mem_used=280.996M
> hc:mem_available=1.715G
> 1296 0.54786 semi_direc jj r 04/03/2012 16:43:49 1
> ---------------------------------------------------------------------------------
> smp4.q@carl.fft BIP 0/1/4 0.69 lx-amd64
> hc:num_proc=0
> hl:mem_free=30.764G
> hl:mem_used=742.805M
> hc:mem_available=1.715G
> 1296 0.54786 semi_direc jj r 04/03/2012 16:43:49 1
> ---------------------------------------------------------------------------------
> smp8.q@charlie.fft BIP 0/2/8 0.57 lx-amd64
> hc:num_proc=0
> hl:mem_free=62.234G
> hl:mem_used=836.797M
> hc:mem_available=4.018G
> 1296 0.54786 semi_direc jj r 04/03/2012 16:43:49 2
> ---------------------------------------------------------------------------------
>
> SGE reports what pls_gridengine_report does, i.e. what was reserved.
> But here is the output of the current job (after it was started by Open MPI):
> [charlie:05294] ras:gridengine: JOB_ID: 1296
> [charlie:05294] ras:gridengine: PE_HOSTFILE: /opt/sge/default/spool/charlie/active_jobs/1296.1/pe_hostfile
> [charlie:05294] ras:gridengine: charlie.fft: PE_HOSTFILE shows slots=2
> [charlie:05294] ras:gridengine: barney.fft: PE_HOSTFILE shows slots=1
> [charlie:05294] ras:gridengine: carl.fft: PE_HOSTFILE shows slots=1
>
> ====================== ALLOCATED NODES ======================
>
> Data for node: Name: charlie Launch id: -1 Arch: ffc91200 State: 2
> Num boards: 1 Num sockets/board: 2 Num cores/socket: 4
> Daemon: [[54347,0],0] Daemon launched: True
> Num slots: 2 Slots in use: 0
> Num slots allocated: 2 Max slots: 0
> Username on node: NULL
> Num procs: 0 Next node_rank: 0
> Data for node: Name: barney.fft Launch id: -1 Arch: 0 State: 2
> Num boards: 1 Num sockets/board: 2 Num cores/socket: 4
> Daemon: Not defined Daemon launched: False
> Num slots: 1 Slots in use: 0
> Num slots allocated: 1 Max slots: 0
> Username on node: NULL
> Num procs: 0 Next node_rank: 0
> Data for node: Name: carl.fft Launch id: -1 Arch: 0 State: 2
> Num boards: 1 Num sockets/board: 2 Num cores/socket: 4
> Daemon: Not defined Daemon launched: False
> Num slots: 1 Slots in use: 0
> Num slots allocated: 1 Max slots: 0
> Username on node: NULL
> Num procs: 0 Next node_rank: 0
>
> =================================================================
>
> Map generated by mapping policy: 0200
> Npernode: 0 Oversubscribe allowed: TRUE CPU Lists: FALSE
> Num new daemons: 2 New daemon starting vpid 1
> Num nodes: 3
>
> Data for node: Name: charlie Launch id: -1 Arch: ffc91200 State: 2
> Num boards: 1 Num sockets/board: 2 Num cores/socket: 4
> Daemon: [[54347,0],0] Daemon launched: True
> Num slots: 2 Slots in use: 2
> Num slots allocated: 2 Max slots: 0
> Username on node: NULL
> Num procs: 2 Next node_rank: 2
> Data for proc: [[54347,1],0]
> Pid: 0 Local rank: 0 Node rank: 0
> State: 0 App_context: 0 Slot list: NULL
> Data for proc: [[54347,1],3]
> Pid: 0 Local rank: 1 Node rank: 1
> State: 0 App_context: 0 Slot list: NULL
> Data for node: Name: barney.fft Launch id: -1 Arch: 0 State: 2
> Num boards: 1 Num sockets/board: 2 Num cores/socket: 4
> Daemon: [[54347,0],1] Daemon launched: False
> Num slots: 1 Slots in use: 1
> Num slots allocated: 1 Max slots: 0
> Username on node: NULL
> Num procs: 1 Next node_rank: 1
> Data for proc: [[54347,1],1]
> Pid: 0 Local rank: 0 Node rank: 0
> State: 0 App_context: 0 Slot list: NULL
>
> Data for node: Name: carl.fft Launch id: -1 Arch: 0 State: 2
> Num boards: 1 Num sockets/board: 2 Num cores/socket: 4
> Daemon: [[54347,0],2] Daemon launched: False
> Num slots: 1 Slots in use: 1
> Num slots allocated: 1 Max slots: 0
> Username on node: NULL
> Num procs: 1 Next node_rank: 1
> Data for proc: [[54347,1],2]
> Pid: 0 Local rank: 0 Node rank: 0
> State: 0 App_context: 0 Slot list: NULL
>
> Regards,
> Eloi
>
>
>
> -----Original Message-----
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Reuti
> Sent: mardi 3 avril 2012 16:24
> To: Open MPI Users
> Subject: Re: [OMPI users] sge tight integration leads to bad allocation
>
> Hi,
>
> On 03.04.2012 at 16:12, Eloi Gaudry wrote:
>
>> Thanks for your feedback.
>> No, this is the other way around, the "reserved" slots on all nodes are ok but the "used" slots are different.
>>
>> Basically, I'm using SGE to schedule and book resources for a distributed job. When the job is finally launched, it uses a different allocation than the one that was reported by pls_gridengine_info.
>>
>> The pls_gridengine_info report states that 3 nodes were booked: barney (1 slot), carl (1 slot) and charlie (2 slots). This booking was done by SGE based on the memory requirements of the job (among other criteria).
>>
>> When orterun starts the job (i.e. when SGE finally starts the scheduled job), it uses 3 nodes, but the first one (barney: 2 slots used instead of 1) is oversubscribed and the last one (charlie: 1 slot used instead of 2) is underused.
>
> You configured Open MPI to support SGE tight integration and used a PE for submitting the job? Can you please post the definition of the PE?
>
> What was the allocation you saw in SGE's `qstat -g t` for the job?
>
> -- Reuti
>
>
>> If you need further information, please let me know.
>>
>> Eloi
>>
>> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Ralph Castain
>> Sent: mardi 3 avril 2012 15:58
>> To: Open MPI Users
>> Subject: Re: [OMPI users] sge tight integration leads to bad allocation
>>
>> I'm afraid there isn't enough info here to help. Are you saying you only allocated one slot per node, so the two slots on charlie are an error?
>>
>> Sent from my iPad
>>
>> On Apr 3, 2012, at 6:23 AM, "Eloi Gaudry" <eloi.gaudry_at_[hidden]> wrote:
>>
>> Hi,
>>
>> I've observed a strange behavior during rank allocation on a distributed run scheduled and submitted using SGE (Son of Grid Engine 8.0.0d) and Open MPI 1.4.4.
>> Briefly, there is a one-slot difference between the slots allocated by SGE and those used by Open MPI. The issue here is that one node becomes oversubscribed at runtime.
>>
>> Here is the output of the allocation done for gridengine:
>>
>> ====================== ALLOCATED NODES ======================
>>
>> Data for node: Name: barney Launch id: -1 Arch: ffc91200 State: 2
>> Num boards: 1 Num sockets/board: 2 Num cores/socket: 2
>> Daemon: [[22904,0],0] Daemon launched: True
>> Num slots: 1 Slots in use: 0
>> Num slots allocated: 1 Max slots: 0
>> Username on node: NULL
>> Num procs: 0 Next node_rank: 0
>> Data for node: Name: carl.fft Launch id: -1 Arch: 0 State: 2
>> Num boards: 1 Num sockets/board: 2 Num cores/socket: 2
>> Daemon: Not defined Daemon launched: False
>> Num slots: 1 Slots in use: 0
>> Num slots allocated: 1 Max slots: 0
>> Username on node: NULL
>> Num procs: 0 Next node_rank: 0
>> Data for node: Name: charlie.fft Launch id: -1 Arch: 0 State: 2
>> Num boards: 1 Num sockets/board: 2 Num cores/socket: 2
>> Daemon: Not defined Daemon launched: False
>> Num slots: 2 Slots in use: 0
>> Num slots allocated: 2 Max slots: 0
>> Username on node: NULL
>> Num procs: 0 Next node_rank: 0
>>
>>
>> And here is the allocation finally used:
>> =================================================================
>>
>> Map generated by mapping policy: 0200
>> Npernode: 0 Oversubscribe allowed: TRUE CPU Lists: FALSE
>> Num new daemons: 2 New daemon starting vpid 1
>> Num nodes: 3
>>
>> Data for node: Name: barney Launch id: -1 Arch: ffc91200 State: 2
>> Num boards: 1 Num sockets/board: 2 Num cores/socket: 2
>> Daemon: [[22904,0],0] Daemon launched: True
>> Num slots: 1 Slots in use: 2
>> Num slots allocated: 1 Max slots: 0
>> Username on node: NULL
>> Num procs: 2 Next node_rank: 2
>> Data for proc: [[22904,1],0]
>> Pid: 0 Local rank: 0 Node rank: 0
>> State: 0 App_context: 0 Slot list: NULL
>> Data for proc: [[22904,1],3]
>> Pid: 0 Local rank: 1 Node rank: 1
>> State: 0 App_context: 0 Slot list: NULL
>>
>> Data for node: Name: carl.fft Launch id: -1 Arch: 0 State: 2
>> Num boards: 1 Num sockets/board: 2 Num cores/socket: 2
>> Daemon: [[22904,0],1] Daemon launched: False
>> Num slots: 1 Slots in use: 1
>> Num slots allocated: 1 Max slots: 0
>> Username on node: NULL
>> Num procs: 1 Next node_rank: 1
>> Data for proc: [[22904,1],1]
>> Pid: 0 Local rank: 0 Node rank: 0
>> State: 0 App_context: 0 Slot list: NULL
>>
>> Data for node: Name: charlie.fft Launch id: -1 Arch: 0 State: 2
>> Num boards: 1 Num sockets/board: 2 Num cores/socket: 2
>> Daemon: [[22904,0],2] Daemon launched: False
>> Num slots: 2 Slots in use: 1
>> Num slots allocated: 2 Max slots: 0
>> Username on node: NULL
>> Num procs: 1 Next node_rank: 1
>> Data for proc: [[22904,1],2]
>> Pid: 0 Local rank: 0 Node rank: 0
>> State: 0 App_context: 0 Slot list: NULL
>>
>> Has anyone already encountered the same behavior?
>> Is there a simple fix other than not using the tight integration mode between SGE and Open MPI?
>>
>> Eloi
>>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users