>> Here is the allocation info retrieved from `qstat -g t` for the related job:
>
> For me the output of `qstat -g t` shows MASTER and SLAVE entries but no variables. Is there any wrapper defined for `qstat` to reformat the output (or a ~/.sge_qstat defined)?
>
> [eg: ] Sorry, I forgot that sge_qstat was defined. As I don't have any slots available right now, I cannot relaunch the job to get updated output.
Reuti, here is the output you asked for two days ago.
It was produced with another "bad" run, in which 3 processes are running on each of the nodes charlie and carl, whereas we should have only 2 processes on carl and 4 on charlie.
Output from qstat -g t:
------------------------------------
queuename qtype resv/used/tot. load_avg arch states
---------------------------------------------------------------------------------
smp4.q@carl.fft BIP 0/2/4 1.14 lx-amd64
hc:mem_available=1.715G
1391 0.57643 semi_green jj r 04/05/2012 15:41:04 SLAVE
SLAVE
---------------------------------------------------------------------------------
smp8.q@charlie.fft BIP 0/4/8 1.73 lx-amd64
hc:mem_available=4.018G
1391 0.57643 semi_green jj r 04/05/2012 15:41:04 MASTER
SLAVE
SLAVE
SLAVE
SLAVE
Debug output from orterun:
------------------------------------
[charlie:08194] ras:gridengine: JOB_ID: 1391
[charlie:08194] ras:gridengine: PE_HOSTFILE: /opt/sge/default/spool/charlie/active_jobs/1391.1/pe_hostfile
[charlie:08194] ras:gridengine: charlie.fft: PE_HOSTFILE shows slots=4
[charlie:08194] ras:gridengine: carl.fft: PE_HOSTFILE shows slots=2
====================== ALLOCATED NODES ======================
Data for node: Name: charlie Launch id: -1 Arch: ffc91200 State: 2
Num boards: 1 Num sockets/board: 2 Num cores/socket: 4
Daemon: [[57575,0],0] Daemon launched: True
Num slots: 4 Slots in use: 0
Num slots allocated: 4 Max slots: 0
Username on node: NULL
Num procs: 0 Next node_rank: 0
Data for node: Name: carl.fft Launch id: -1 Arch: 0 State: 2
Num boards: 1 Num sockets/board: 2 Num cores/socket: 4
Daemon: Not defined Daemon launched: False
Num slots: 2 Slots in use: 0
Num slots allocated: 2 Max slots: 0
Username on node: NULL
Num procs: 0 Next node_rank: 0
=================================================================
Map generated by mapping policy: 0200
Npernode: 0 Oversubscribe allowed: TRUE CPU Lists: FALSE
Num new daemons: 1 New daemon starting vpid 1
Num nodes: 2
Data for node: Name: charlie Launch id: -1 Arch: ffc91200 State: 2
Num boards: 1 Num sockets/board: 2 Num cores/socket: 4
Daemon: [[57575,0],0] Daemon launched: True
Num slots: 4 Slots in use: 3
Num slots allocated: 4 Max slots: 0
Username on node: NULL
Num procs: 3 Next node_rank: 3
Data for proc: [[57575,1],0]
Pid: 0 Local rank: 0 Node rank: 0
State: 0 App_context: 0 Slot list: NULL
Data for proc: [[57575,1],2]
Pid: 0 Local rank: 1 Node rank: 1
State: 0 App_context: 0 Slot list: NULL
Data for proc: [[57575,1],4]
Pid: 0 Local rank: 2 Node rank: 2
State: 0 App_context: 0 Slot list: NULL
Data for node: Name: carl.fft Launch id: -1 Arch: 0 State: 2
Num boards: 1 Num sockets/board: 2 Num cores/socket: 4
Daemon: [[57575,0],1] Daemon launched: False
Num slots: 2 Slots in use: 3
Num slots allocated: 2 Max slots: 0
Username on node: NULL
Num procs: 3 Next node_rank: 3
Data for proc: [[57575,1],1]
Pid: 0 Local rank: 0 Node rank: 0
State: 0 App_context: 0 Slot list: NULL
Data for proc: [[57575,1],3]
Pid: 0 Local rank: 1 Node rank: 1
State: 0 App_context: 0 Slot list: NULL
Data for proc: [[57575,1],5]
Pid: 0 Local rank: 2 Node rank: 2
State: 0 App_context: 0 Slot list: NULL
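To make the mismatch in the map above easier to see, here is a small Python sketch. It is not Open MPI code: the function names `map_by_slot`, `map_by_node` and `oversubscribed` are made up for illustration, and the only inputs are the slot counts from the PE_HOSTFILE lines above (charlie 4, carl 2) and the 6 ranks shown in the map. Filling by slot gives the expected 4 + 2 split, while a plain round-robin over the nodes that ignores the slot counts gives the same 3 + 3 placement as the map (ranks 0, 2, 4 on charlie and 1, 3, 5 on carl). This only shows that the observed map is consistent with a by-node round-robin; it does not establish what the mapper actually did internally.

```python
# Slot counts as reported by ras:gridengine from the PE_HOSTFILE.
ALLOCATION = {"charlie": 4, "carl": 2}


def map_by_slot(allocation, nprocs):
    """Fill each node up to its slot count before moving to the next one."""
    placement = {node: [] for node in allocation}
    rank = 0
    for node, slots in allocation.items():
        for _ in range(slots):
            if rank == nprocs:
                return placement
            placement[node].append(rank)
            rank += 1
    if rank < nprocs:
        raise ValueError("not enough slots for %d processes" % nprocs)
    return placement


def map_by_node(allocation, nprocs):
    """Round-robin over the nodes, ignoring slot counts (illustrative only)."""
    nodes = list(allocation)
    placement = {node: [] for node in nodes}
    for rank in range(nprocs):
        placement[nodes[rank % len(nodes)]].append(rank)
    return placement


def oversubscribed(allocation, placement):
    """Return the nodes that received more ranks than they have slots."""
    return [n for n in allocation if len(placement[n]) > allocation[n]]


if __name__ == "__main__":
    for mapper in (map_by_slot, map_by_node):
        placement = mapper(ALLOCATION, 6)
        print(mapper.__name__, placement,
              "oversubscribed:", oversubscribed(ALLOCATION, placement))
```

With the round-robin placement, carl ends up with 3 ranks on 2 slots, which matches the "Num slots: 2 Slots in use: 3" line for carl.fft in the map.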
Regards,
Eloi