
Subject: [OMPI users] [sge::tight-integration] slot scheduling and resources handling
From: Eloi Gaudry (eg_at_[hidden])
Date: 2010-05-21 08:11:41


Hi there,

I'm observing something strange on our cluster managed by SGE 6.2u4 when
launching a parallel computation across several nodes using the OpenMPI/SGE
tight-integration mode (OpenMPI 1.3.3). It seems that the slots allocated by
SGE are not used by OpenMPI, as if OpenMPI were doing its own round-robin
allocation based on the allocated node hostnames.

Here is what I'm doing:
- launch a parallel computation involving 8 processes, each of them using 14GB
of memory. I submit with a qsub command that requests the memory_free resource
and uses tight integration with OpenMPI (a sketch of the command is given
after this list)
- 3 servers are available:
. barney with 4 cores (4 slots) and 32GB
. carl with 4 cores (4 slots) and 32GB
. charlie with 8 cores (8 slots) and 64GB
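
For reference, the submission is along these lines (the job script name and
the literal resource value are placeholders, not the exact command I used):

  # request 8 slots in the round_robin PE and 14GB free memory per host
  qsub -pe round_robin 8 -l memory_free=14G ./run_computation.sh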

Here is the output of the allocated nodes (OpenMPI output):
====================== ALLOCATED NODES ======================

 Data for node: Name: charlie Launch id: -1 Arch: ffc91200 State: 2
  Daemon: [[44332,0],0] Daemon launched: True
  Num slots: 4 Slots in use: 0
  Num slots allocated: 4 Max slots: 0
  Username on node: NULL
  Num procs: 0 Next node_rank: 0
 Data for node: Name: carl.fft Launch id: -1 Arch: 0 State: 2
  Daemon: Not defined Daemon launched: False
  Num slots: 2 Slots in use: 0
  Num slots allocated: 2 Max slots: 0
  Username on node: NULL
  Num procs: 0 Next node_rank: 0
 Data for node: Name: barney.fft Launch id: -1 Arch: 0 State: 2
  Daemon: Not defined Daemon launched: False
  Num slots: 2 Slots in use: 0
  Num slots allocated: 2 Max slots: 0
  Username on node: NULL
  Num procs: 0 Next node_rank: 0

=================================================================

Here is what I see when my computation is running on the cluster:
# rank pid hostname
         0 28112 charlie
         1 11417 carl
         2 11808 barney
         3 28113 charlie
         4 11418 carl
         5 11809 barney
         6 28114 charlie
         7 11419 carl

Note that the parallel environment used under SGE is defined as follows:
[eg_at_moe:~]$ qconf -sp round_robin
pe_name round_robin
slots 32
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $round_robin
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary FALSE

I'm wondering why OpenMPI doesn't follow the slot allocation chosen by SGE
(cf. the "ALLOCATED NODES" report above) but instead places the processes of
the parallel computation one at a time across the nodes, in a round-robin
fashion.

Note that I'm using the '--bynode' option on the orterun command line. If the
behavior I'm observing is simply a consequence of using this option, please
let me know. That would mean that SGE tight integration has a lower priority
than the orterun command-line options in determining process placement.
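
For completeness, the launch line inside the job script is roughly the
following (the binary name and the explicit -np value are placeholders):

  # '--bynode' cycles ranks over the hosts instead of filling slots per host
  orterun --bynode -np 8 ./solver

If dropping '--bynode' would make orterun fill the per-node slot counts
reported by SGE (e.g. 4 processes on charlie) instead of cycling over the
hosts, that would already answer my question.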

Any help would be appreciated,
Thanks,
Eloi

-- 
Eloi Gaudry
Free Field Technologies
Axis Park Louvain-la-Neuve
Rue Emile Francqui, 1
B-1435 Mont-Saint Guibert
BELGIUM
Company Phone: +32 10 487 959
Company Fax:   +32 10 454 626