
Subject: Re: [OMPI users] [sge::tight-integration] slot scheduling and resources handling
From: Eloi Gaudry (eg_at_[hidden])
Date: 2010-05-25 03:14:17


Hi Reuti,

I do not reset any environment variables during job submission or job handling.
Is there a simple way to check that Open MPI is working as expected with the
SGE tight integration (e.g. by displaying environment variables, setting
options on the command line, etc.)?
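
Something like the following is what I have in mind as a check (a sketch on
my part; whether these MCA verbosity parameters print exactly what I am
after is an assumption):

   # in the job script, before mpirun: dump what SGE hands to the job
   echo "JOB_ID      = $JOB_ID"
   echo "SGE_ROOT    = $SGE_ROOT"
   echo "PE_HOSTFILE = $PE_HOSTFILE"
   cat "$PE_HOSTFILE"    # hosts and slot counts granted by SGE

   # and ask mpirun which allocation/launch components it selected
   mpirun --mca ras_base_verbose 5 --mca plm_base_verbose 5 -np $NSLOTS ./my_solver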

Regards,
Eloi

On Friday 21 May 2010 17:35:24 Reuti wrote:
> Hi,
>
> On 21.05.2010 at 17:19, Eloi Gaudry wrote:
> > Hi Reuti,
> >
> > Yes, the Open MPI binaries in use were built with --with-sge passed
> > to configure, and we only use those binaries on our cluster.
> >
> > [eg_at_moe:~]$ /opt/openmpi-1.3.3/bin/ompi_info
> >
> > MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3.3)
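
(For reference, the build and verification steps boil down to the
following; the prefix path matches our installation:)

   $ ./configure --prefix=/opt/openmpi-1.3.3 --with-sge
   $ make all install
   $ /opt/openmpi-1.3.3/bin/ompi_info | grep gridengine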
>
> ok. As your goal is a Tight Integration and your PE sets
> "control_slaves TRUE", SGE wouldn't allow `qrsh -inherit ...` to nodes
> which are not in the list of granted nodes. So it looks like your job is
> running outside of this Tight Integration, with its own `rsh` or `ssh`.
>
> Do you reset $JOB_ID or other environment variables in your jobscript,
> which could trigger Open MPI to assume that it's not running inside SGE?
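
(To make Reuti's point concrete for myself: my understanding, possibly
wrong, is that mpirun decides it is running under SGE from environment
variables such as these, so I will verify none of them gets clobbered:)

   for v in SGE_ROOT JOB_ID PE_HOSTFILE; do
       printenv "$v" >/dev/null || echo "WARNING: \$$v is unset"
   done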
>
> -- Reuti
>
> > On Friday 21 May 2010 16:01:54 Reuti wrote:
> >> Hi,
> >>
> >> On 21.05.2010 at 14:11, Eloi Gaudry wrote:
> >>> Hi there,
> >>>
> >>> I'm observing something strange on our cluster managed by SGE 6.2u4
> >>> when launching a parallel computation on several nodes, using the
> >>> Open MPI/SGE tight-integration mode (Open MPI 1.3.3). It seems that
> >>> the SGE-allocated slots are not used by Open MPI, as if Open MPI
> >>> was doing its own round-robin allocation based on the allocated
> >>> node hostnames.
> >>
> >> you compiled Open MPI with --with-sge (and recompiled your
> >> applications)? You are using the correct mpiexec?
> >>
> >> -- Reuti
> >>
> >>> Here is what I'm doing:
> >>> - launch a parallel computation involving 8 processors, each of
> >>> them using 14GB of memory. I'm using a qsub command where I request
> >>> the memory_free resource and use tight integration with Open MPI (a
> >>> sketch of the command follows the server list below)
> >>> - 3 servers are available:
> >>> . barney with 4 cores (4 slots) and 32GB
> >>> . carl with 4 cores (4 slots) and 32GB
> >>> . charlie with 8 cores (8 slots) and 64GB
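
(For concreteness, the submission looks roughly like this; the script
name and the exact memory_free request are reconstructed from memory, so
treat them as assumptions:)

   $ qsub -pe round_robin 8 -l memory_free=14G ./run_solver.sh

   # run_solver.sh, minimal:
   #!/bin/sh
   /opt/openmpi-1.3.3/bin/mpirun --bynode -np $NSLOTS ./my_solver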
> >>>
> >>> Here is the output of the allocated nodes (OpenMPI output):
> >>> ====================== ALLOCATED NODES ======================
> >>>
> >>> Data for node: Name: charlie Launch id: -1 Arch: ffc91200 State: 2
> >>>
> >>> Daemon: [[44332,0],0] Daemon launched: True
> >>> Num slots: 4 Slots in use: 0
> >>> Num slots allocated: 4 Max slots: 0
> >>> Username on node: NULL
> >>> Num procs: 0 Next node_rank: 0
> >>>
> >>> Data for node: Name: carl.fft Launch id: -1 Arch: 0 State: 2
> >>>
> >>> Daemon: Not defined Daemon launched: False
> >>> Num slots: 2 Slots in use: 0
> >>> Num slots allocated: 2 Max slots: 0
> >>> Username on node: NULL
> >>> Num procs: 0 Next node_rank: 0
> >>>
> >>> Data for node: Name: barney.fft Launch id: -1 Arch: 0 State: 2
> >>>
> >>> Daemon: Not defined Daemon launched: False
> >>> Num slots: 2 Slots in use: 0
> >>> Num slots allocated: 2 Max slots: 0
> >>> Username on node: NULL
> >>> Num procs: 0 Next node_rank: 0
> >>>
> >>> =================================================================
> >>>
> >>> Here is what I see when my computation is running on the cluster:
> >>> # rank pid hostname
> >>>
> >>> 0 28112 charlie
> >>> 1 11417 carl
> >>> 2 11808 barney
> >>> 3 28113 charlie
> >>> 4 11418 carl
> >>> 5 11809 barney
> >>> 6 28114 charlie
> >>> 7 11419 carl
> >>>
> >>> Note that the parallel environment used under SGE is defined as:
> >>> [eg_at_moe:~]$ qconf -sp round_robin
> >>> pe_name round_robin
> >>> slots 32
> >>> user_lists NONE
> >>> xuser_lists NONE
> >>> start_proc_args /bin/true
> >>> stop_proc_args /bin/true
> >>> allocation_rule $round_robin
> >>> control_slaves TRUE
> >>> job_is_first_task FALSE
> >>> urgency_slots min
> >>> accounting_summary FALSE
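
(Side note, hypothetical on my side: the 4/2/2 slot split in the
"ALLOCATED NODES" report above is exactly what allocation_rule
$round_robin asks SGE for; with $fill_up instead, SGE would pack the 8
slots onto as few hosts as possible, e.g. all 8 on charlie:)

   $ qconf -mp round_robin     # then change the allocation rule to:
   allocation_rule    $fill_up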
> >>>
> >>> I'm wondering why Open MPI didn't respect the slot counts granted
> >>> by SGE (cf. the "ALLOCATED NODES" report) but instead placed the
> >>> processes of the parallel computation one per node in turn, using a
> >>> round-robin method.
> >>>
> >>> Note that I'm using the '--bynode' option on the orterun command
> >>> line. If the behavior I'm observing is simply the consequence of
> >>> using this option, please let me know. That would mean that the
> >>> command-line options take precedence over the SGE tight integration
> >>> in determining orterun's behavior.
> >>>
> >>> Any help would be appreciated,
> >>> Thanks,
> >>> Eloi
> >>
> >> _______________________________________________
> >> users mailing list
> >> users_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Eloi Gaudry
Free Field Technologies
Axis Park Louvain-la-Neuve
Rue Emile Francqui, 1
B-1435 Mont-Saint Guibert
BELGIUM
Company Phone: +32 10 487 959
Company Fax:   +32 10 454 626