Open MPI User's Mailing List Archives

From: Pak Lui (Pak.Lui_at_[hidden])
Date: 2007-01-24 10:03:25

Geoff Galitz wrote:
> Hello,
> On the following system:
> OpenMPI 1.1.1
> SGE 6.0 (with tight integration)
> Scientific Linux 4.3
> Dual Dual-Core Opterons
> MPI jobs are oversubscribing to the nodes. No matter where jobs are
> launched by the scheduler, they always stack up on the first node
> (node00) and continue to stack even though the system load exceeds 6
> (on a 4 processor box). Each node is defined as 4 slots with 4 max
> slots. The MPI jobs launch via "mpirun -np (some-number-of-
> processors)" from within the scheduler.

Hi Geoff,

I believe SGE support was first added in v1.2, not in 1.1.1. Unless you
modified your build to include the gridengine ras/pls modules from v1.2,
you are probably not using SGE tight integration. So even though you
start mpirun inside the SGE parallel environment, ORTE does not have the
gridengine modules for allocating and launching the job. That could be
why all processes are launched on the same node: no node list is
available from gridengine, so ORTE defaults to a single node.
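
One way to check whether a build has these modules is to look through the
component list printed by ompi_info. Below is a minimal Python sketch of
that check. The helper names are made up for illustration, and the
"MCA <framework>: <component> (...)" line layout is an assumption about
how ompi_info formats its output, so adjust the parsing if your version
prints something different.

```python
import subprocess


def gridengine_frameworks(ompi_info_text):
    """Return the MCA frameworks (e.g. 'ras', 'pls') that list a
    gridengine component in the given ompi_info output text."""
    found = set()
    for line in ompi_info_text.splitlines():
        # Assumed layout of a component line: "MCA <framework>: <component> (...)"
        head, sep, rest = line.strip().partition(":")
        if not sep or not head.startswith("MCA "):
            continue
        tokens = rest.split()
        if tokens and tokens[0] == "gridengine":
            found.add(head[len("MCA "):].strip())
    return found


def installed_gridengine_frameworks():
    """Run ompi_info from PATH and report which frameworks have gridengine.

    Hypothetical convenience wrapper; requires Open MPI to be installed.
    """
    result = subprocess.run(["ompi_info"], capture_output=True, text=True)
    return gridengine_frameworks(result.stdout)
```

If installed_gridengine_frameworks() returns an empty set, the build most
likely has no gridengine support and mpirun will not pick up the node
list from SGE.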

On a related note, SGE itself decides how slots are allocated and
assigned for launching tasks. This is controlled by the allocation rule
of the parallel environment (PE). If all of the slots are allocated on the same
node, it sounds like the allocation rule has been set to $fill_up. Maybe
you can try with $round_robin instead?
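
To illustrate the difference between the two rules, here is a rough
Python sketch. It is a simplified model only, not SGE's scheduler: it
assumes every host has the same number of free slots and ignores load,
queue ordering and anything else the real scheduler considers. The
function names and host names are invented for the example.

```python
def fill_up(hosts, slots_per_host, ntasks):
    """Model of $fill_up: use every slot on a host before moving on."""
    placement = []
    for host in hosts:
        take = min(slots_per_host, ntasks - len(placement))
        placement.extend([host] * take)
        if len(placement) == ntasks:
            break
    return placement


def round_robin(hosts, slots_per_host, ntasks):
    """Model of $round_robin: take one slot per host in turn,
    wrapping around to the first host while free slots remain."""
    free = {host: slots_per_host for host in hosts}
    placement = []
    while len(placement) < ntasks and any(free.values()):
        for host in hosts:
            if len(placement) == ntasks:
                break
            if free[host] > 0:
                free[host] -= 1
                placement.append(host)
    return placement


hosts = ["node00", "node01", "node02"]  # hypothetical host names
print(fill_up(hosts, 4, 4))      # all four tasks land on node00
print(round_robin(hosts, 4, 4))  # tasks are spread across the hosts
```

With 4 slots per node and a 4 process job, the fill-up model puts
everything on the first host, which matches the stacking you describe.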

> It seems to me that MPI is not detecting that the nodes are
> overloaded, and that this is due to the way the job slots are defined
> and how mpirun is being called. If I read the documentation correctly,
> a single mpirun run consumes one job slot no matter how many processes
> are launched. We can change the number of job slots, but then we expect
> to waste processors since only one mpirun job will run on any node,
> even if the job is only a two processor job.

As for oversubscription, I recall that the -nooversubscribe option was
added in v1.2. Use it if you want to prevent ORTE from oversubscribing,
because oversubscription is allowed by default.
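
As a small sketch of how a job script might add that option, assuming
v1.2 or later: the helper below only builds the command line and does
not run anything. The function name and the program path ./my_mpi_app
are placeholders for illustration.

```python
def mpirun_command(nprocs, program, allow_oversubscribe=False):
    """Build an mpirun argument list, adding -nooversubscribe unless
    oversubscription is explicitly allowed."""
    cmd = ["mpirun", "-np", str(nprocs)]
    if not allow_oversubscribe:
        cmd.append("-nooversubscribe")
    cmd.append(program)
    return cmd


# ./my_mpi_app is a placeholder for the real MPI executable.
print(" ".join(mpirun_command(4, "./my_mpi_app")))
```

The resulting list can be passed to subprocess.run from inside the SGE
job script.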

> Can someone enlighten me?
> -geoff