On the following system:
SGE 6.0 (with tight integration)
Scientific Linux 4.3
Dual Dual-Core Opterons
MPI jobs are oversubscribing the nodes. No matter where the scheduler
launches a job, its processes always stack up on the first node
(node00), and they continue to stack even when the system load exceeds
6 (on a 4-processor box). Each node is defined with 4 slots and a
maximum of 4 slots. The MPI jobs are launched via
"mpirun -np (some-number-of-processors)" from within the scheduler.
It seems to me that MPI is not detecting that the nodes are
overloaded, and that this is due to the way the job slots are defined
and how mpirun is being called. If I read the documentation correctly,
a single mpirun run consumes one job slot no matter how many
processes it launches. We can change the number of job slots, but
then we expect to waste processors, since only one mpirun job will
run on any node, even if the job is only a two-processor job.
Can someone enlighten me?