Gus Correa <gus_at_[hidden]> writes:
> On 03/27/2014 05:05 AM, Andreas Schäfer wrote:
>>> >Queue systems won't allow resources to be oversubscribed.
[Maybe that meant that resource managers can, and typically do, prevent
resources being oversubscribed.]
>> I'm fairly confident that you can configure Slurm to oversubscribe
>> nodes: just specify more cores for a node than are actually present.
> That is true.
> If you lie to the queue system about your resources,
> it will believe you and oversubscribe.
For what it's worth, oversubscription might be overall or limited. We
just had a user running some crazy Java program, which he refuses to
explain, submitted as a serial job running ~150 threads. The
oversubscription was confined to the core it used, and the effect on the
127 others was
mostly due to the small overhead of the node daemon reading the crazy
/proc smaps file to track the memory usage. The other cores were
Ob-OMPI: the other jobs may have been OMPI ones!
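
To make the anecdote above concrete, here is a rough Python sketch of the
kind of check a node daemon or an admin might do: compare a job's thread
count against the cores it was given, and total its resident memory from an
smaps dump. The function names are made up for illustration and this is not
how any particular resource manager does it; it only assumes the usual Linux
/proc/<pid>/status and /proc/<pid>/smaps text formats.

```python
import os


def thread_count(status_text):
    """Return the 'Threads:' value from a /proc/<pid>/status dump."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no 'Threads:' field found")


def smaps_rss_kib(smaps_text):
    """Sum every 'Rss:' field (reported in kB) of a /proc/<pid>/smaps dump."""
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith("Rss:"):
            total += int(line.split()[1])
    return total


def oversubscription_factor(n_threads, n_cores):
    """Threads per allocated core; a value above 1 means oversubscription."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return n_threads / n_cores


# Linux-only demonstration on the current process (skipped elsewhere).
if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    with open("/proc/self/status") as f:
        threads = thread_count(f.read())
    with open("/proc/self/smaps") as f:
        rss = smaps_rss_kib(f.read())
    cores = len(os.sched_getaffinity(0))  # cores this process may run on
    print(f"threads={threads} cores={cores} "
          f"factor={oversubscription_factor(threads, cores):.2f} rss={rss} kB")
```

For the job described above, oversubscription_factor(150, 1) gives 150.0:
all the contention stays on the one allocated core, which is why the other
cores mostly notice only the cost of the smaps reads.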
> Torque has this same feature.
> I don't know about SGE.
> You may choose to set some or all nodes with more cores than they
> actually have, if that is a good choice for the codes you run.
> However, for our applications oversubscribing is bad, hence my mindset.
Right. I don't think there's any question that it's a bad idea on a
general purpose cluster running some OMPI jobs, for instance.