Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] busy waiting and oversubscriptions
From: Andreas Schäfer (gentryx_at_[hidden])
Date: 2014-03-27 05:05:38


Heya,

On 19:21 Wed 26 Mar , Gus Correa wrote:
> On 03/26/2014 05:26 PM, Ross Boylan wrote:
> > [Main part is at the bottom]
> > On Wed, 2014-03-26 at 19:28 +0100, Andreas Schäfer wrote:
> >> On 09:08 Wed 26 Mar , Ross Boylan wrote:
> >>> Second, we do not operate in a batch queuing environment
> >> Why not fix that?
> > I'm not the sysadmin, though I'm involved in the group that sets policy.
> > At one point we were using Sun's grid engine, but I don't think it's
> > installed now. I'm not sure why.
> >
> > We have discussed putting in a batch queuing system and nobody was
> > really pushing for it. My impression was (and probably still is) that
> > it was more pain than gain. There is hassle not only for the sysadmin
> > to set it up (and, I suppose, monitor it), but for users. Personally I
> > run a lot of interactive parallel jobs (the interaction is on rank 0
> > only). I have the impression that won't work under a batch system,
> > though I could be wrong. I also had the impression we'd need to have an
> > estimate of how long the job would run when we submit, and we don't
> > always know.
> >
> > But I've never really used such a system, and may not appreciate what it
> > would get us. The other reason we haven't bothered is that the load on
> > the cluster was relatively light and contention was low. That is less
> > and less true, which probably starts tipping the balance toward a
> > queuing system.
> >
> > This is wandering off topic, but if you or anyone else could say more
> > about why you regard the absence of a queuing system as a problem that
> > should be fixed, I'd love to hear it.
> >
> > Ross
>
> Hi Ross
>
> Some pros:
> (I don't know of any cons.)

I second Gus' statement that there are no real downsides to a
queueing system. These systems actually relieve both users and
admins of a lot of tedious fiddling and debugging. If you're doing a
fresh install, then I'd suggest you use Slurm[1]. It's a breeze to
install and easy to maintain. It also integrates well with all major
MPI implementations. Yes, the admin and users need to invest time
to learn the ropes, but the payoff is almost instant. Source: I'm the
sysadmin for our research clusters.

> Queue systems won't allow resources to be oversubscribed.

I'm fairly confident that you can configure Slurm to oversubscribe
nodes: just specify more cores for a node than are actually present.
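
As a rough, untested sketch of what I mean: assume a hypothetical
node "node01" with 8 physical cores and a hypothetical partition
"interactive". The names and numbers are made up for illustration.
Depending on the Slurm version, an additional setting may be needed
so that the daemon accepts a node definition which disagrees with the
detected hardware, so please check the slurm.conf man page first.

```
# slurm.conf fragment (sketch only; node and partition names are hypothetical)
# The machine physically has 8 cores, but 16 CPUs are advertised to the
# scheduler, so up to 16 single-CPU tasks may be placed on it.
NodeName=node01 CPUs=16 State=UNKNOWN
PartitionName=interactive Nodes=node01 Default=YES State=UP
```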

> Queue systems do support interactive jobs (even with X-windows GUIs, if
> needed).

Right, actually we've just moved a couple of systems, which are
primarily running interactive jobs, to Slurm to ease arbitration of
resources. Previously users were frequently stepping on each other's
toes. (Who's pinning jobs to which core? Who's using which GPU? How
much RAM do you consume?) These problems are gone now.

Cheers
-Andreas

[1] https://computing.llnl.gov/linux/slurm/

-- 
==========================================================
Andreas Schäfer
HPC and Grid Computing
Chair of Computer Science 3
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
+49 9131 85-27910
PGP/GPG key via keyserver
http://www.libgeodecomp.org
==========================================================
(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your
signature to help him gain world domination!