Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] busy waiting and oversubscriptions
From: Ross Boylan (ross_at_[hidden])
Date: 2014-03-26 17:26:58

[Main part is at the bottom]
On Wed, 2014-03-26 at 19:28 +0100, Andreas Schäfer wrote:
> Ross-
> On 09:08 Wed 26 Mar, Ross Boylan wrote:
> > On Wed, 2014-03-26 at 10:27 +0000, Jeff Squyres (jsquyres) wrote:
> > > On Mar 26, 2014, at 1:31 AM, Andreas Schäfer <gentryx_at_[hidden]> wrote:
> > This seems to restate the premise of my question. Is it meant to lead
> > to the answer "A process in busy wait blocks other users of the CPU to
> > the same extent as any other process at 100%"?
> Yes.
Thanks for confirming.
> > > >> At any rate, my question is whether, if I have processes that spend most
> > > >> of their time waiting to receive a message, I can put more of them than
> > > >> I have physical cores without much slowdown?
> > > >
> > > > AFAICS there will always be a certain slowdown. Is there a reason why
> > > > you would want to oversubscribe your nodes?
> > >
> > > Agreed -- this is not a good idea. It suggests that you should make your existing code more efficient -- perhaps by overlapping communication and computation.
> > My motivation was to get more work done with a given number of CPUs, and
> > also to find out how much of a burden I was imposing on other users.
> >
> > My application consists of processes that have different roles. Some of
> > the processes don't have much to do (they play important roles but don't
> > do much computation). My hope was that I could add them on without
> > imposing much of a burden.
> If you have a complex workflow with varying computational loads, then
> you might want to take a look at runtime systems which allow you to
> express this directly through their API, e.g. HPX[1]. HPX has proven to
> run with high efficiency on a wide range of architectures, and with a
> multitude of different workloads.
Thanks for the pointer.
> > Second, we do not operate in a batch queuing environment.
> Why not fix that?
I'm not the sysadmin, though I'm involved in the group that sets policy.
At one point we were using Sun Grid Engine, but I don't think it's
installed now. I'm not sure why.

We have discussed putting in a batch queuing system, but nobody was
really pushing for it. My impression was (and probably still is) that
it was more pain than gain. It imposes a hassle not only on the
sysadmin, who has to set it up (and, I suppose, monitor it), but also
on users. Personally I
run a lot of interactive parallel jobs (the interaction is on rank 0
only). I have the impression that won't work under a batch system,
though I could be wrong. I also had the impression we'd need to give an
estimate of how long each job would run when we submit it, and we don't
always know that.

But I've never really used such a system, and may not appreciate what it
would get us. The other reason we haven't bothered is that the load on
the cluster was relatively light and contention was low. That is less
and less true, which probably starts tipping the balance toward a
queuing system.

This is wandering off topic, but if you or anyone else could say more
about why you regard the absence of a queuing system as a problem that
should be fixed, I'd love to hear it.