Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] busy waiting and oversubscriptions
From: Andreas Schäfer (gentryx_at_[hidden])
Date: 2014-03-26 14:28:58


Ross-

On 09:08 Wed 26 Mar, Ross Boylan wrote:
> On Wed, 2014-03-26 at 10:27 +0000, Jeff Squyres (jsquyres) wrote:
> > On Mar 26, 2014, at 1:31 AM, Andreas Schäfer <gentryx_at_[hidden]> wrote:
> >
> > >> Even when "idle", MPI processes use all the CPU. I thought I remember
> > >> someone saying that they will be low priority, and so not pose much of
> > >> an obstacle to other uses of the CPU.
> > >
> > > well, if they're blocking in an MPI call, then they'll be doing a busy
> > > wait, so each thread will easily churn up 100% CPU load.
> >
> > +1
> This seems to restate the premise of my question. Is it meant to lead
> to the answer "A process in busy wait blocks other users of the CPU to
> the same extent as any other process at 100%"?

Yes.
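To make that concrete, here is a language-agnostic sketch in plain Python (no real MPI involved; the function names are invented for illustration, and a threading.Event stands in for message arrival). A blocked MPI call behaves like `wait_busy`: it spins until the message arrives and keeps one core at 100%. A poll-and-sleep loop like `wait_polite` -- roughly what MPI_Test plus a short sleep gives you -- trades a little latency for an idle CPU:

```python
import threading
import time

def wait_busy(arrived):
    """Spin until the event fires: analogous to a blocking MPI call
    doing a busy wait. The loop does no useful work, just CPU churn."""
    spins = 0
    while not arrived.is_set():
        spins += 1
    return spins

def wait_polite(arrived, interval=0.001):
    """Poll, then sleep: analogous to MPI_Test in a loop with a short
    sleep. Latency grows by up to `interval`, but the core is freed."""
    polls = 0
    while not arrived.is_set():
        polls += 1
        time.sleep(interval)  # yield the CPU between checks
    return polls

def measure(waiter):
    """Run a waiter against a 'message' that lands after 50 ms."""
    arrived = threading.Event()
    threading.Timer(0.05, arrived.set).start()
    return waiter(arrived)
```

In those 50 ms, `wait_busy` typically iterates hundreds of thousands of times while `wait_polite` checks only about 50 times; that gap is exactly the load other users of the core see. Open MPI also exposes a related knob, the `mpi_yield_when_idle` MCA parameter, which makes waiting processes yield the CPU at the cost of higher message latency.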

> > >> At any rate, my question is whether, if I have processes that spend most
> > >> of their time waiting to receive a message, I can put more of them than
> > >> I have physical cores without much slowdown?
> > >
> > > AFAICS there will always be a certain slowdown. Is there a reason why
> > > you would want to oversubscribe your nodes?
> >
> > Agreed -- this is not a good idea. It suggests that you should make your existing code more efficient -- perhaps by overlapping communication and computation.
> My motivation was to get more work done with a given number of CPUs, and
> also to find out how much of burden I was imposing on other users.
>
> My application consists of processes that have different roles. Some of
> the roles don't have much to do (they play important roles, but don't do
> much computation). My hope was that I could add them on without
> imposing much of a burden.

If you have a complex workflow with varying computational loads, then
you might want to take a look at runtime systems that let you express
this directly through their API, e.g. HPX[1]. HPX has been shown to
run efficiently on a wide range of architectures and with many
different workloads.

> Second, we do not operate in a batch queuing environment

Why not fix that?

> Finally, overlapping communication and computation is a bit tricky. The
> recent thread I started about Isend indicates that communication
> requires the involvement of both the sender and receiver processes and
> if one of them is busy with computation it can really slow things down.
> I seem to have gotten good results by using Isend generally, in
> particular when sending messages to the heavy computing processes, and
> Send when sending from those same processes.

Yeah, this is hard to achieve in MPI. This is because MPI is meant as
a rather low-level, but highly efficient and portable message passing
interface. If you wish to express task-based parallelism and task
dependencies, then high-level runtimes like HPX are the way to go. HPX
can also hide network latencies by oversubscribing a node and
scheduling other threads while the current one is waiting for
communication.
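The pattern Ross describes -- start the send, compute while it is in
flight, then wait for completion -- can be sketched in plain Python as
a rough analogy (the sleeps merely simulate an MPI_Isend / compute /
MPI_Wait sequence; no real MPI is involved):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def overlapped(send_s=0.2, compute_s=0.2):
    """Nonblocking-style: launch the 'send', compute during the
    transfer, then wait. Total time is roughly max(send, compute)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(time.sleep, send_s)  # ~ MPI_Isend
        time.sleep(compute_s)                      # local computation
        pending.result()                           # ~ MPI_Wait
    return time.perf_counter() - start

def sequential(send_s=0.2, compute_s=0.2):
    """Blocking style: send, then compute. The times add up."""
    start = time.perf_counter()
    time.sleep(send_s)       # ~ MPI_Send (blocking)
    time.sleep(compute_s)
    return time.perf_counter() - start
```

With both delays at 0.2 s, the overlapped version finishes in about
0.2 s instead of 0.4 s. In real MPI the saving depends on whether the
library can make progress on the transfer without both ranks entering
an MPI call, which is exactly the caveat from the Isend thread.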

[1] http://stellar.cct.lsu.edu/downloads/
    http://stellar.cct.lsu.edu/docs/

Cheers
-Andreas

-- 
==========================================================
Andreas Schäfer
HPC and Grid Computing
Chair of Computer Science 3
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
+49 9131 85-27910
PGP/GPG key via keyserver
http://www.libgeodecomp.org
==========================================================
(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your
signature to help him gain world domination!