Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Heads up on new feature to 1.3.4
From: Kenneth Lloyd (kenneth.lloyd_at_[hidden])
Date: 2009-08-17 11:19:47


In some of the experiments I've run and studied on exclusive binding to
specific cores, performance has ranged from excellent gains to phases of
reduced performance. The results depended on the nature of the experiment
being run (a task partitioning problem) and on how the experimental data
was organized (a data partitioning problem).

This is especially true when one considers the context in which the
experiment was run: what other experiments were scheduled, either
concurrently or serially, the priorities of those experiments, and the
configuration of the cluster / MPI network at any given point in time.

The approach we used was Bayesian. In other words, performance prediction
was conditioned on patterns of structure and context from both forward and
inverse Bayesian cycles.

Ken Lloyd

> -----Original Message-----
> From: devel-bounces_at_[hidden]
> [mailto:devel-bounces_at_[hidden]] On Behalf Of Jeff Squyres
> Sent: Monday, August 17, 2009 7:01 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] Heads up on new feature to 1.3.4
>
> On Aug 16, 2009, at 11:02 PM, Ralph Castain wrote:
>
> > I think the problem here, Eugene, is that performance benchmarks are
> > far from the typical application. We have repeatedly seen this -
> > optimizing for benchmarks frequently makes applications run less
> > efficiently. So I concur with Chris on this one - let's not go -too-
> > benchmark happy and hurt the regular users.
>
> FWIW, I've seen processor binding help real user codes, too.
> Indeed, on a system where an MPI job has exclusive use of the
> node, how does binding hurt you?
>
> On nodes where multiple MPI jobs are running, if a resource
> manager is being used, then we should be obeying what they
> have specified for each job to use. We need a bit more work
> in that direction to make that work, but it's very do-able.
>
> When resource managers are not used and multiple MPI jobs
> share the same node, then OMPI has to coordinate amongst its
> jobs to not oversubscribe cores (when possible). As Ralph
> indicated in a later mail, we still need some work in this area, too.
>
> > Here at LANL, binding to-socket instead of to-core hurts performance
> > by ~5-10%, depending on the specific application. Of course, either
> > binding method is superior to no binding at all...
>
> This is probably not too surprising (i.e., allowing the OS to
> move jobs around between cores on a socket can probably
> involve a little cache thrashing, resulting in that 5-10%
> loss). I'm hand-waving here, and I have not tried this
> myself, but it's not too surprising of a result to me. It's
> also not too surprising that others don't see this effect at
> all (e.g., Sun didn't see any difference in bind-to-core vs.
> bind-to-socket) when they ran their tests. YMMV.
>
> I'd actually be in favor of a by-core binding (not
> by-socket), but spreading the processes out round robin by
> socket, not by core. All of this would be the *default*
> behavior, of course -- command line params/MCA params will be
> provided to change to whatever pattern is desired.
>
> > UNLESS you have a threaded application, in which case -any- binding
> > can be highly detrimental to performance.
>
> I'm not quite sure I understand this statement. Binding is
> not inherently contrary to multi-threaded applications.
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
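
For readers following the thread, the default Jeff Squyres proposes in the quoted message (bind each process to a single core, but spread processes round robin across sockets) can be illustrated with a small placement sketch. This is not Open MPI code: the function names, the `(rank, socket, core)` tuple layout, and the uniform sockets-by-cores node model are all assumptions made for illustration. The second function shows plain by-core placement for comparison.

```python
from typing import List, Tuple

# Hypothetical placement record: (rank, socket index, core index within socket).
Placement = Tuple[int, int, int]


def place_round_robin_by_socket(
    num_procs: int, sockets: int, cores_per_socket: int
) -> List[Placement]:
    """Bind each rank to one core, cycling through the sockets first.

    Assumes a node with `sockets` identical sockets, each having
    `cores_per_socket` cores. Refuses to oversubscribe cores.
    """
    if num_procs > sockets * cores_per_socket:
        raise ValueError("more processes than cores; would oversubscribe")
    placements = []
    for rank in range(num_procs):
        socket = rank % sockets   # neighbouring ranks land on different sockets
        core = rank // sockets    # move to the next core once every socket is used
        placements.append((rank, socket, core))
    return placements


def place_by_core(
    num_procs: int, sockets: int, cores_per_socket: int
) -> List[Placement]:
    """Bind each rank to one core, filling a socket before using the next."""
    if num_procs > sockets * cores_per_socket:
        raise ValueError("more processes than cores; would oversubscribe")
    placements = []
    for rank in range(num_procs):
        socket = rank // cores_per_socket  # fill socket 0 completely first
        core = rank % cores_per_socket
        placements.append((rank, socket, core))
    return placements
```

With 4 processes on a node modelled as 2 sockets of 4 cores each, the round-robin variant alternates ranks between socket 0 and socket 1, while the by-core variant puts all four ranks on socket 0. In both cases every rank is bound to exactly one core, so the OS cannot move it around within the socket.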