On Aug 16, 2009, at 11:02 PM, Ralph Castain wrote:
> I think the problem here, Eugene, is that performance benchmarks are
> far from the typical application. We have repeatedly seen this -
> optimizing for benchmarks frequently makes applications run less
> efficiently. So I concur with Chris on this one - let's not go -too-
> benchmark happy and hurt the regular users.
FWIW, I've seen processor binding help real user codes, too. Indeed,
on a system where an MPI job has exclusive use of the node, how does
binding hurt you?
On nodes where multiple MPI jobs are running, if a resource manager is
being used, then we should obey what it has specified for each job to
use. We need a bit more work in that direction to get there, but it's
very do-able.
When resource managers are not used and multiple MPI jobs share the
same node, then OMPI has to coordinate amongst its jobs to not
oversubscribe cores (when possible). As Ralph indicated in a later
mail, we still need some work in this area, too.
> Here at LANL, binding to-socket instead of to-core hurts performance
> by ~5-10%, depending on the specific application. Of course, either
> binding method is superior to no binding at all...
This is probably not too surprising: letting the OS move processes
around between cores on a socket can cause a little cache thrashing,
which could account for that 5-10% loss. I'm hand-waving here, and I
have not tried this myself. It's also not too surprising that others
don't see this effect at all (e.g., Sun didn't see any difference in
bind-to-core vs. bind-to-socket when they ran their tests). YMMV.
I'd actually be in favor of a by-core binding (not by-socket), but
spreading the processes out round robin by socket, not by core. All
of this would be the *default* behavior, of course -- command line
params/MCA params will be provided to change to whatever pattern is
desired.
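
To make the proposed default concrete, here is a rough sketch of
"bind to core, but map round robin by socket". It is only an
illustration, not OMPI code: the function name and the (socket, core)
numbering are made up for this example, and it assumes every socket has
the same number of cores.

```python
def map_by_socket_bind_to_core(num_ranks, num_sockets, cores_per_socket):
    """Return {rank: (socket, core_within_socket)}.

    Hypothetical helper: consecutive ranks cycle across sockets first,
    and each rank is then pinned to exactly one core on its socket.
    """
    total_cores = num_sockets * cores_per_socket
    if num_ranks > total_cores:
        # More ranks than cores would oversubscribe; refuse in this sketch.
        raise ValueError("more ranks than cores on this node")

    placement = {}
    for rank in range(num_ranks):
        socket = rank % num_sockets    # round robin over sockets
        core = rank // num_sockets     # next unused core on that socket
        placement[rank] = (socket, core)
    return placement


if __name__ == "__main__":
    # Assumed node layout: 2 sockets with 4 cores each, 4 ranks.
    for rank, (socket, core) in map_by_socket_bind_to_core(4, 2, 4).items():
        print(f"rank {rank} -> socket {socket}, core {core}")
```

With 4 ranks on that assumed 2x4 node, ranks 0 and 2 land on socket 0
and ranks 1 and 3 on socket 1, each on its own core. A plain by-core
mapping would instead fill socket 0 before touching socket 1.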
> UNLESS you have a threaded application, in which case -any- binding
> can be highly detrimental to performance.
I'm not quite sure I understand this statement. Binding is not
inherently contrary to multi-threaded applications.
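
As one illustration of how binding and threads can coexist, a threaded
process can be bound to a *set* of cores, one per thread, rather than
to a single core. The sketch below is hypothetical (the helper names
and the contiguous core numbering are assumptions, and this is not how
OMPI does it); the actual binding call is Linux-only.

```python
import os


def cores_for_threaded_rank(rank, threads_per_rank, total_cores):
    """Return the set of core IDs for one multi-threaded rank.

    Hypothetical helper: each rank gets a contiguous block of cores,
    one core per thread, assuming cores are numbered 0..total_cores-1.
    """
    first = rank * threads_per_rank
    last = first + threads_per_rank
    if last > total_cores:
        raise ValueError("not enough cores for this rank's threads")
    return set(range(first, last))


def bind_current_process(cores):
    """Restrict the caller to `cores`; return False where unsupported.

    os.sched_setaffinity exists only on some platforms (e.g. Linux).
    With pid 0 it applies to the calling thread, and threads started
    afterward inherit the same mask.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, cores)
    return True


if __name__ == "__main__":
    # Assumed layout: 8 cores, 4 threads per rank; show rank 1's block.
    print(sorted(cores_for_threaded_rank(1, 4, 8)))
```

Bound this way, the threads of a rank can still be scheduled freely
among that rank's own cores, but they do not wander onto cores that
belong to another rank.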