
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Heads up on new feature to 1.3.4
From: N.M. Maclaren (nmm1_at_[hidden])
Date: 2009-08-17 10:38:49


On Aug 17 2009, Jeff Squyres wrote:
>On Aug 16, 2009, at 11:02 PM, Ralph Castain wrote:
>
>> I think the problem here, Eugene, is that performance benchmarks are
>> far from the typical application. We have repeatedly seen this -
>> optimizing for benchmarks frequently makes applications run less
>> efficiently. So I concur with Chris on this one - let's not go -too-
>> benchmark happy and hurt the regular users.
>
>FWIW, I've seen processor binding help real user codes, too. Indeed,
>on a system where an MPI job has exclusive use of the node, how does
>binding hurt you?

Here is how, and I can assure you that it's not nice, not at all; it can
kill an application dead. I have some experience of running large SMP
systems (Origin, SunFire F15K and POWER3/4 racks), and this area was a
nightmare.

Process A is bound, and is waiting briefly for a receive. All of the
other cores are busy with the processes bound to them. Then some other
activity comes along - a daemon, or something else that needs service from
the kernel - and the kernel starts a thread on process A's core, because
that core looks idle. Unfortunately, it is a long-running thread (e.g. NFS),
so when the other processes finish their step and A becomes the bottleneck,
the whole job stalls until that kernel thread finishes.
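
A minimal sketch of that effect, assuming a bulk-synchronous code where
every rank meets at an Allreduce; one delayed rank sets the pace for the
whole step (the sleep stands in for the stolen core):

    /* Illustrative sketch only: the slowest rank determines the step time. */
    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int rank, nranks;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        double t0 = MPI_Wtime();

        /* Pretend rank 0's core has been taken by a long-running kernel
           thread: its compute step takes far longer than everyone else's. */
        if (rank == 0)
            sleep(5);          /* stands in for the stolen core */
        else
            usleep(10000);     /* a normal 10 ms compute step */

        /* Every rank now meets at the collective and waits for rank 0. */
        int local = rank, sum = 0;
        MPI_Allreduce(&local, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

        printf("rank %d of %d: step took %.2f s\n",
               rank, nranks, MPI_Wtime() - t0);   /* ~5 s on every rank */

        MPI_Finalize();
        return 0;
    }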

You can get a similar effect if process A is bound to a CPU which has an
I/O device bound to it. When something else entirely starts hammering that
device, even if it doesn't tie it up for long each time, bye-bye
performance. This is typically a problem on multi-socket systems, of
course, but could show up even on quite small ones.

For this reason, many schedulers ignore binding hints when they 'think' they
know better - and, no matter what the documentation says, hints are generally
all they are. You can then get processes rotating round the processors,
exercising the inter-cache buses nicely ... In my experience, binding can
sometimes make that more likely rather than less, and the best solutions are
usually different.
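
On Linux, at least, you can read back what the kernel has actually applied;
a minimal sketch, assuming the sched_setaffinity/sched_getaffinity interface
(the older SMP systems mentioned above use different, and often purely
advisory, calls):

    /* Illustrative sketch only: request binding to one core, then read
       back the mask the kernel is actually using for this process. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t want, got;

        /* Ask to run on core 2 only. */
        CPU_ZERO(&want);
        CPU_SET(2, &want);
        if (sched_setaffinity(0, sizeof(want), &want) != 0)
            perror("sched_setaffinity");

        /* See which cores we are really allowed to run on. */
        CPU_ZERO(&got);
        if (sched_getaffinity(0, sizeof(got), &got) == 0) {
            int cpu;
            for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
                if (CPU_ISSET(cpu, &got))
                    printf("allowed to run on cpu %d\n", cpu);
        }
        return 0;
    }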

Yes, I have used binding, but it was hell to set up, and many people give up
on it, saying that it degrades performance. I advise ordinary users to avoid
it like the plague, and to use more reliable tuning techniques.

>> UNLESS you have a threaded application, in which case -any- binding
>> can be highly detrimental to performance.
>
>I'm not quite sure I understand this statement. Binding is not
>inherently contrary to multi-threaded applications.

That is true. But see above.

Another circumstance where that is true is when your application is an MPI
one that calls SMP-enabled libraries; this is getting increasingly common.
Binding can stop those libraries from using spare cores, or otherwise confuse
them; God help you if they start running a 4-core algorithm on one core!
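
A minimal sketch of that situation, assuming the SMP-enabled library uses
OpenMP internally; if the MPI rank has been bound to a single core, all four
of the library's threads inherit that binding and time-slice on one core:

    /* Illustrative sketch only: OpenMP stands in for any threaded library. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* The "SMP-enabled library" part: a threaded kernel that assumes
           it has spare cores.  Bound to one core, these four threads all
           share it, so the region runs roughly 4x slower. */
        #pragma omp parallel num_threads(4)
        {
            #pragma omp single
            printf("rank %d: library region is using %d threads\n",
                   rank, omp_get_num_threads());
            /* ... threaded work here ... */
        }

        MPI_Finalize();
        return 0;
    }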

Regards,
Nick Maclaren.