On 5 Mar 2009, at 15:25, Jeff Squyres wrote:
> I don't remember who originally said it, but I've repeated the
> statement: any MPI program that relies on a barrier for correctness is
> an incorrect MPI application.
I'm not 100% sure this holds, although it's a good rule of thumb. I've
certainly written programs which need barriers, but they use one-sided
comms, so that's slightly different.
> There's anecdotal evidence that throwing in a barrier every once in a
> while can help reduce unexpected messages (and other things), and
> therefore improve performance a bit. But that's very application
> dependent, and usually not frequent.
I've seen this a number of times. A number of algorithms work fairly
well as long as things are vaguely in sync but slow down drastically if
they are not, and without barriers there is no way to recover from this
slowdown. Basically, if one rank is slow for whatever reason, other
ranks try to communicate with it, the unexpected messages cause it to
slow down further, and you get a positive feedback loop.
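As a rough illustration of how an occasional barrier stops ranks from
drifting apart, here is a minimal sketch that simulates ranks as Python
threads. It is not MPI code: threading.Barrier stands in for
MPI_Barrier, and the names run, sync_every and slow_rank are
hypothetical. It does not model unexpected-message queues; it only
shows that a barrier every sync_every steps bounds how far ahead of a
slow rank the others can get.

```python
import threading
import time

def run(num_ranks=4, steps=40, sync_every=5, slow_rank=0):
    """Simulate ranks as threads; return the largest step gap observed."""
    barrier = threading.Barrier(num_ranks)   # stands in for MPI_Barrier
    lock = threading.Lock()
    progress = [0] * num_ranks               # completed steps per simulated rank
    max_skew = 0

    def rank(r):
        nonlocal max_skew
        for step in range(1, steps + 1):
            if r == slow_rank:
                time.sleep(0.001)            # hypothetical straggler
            with lock:
                progress[r] = step
                # Gap between the furthest-ahead and furthest-behind rank.
                max_skew = max(max_skew, max(progress) - min(progress))
            if step % sync_every == 0:
                barrier.wait()               # everyone regroups here

    threads = [threading.Thread(target=rank, args=(r,)) for r in range(num_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_skew
```

Calling run(4, 40, 5) returns the largest gap seen, which can never
exceed sync_every steps; without the barrier the fast ranks would be
free to run arbitrarily far ahead of the slow one.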
I sometimes feel that barriers have a bad reputation, and maybe it is
because they can be used to hide sloppy coding and allow incorrect MPI
applications to run. I don't see that as a reason not to use them,
however; just be sure you need one.
On 5 Mar 2009, at 15:52, Shanyuan Gao wrote:
> My current research is trying to rewrite some collective MPI
> operations to work with our system. Barrier is my first step, maybe I
> will have bcast and reduce in the future. I understand that some
> applications used too many unnecessary barriers. But here what I want
> is just an application to measure the performance improvement versus
> normal MPI_Barrier. And the improvement can only be measured if the
> barriers are executed many times. I have done some synthetic tests,
> all I need now are real applications.
I've done a lot of work on Barrier and on collectives in general. My
advice would be to implement a non-blocking barrier. Barriers can be
slow and *always* delay the application for the duration of the
barrier. If you can write a non-blocking barrier and pipeline it with
your application steps then, assuming the application is working well,
the CPU cost of the barrier is almost zero (I got it down to 0.15us),
and if the application isn't working well then the barrier will still
bring it back in step.
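To make the pipelining idea concrete, here is a minimal sketch of a
split-phase (non-blocking) barrier, again simulated with Python threads
rather than MPI. SplitBarrier, arrive, wait and run are hypothetical
names for illustration and are not the implementation described above.
For comparison, MPI-3 later standardised MPI_Ibarrier, which plays
roughly the role of arrive() here, with MPI_Wait completing it.

```python
import threading

class SplitBarrier:
    """Split-phase barrier: arrive() never blocks; wait() blocks only if needed."""

    def __init__(self, parties):
        self.parties = parties
        self.count = 0          # arrivals in the current generation
        self.generation = 0     # number of completed barriers
        self.cond = threading.Condition()

    def arrive(self):
        """Signal arrival and return a ticket for the matching wait()."""
        with self.cond:
            ticket = self.generation
            self.count += 1
            if self.count == self.parties:   # last arrival completes the barrier
                self.count = 0
                self.generation += 1
                self.cond.notify_all()
            return ticket

    def wait(self, ticket):
        """Block until the barrier that issued this ticket has completed."""
        with self.cond:
            self.cond.wait_for(lambda: self.generation > ticket)

def run(num_ranks=4, steps=5):
    barrier = SplitBarrier(num_ranks)
    overlapped = [0] * num_ranks        # units of work done while a barrier is pending

    def rank(r):
        for _ in range(steps):
            ticket = barrier.arrive()   # start the barrier
            overlapped[r] += 1          # independent work overlaps with it
            barrier.wait(ticket)        # finish it before arriving again

    threads = [threading.Thread(target=rank, args=(r,)) for r in range(num_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return barrier.generation, overlapped
```

When every thread is in step, the barrier has usually completed by the
time wait() is called, so it returns almost immediately; a thread that
has run ahead blocks there until the others arrive. The sketch assumes
each thread completes wait() on its ticket before calling arrive()
again.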
Another interesting challenge is to benchmark MPI_Barrier; it's not as
easy as you might think...