On Sep 9, 2010, at 1:46 AM, Ashley Pittman wrote:
> On 9 Sep 2010, at 08:31, Terry Frankcombe wrote:
>> On Thu, 2010-09-09 at 01:24 -0600, Ralph Castain wrote:
>>> As people have said, these time values are to be expected. All they
>>> reflect is the time difference spent in reduce waiting for the slowest
>>> process to catch up to everyone else. The barrier removes that factor
>>> by forcing all processes to start from the same place.
>>> No mystery here - just a reflection of the fact that your processes
>>> arrive at the MPI_Reduce calls at different times.
>> Yes, however, it seems Gabriele is saying the total execution time
>> *drops* by ~500 s when the barrier is put *in*. (Is that the right way
>> around, Gabriele?)
>> That's harder to explain as a sync issue.
> Not really. You need some way of keeping processes in sync, or else the slow ones get slower and the fast ones stay fast. If you have an unbalanced algorithm you can end up swamping certain ranks, and once they fall behind they get even slower and performance goes off a cliff edge.
> Adding sporadic barriers keeps everything in sync and running nicely. If things are performing well then the barrier only slows things down, but if there is a problem it will bring all processes back together and destroy the positive feedback cycle. This is why you often only need a synchronisation point every so often,
Precisely. And that is why we added the "sync" collective, which inserts a barrier every so often, since we don't have async barriers at this time. It also helps clean up memory growth due to unanticipated messages arriving during unbalanced algorithms.
Run "ompi_info --param coll all" to see how to enable it.
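To make the "barrier every so often" idea concrete, here is a minimal sketch of the counting logic only. It is not the Open MPI "sync" component and it does not call MPI at all: the class name PeriodicSync, the "every" parameter, and the stand-in barrier and collective are all hypothetical, chosen so the example runs anywhere.

```python
from typing import Any, Callable, List


class PeriodicSync:
    """Run `barrier` before every `every`-th wrapped collective call.

    Hypothetical helper for illustration; not an Open MPI API.
    """

    def __init__(self, barrier: Callable[[], None], every: int) -> None:
        if every < 1:
            raise ValueError("every must be >= 1")
        self._barrier = barrier
        self._every = every
        self._calls = 0

    def call(self, collective: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        # Count this collective; on every `every`-th one, synchronise first.
        self._calls += 1
        if self._calls % self._every == 0:
            self._barrier()
        return collective(*args, **kwargs)


# Stand-ins so the sketch runs without MPI: the "barrier" just logs itself,
# and the "collective" is a plain local sum.
barrier_log: List[str] = []
sync = PeriodicSync(barrier=lambda: barrier_log.append("barrier"), every=3)
results = [sync.call(sum, [i, i]) for i in range(10)]
```

In a real code the barrier would be a blocking MPI barrier and the wrapped call something like a reduce. The point is only that most calls go through untouched, and the cost of a full sync is paid on one call in N.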
> I'm also a huge fan of asynchronous barriers, as a full sync is a blunt and slow operation. Using asynchronous barriers you can allow small differences in timing but prevent them from getting too large, with very little overhead in the common case where processes are synced already. I'm thinking specifically of starting a barrier on iteration N, waiting for it on N+25 and immediately starting another one, again waiting for it 25 steps later.
> Ashley Pittman, Bath, UK.
> Padb - A parallel job inspection tool for cluster computing
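For what it's worth, the N / N+25 scheme above can be checked with a toy model. On an MPI library that provides non-blocking collectives, the real thing would be an MPI_Ibarrier at iteration N and an MPI_Wait on its request at N+25. The sketch below is not MPI code. It assumes a made-up cost model (rank r needs periods[r] ticks per iteration), the function name simulate is hypothetical, and it does not model slow ranks getting slower still. It only shows how far apart the ranks can drift.

```python
from typing import List, Optional


def simulate(periods: List[int], ticks: int, window: Optional[int] = None) -> int:
    """Return the largest iteration gap seen between fastest and slowest rank.

    periods[r] is how many ticks rank r needs per iteration (toy cost model).
    window=None means no synchronisation at all.
    """
    progress = [0] * len(periods)  # iterations completed by each rank
    max_skew = 0
    for tick in range(1, ticks + 1):
        # Snapshot taken at the start of the tick; barrier completion is
        # judged against it, which is slightly conservative.
        slowest = min(progress)
        for r, period in enumerate(periods):
            if tick % period != 0:
                continue  # rank r is still busy with its current iteration
            p = progress[r]
            if window is not None and p >= window and p % window == 0:
                # Wait point: the barrier this rank started at iteration
                # p - window completes only once every rank has reached
                # iteration p - window.
                if slowest < p - window:
                    continue  # blocked in the wait
            progress[r] = p + 1
        max_skew = max(max_skew, max(progress) - min(progress))
    return max_skew


# One fast rank and one rank three times slower, 600 ticks.
unsynced = simulate([1, 3], ticks=600)
synced = simulate([1, 3], ticks=600, window=25)
print(unsynced, synced)
```

Without any sync the gap keeps growing for as long as the job runs. With the window, a rank can never be more than two windows (50 iterations here) ahead of the slowest one, and when all ranks run at the same speed the wait never blocks.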