On 9/15/2011 5:51 AM, Ghislain Lartigue wrote:
> start_0 = MPI_Wtime()
> start_1 = MPI_Wtime()
> call foo()
> end_1 = MPI_Wtime()
> write(*,*) "timer1 = ",end_1-start_1
> start_2 = MPI_Wtime()
> call bar()
> end_2 = MPI_Wtime()
> write(*,*) "timer2 = ",end_2-start_2
> end_0 = MPI_Wtime()
> write(*,*) "timer0 = ",end_0-start_0
> When I run my code on a "small" number of processors, I find that timer0=timer1+timer2 with a very good precision (less than 1%).
> However, as I increase the number of processors, this is not true any more: I can have 10%, 20% or even more discrepancy!
> The more processors I use, the bigger the errors I observe.
> Obviously, my code is much bigger than the simple example above, but the principle is exactly the same.
In the simple example, if timer0 is much bigger than timer1+timer2, we'd
be inclined to attribute the extra time to the timer calls or the write
statements... in any case, to time spent between start_0 and start_1,
between end_1 and start_2, or between end_2 and end_0. Are you sure
there are no substantial operations in those sections of the actual
code? Also, is it possible your processes are not running during some
of those times? Are you oversubscribing, i.e., running more processes
than you have cores? Finally, instead of printing out endX-startX, how
about writing out endX and startX individually, so you get all six
timestamps and can see in greater detail where the discrepancy is
arising?
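
For what it's worth, here is a minimal sketch of what I mean by writing
out all six timestamps. It is not your code: foo and bar are dummy
placeholders, the program name and output formats are made up, and I
have assumed it is acceptable to move the write statements after end_0
so that the I/O itself is not counted inside timer0.

```fortran
program six_timestamps
  use mpi
  implicit none
  integer :: ierr, rank
  double precision :: start_0, start_1, end_1, start_2, end_2, end_0

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  start_0 = MPI_Wtime()
  start_1 = MPI_Wtime()
  call foo()
  end_1 = MPI_Wtime()
  start_2 = MPI_Wtime()
  call bar()
  end_2 = MPI_Wtime()
  end_0 = MPI_Wtime()

  ! All output happens after the last timer call (an assumption of this
  ! sketch), so the writes cannot inflate timer0.
  write(*,'(a,i6,6(1x,f20.6))') 'rank/stamps:', rank, &
       start_0, start_1, end_1, start_2, end_2, end_0

  ! The three gaps that timer0 contains but timer1+timer2 do not.
  ! Their sum equals timer0 - (timer1 + timer2).
  write(*,'(a,i6,3(1x,es12.4))') 'rank/gaps:  ', rank, &
       start_1 - start_0, start_2 - end_1, end_0 - end_2

  call MPI_Finalize(ierr)

contains

  ! Hypothetical stand-in for the real work in foo.
  subroutine foo()
    integer :: i
    double precision :: s
    s = 0.0d0
    do i = 1, 1000000
      s = s + sqrt(dble(i))
    end do
    if (s < 0.0d0) write(*,*) s   ! keeps the loop from being optimized away
  end subroutine foo

  ! Hypothetical stand-in for the real work in bar.
  subroutine bar()
    integer :: i
    double precision :: s
    s = 0.0d0
    do i = 1, 2000000
      s = s + sqrt(dble(i))
    end do
    if (s < 0.0d0) write(*,*) s   ! keeps the loop from being optimized away
  end subroutine bar

end program six_timestamps
```

If one of the three gaps grows with the process count while the others
stay tiny, that tells you which section to look at. If you leave the
write statements where they were in your original code, the second and
third gaps will include the I/O time, which may itself be the answer.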