(I apologize in advance for the simplistic/newbie question.)
I'm performing an MPI_ALLREDUCE on part of a multi-dimensional array. This operation is the biggest bottleneck in my code, and I'm wondering whether there is a more efficient way to do it than what I'm doing now. Here's a representative example of what's happening:
! Pack the slice into the 1-D send buffer
ir=1
do ikl=1,km
   do ij=1,jm
      do ii=1,im
         albuf(ir)=array(ii,ij,ikl,nl,0,ng)
         ir=ir+1
      enddo
   enddo
enddo
agbuf=0.0
call mpi_allreduce(albuf,agbuf,im*jm*kmloc(coords(2)+1),mpi_real,mpi_sum,ang_com,ierr)
! Unpack the reduced result into phim
ir=1
do ikl=1,km
   do ij=1,jm
      do ii=1,im
         phim(ii,ij,ikl,nl,0,ng)=agbuf(ir)
         ir=ir+1
      enddo
   enddo
enddo
Is there any way to do this in one fell swoop, rather than buffering, transmitting, and unbuffering? This operation sits inside a loop that runs many times. Are there savings to be had here?
Thanks,
Greg