At 15:59 08/05/2012, you wrote:
>Yep, you are correct. I did the same and it worked. When I have more
>than 3 MPI tasks there is a lot of overhead on the GPU.
>But for the CPU there is no overhead. All three machines have 4 quad-core
>processors with 3.8 GB RAM.
>Just wondering why there is no performance degradation on the CPU?
Your GPU is saturated. It has more work than it can handle, so its
performance drops as you add MPI tasks.
If your kernel code is the one you posted some days ago, you can
divide the number of threads and multiply the work done by each one,
so you do the same work (maybe faster) without using/wasting the whole
thread pool and SM bandwidth.
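Here is a minimal CUDA sketch of that idea. It assumes the kernel is a simple element-wise loop. The kernel from the earlier post isn't quoted here, so `saxpy_grid_stride` and `run_and_check` are hypothetical stand-ins, not your code. The pattern is a grid-stride loop. You launch a smaller grid, and each thread walks the array in steps of the total thread count, so fewer threads cover the same n elements.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical element-wise kernel standing in for the real one:
// y[i] = a * x[i] + y[i]. Each thread starts at its global index and
// jumps ahead by the total number of threads in the grid, so any grid
// size covers all n elements exactly once.
__global__ void saxpy_grid_stride(int n, float a, const float* x, float* y) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        y[i] = a * x[i] + y[i];
    }
}

// Runs the kernel with the given launch shape and compares every
// element against the expected value. Returns true on success.
bool run_and_check(int n, int blocks, int threads) {
    std::vector<float> hx(n), hy(n);
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = static_cast<float>(i); }

    float *dx = nullptr, *dy = nullptr;
    if (cudaMalloc(&dx, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&dy, n * sizeof(float)) != cudaSuccess) { cudaFree(dx); return false; }
    cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    saxpy_grid_stride<<<blocks, threads>>>(n, 2.0f, dx, dy);
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaDeviceSynchronize() == cudaSuccess;

    cudaMemcpy(hy.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);

    // Expected: 2 * 1 + i, exactly representable in float for these sizes.
    for (int i = 0; ok && i < n; ++i) {
        if (hy[i] != 2.0f + static_cast<float>(i)) ok = false;
    }
    return ok;
}

int main() {
    const int n = 1 << 20;
    // One thread per element: 4096 blocks x 256 threads.
    bool one_per_element = run_and_check(n, n / 256, 256);
    // Same work with 16x fewer threads: 256 blocks x 256 threads,
    // each thread handles 16 elements.
    bool coarsened = run_and_check(n, n / (256 * 16), 256);
    std::printf("one-per-element: %s, coarsened: %s\n",
                one_per_element ? "ok" : "FAIL", coarsened ? "ok" : "FAIL");
    return (one_per_element && coarsened) ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Whether the coarsened launch is actually faster depends on the kernel and the card, so measure it rather than assume it. One common starting point is to size the grid as a small multiple of the SM count, which you can query with `cudaGetDeviceProperties` (field `multiProcessorCount`).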