Suppose we run a parallel MPI code with 64 processes on a cluster of, say,
16 nodes. Each node has a multicore CPU with, say, 4 cores.
So all 64 cores of the cluster are each running one process. The program is
SPMD, meaning all processes have the same workload.
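
To make the setup concrete, here is a minimal sketch in C of the kind of per-process loop I have in mind. It is only an illustration under assumptions: the names `block_start` and `local_axpy` are hypothetical, the MPI calls are left out so the snippet compiles on its own, and `rank` and `nprocs` stand in for the values that `MPI_Comm_rank` and `MPI_Comm_size` would return. For simplicity every process sees the full arrays here, whereas a real code would usually hold only its own block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: first index of the block owned by `rank`
 * when n elements are split as evenly as possible over nprocs processes. */
static size_t block_start(size_t n, int nprocs, int rank)
{
    return (n * (size_t)rank) / (size_t)nprocs;
}

/* Hypothetical per-process kernel: y[i] = a * x[i] + y[i] over the block
 * owned by this rank. `restrict` tells the compiler that x and y do not
 * overlap, and the iterations do not depend on each other. */
void local_axpy(size_t n, int nprocs, int rank,
                double a, const double *restrict x, double *restrict y)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);

    size_t lo = block_start(n, nprocs, rank);      /* inclusive start */
    size_t hi = block_start(n, nprocs, rank + 1);  /* exclusive end   */

    for (size_t i = lo; i < hi; ++i)
        y[i] = a * x[i] + y[i];
}
```

In the 64-process example, each rank would call `local_axpy(n, 64, rank, a, x, y)`. The inner loop is the sort of loop I would expect a compiler to be able to auto-vectorize.
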
Now, if we enable auto-vectorization when compiling the code (for example
with the Intel compilers), will there be any benefit (efficiency/scalability
improvement) from the auto-vectorization? Or will we get the same
performance as without auto-vectorization in this example case?
In other words: if there are no free CPU cores on a PC or cluster (all cores
are running MPI processes), is auto-vectorization still beneficial? Or is it
beneficial only if some CPU cores are free locally?
How can we really get benefit in performance improvement with