I'm facing a performance issue with a scientific application (Fortran). It
runs fast on a single node but very slowly across multiple nodes. For
example, a 16-core job on a single node finishes in 1hr 2mins, but the same
job on two nodes (i.e. 8 cores per node, with the remaining 8 cores kept
free) takes 3hr 20mins. The code is compiled with ifort-13.1.1,
openmpi-1.4.5 and Intel MKL libraries - lapack, blas, scalapack, blacs &
fftw. What could be the problem here?
Is it possible to do any tuning in OpenMPI? FYI, more info: the cluster has
Intel Sandy Bridge processors (E5-2670) and InfiniBand, and Hyper-Threading
is enabled. Jobs are submitted through the LSF scheduler.
Could Hyper-Threading be causing a problem here?
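
To put a number on the slowdown, here is a quick sketch of the arithmetic from the two timings quoted above. The variable names are only illustrative and are not part of the application.

```python
# Wall-clock times reported above, converted to minutes.
single_node_min = 1 * 60 + 2    # 16 cores on one node: 1hr 2mins
two_node_min = 3 * 60 + 20      # 8 cores on each of two nodes: 3hr 20mins

# Ratio of two-node run time to single-node run time.
slowdown = two_node_min / single_node_min
print(f"two-node run is {slowdown:.1f}x slower")
```

So the same 16-process job takes roughly three times as long once it is split across two nodes.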