Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] (no subject)
From: San B (forum.san_at_[hidden])
Date: 2013-10-15 08:02:43


Hi,

As you suggested, I profiled the application with mpiP. The MPI time
differs between the two runs as follows:

Run 1: 16 MPI processes on a single node

@--- MPI Time (seconds) ---------------------------------------------------
---------------------------------------------------------------------------
Task AppTime MPITime MPI%
   0 3.61e+03 661 18.32
   1 3.61e+03 627 17.37
   2 3.61e+03 700 19.39
   3 3.61e+03 665 18.41
   4 3.61e+03 702 19.45
   5 3.61e+03 703 19.48
   6 3.61e+03 740 20.50
   7 3.61e+03 763 21.14
...
...

Run 2: 16 MPI processes on two nodes, 8 MPI processes per node

@--- MPI Time (seconds) ---------------------------------------------------
---------------------------------------------------------------------------
Task AppTime MPITime MPI%
   0 1.27e+04 1.06e+04 84.14
   1 1.27e+04 1.07e+04 84.34
   2 1.27e+04 1.07e+04 84.20
   3 1.27e+04 1.07e+04 84.20
   4 1.27e+04 1.07e+04 84.22
   5 1.27e+04 1.07e+04 84.25
   6 1.27e+04 1.06e+04 84.02
   7 1.27e+04 1.07e+04 84.35
   8 1.27e+04 1.07e+04 84.29

MPI functions account for roughly 17-21% of the run time in run 1, whereas
they account for about 84% in run 2. For more details, I've attached both
output files. Please go through them and suggest what optimization we can
do with Open MPI or Intel MKL.
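
To make the comparison concrete, here is a minimal Python sketch that averages the rows quoted above and separates MPI time from non-MPI time. It is only an illustration: the names RUN1, RUN2 and summarize are made up for this example, only the tasks shown in this message are included (the elided ones are not guessed), and the inputs are the rounded values mpiP printed, so the percentages can differ slightly from the MPI% column.

```python
# Rows copied from the two mpiP "MPI Time" tables in this message:
# (task, AppTime in seconds, MPITime in seconds).
# Only the tasks actually shown are listed; elided tasks are not guessed.
RUN1 = [
    (0, 3.61e3, 661), (1, 3.61e3, 627), (2, 3.61e3, 700), (3, 3.61e3, 665),
    (4, 3.61e3, 702), (5, 3.61e3, 703), (6, 3.61e3, 740), (7, 3.61e3, 763),
]
RUN2 = [
    (0, 1.27e4, 1.06e4), (1, 1.27e4, 1.07e4), (2, 1.27e4, 1.07e4),
    (3, 1.27e4, 1.07e4), (4, 1.27e4, 1.07e4), (5, 1.27e4, 1.07e4),
    (6, 1.27e4, 1.06e4), (7, 1.27e4, 1.07e4), (8, 1.27e4, 1.07e4),
]


def summarize(rows):
    """Return per-task means (seconds) and the MPI share of AppTime (%)."""
    n = len(rows)
    app = sum(r[1] for r in rows) / n
    mpi = sum(r[2] for r in rows) / n
    return {
        "app": app,
        "mpi": mpi,
        "non_mpi": app - mpi,          # time spent outside MPI calls
        "mpi_pct": 100.0 * mpi / app,  # recomputed from rounded inputs
    }


if __name__ == "__main__":
    s1, s2 = summarize(RUN1), summarize(RUN2)
    for name, s in (("run 1", s1), ("run 2", s2)):
        print(f"{name}: app={s['app']:.0f}s mpi={s['mpi']:.0f}s "
              f"non-mpi={s['non_mpi']:.0f}s mpi%={s['mpi_pct']:.1f}")
    print(f"MPI time ratio (run 2 / run 1): {s2['mpi'] / s1['mpi']:.1f}x")
```

On the rows listed, the mean MPI share comes out at roughly 19% for run 1 and 84% for run 2. The non-MPI time stays of the same order in both runs (about 2900 s and 2000 s), while the MPI time grows by roughly 15x, which suggests that the slowdown sits in inter-node communication rather than in the computation itself. This reading is based only on the partial tables above, not on the full attached outputs.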

Thanks

On Mon, Oct 7, 2013 at 12:15 PM, San B <forum.san_at_[hidden]> wrote:

> Hi,
>
> I'm facing a performance issue with a scientific application (Fortran).
> It runs fast on a single node but very slowly across multiple nodes. For
> example, a 16-core job on a single node finishes in 1 hr 2 min, but the
> same job on two nodes (i.e. 8 cores per node, with the remaining 8 cores
> kept free) takes 3 hr 20 min. The code is compiled with ifort-13.1.1,
> openmpi-1.4.5 and Intel MKL libraries: lapack, blas, scalapack, blacs and
> fftw. What could be the problem here?
> Is it possible to do any tuning in Open MPI? FYI, more info: the cluster
> has Intel Sandy Bridge processors (E5-2670) and InfiniBand, and
> Hyper-Threading is enabled. Jobs are submitted through the LSF scheduler.
>
> Could Hyper-Threading be causing a problem here?
>
>
> Thanks
>