I am running on a 2.66 GHz Nehalem node. On this node, turbo mode and
hyperthreading are enabled.
When I run LINPACK with Intel MPI, I get 82.68 GFlops without much effort.
When I ran with OpenMPI (I have OpenMPI 1.2.8, while my colleague was using
1.3.2), I used the same MKL libraries as with Intel MPI. But with OpenMPI,
the best I have achieved so far is 80.22 GFlops, and I could never get
close to what I am getting with Intel MPI.
Here are my options with OpenMPI:
mpirun -n 8 --machinefile hf --mca rmaps_rank_file_path rankfile --mca
coll_sm_info_num_procs 8 --mca btl self,sm --mca mpi_leave_pinned 1
Here is my rankfile:
rank 0=i02n05 slot=0
rank 1=i02n05 slot=1
rank 2=i02n05 slot=2
rank 3=i02n05 slot=3
rank 4=i02n05 slot=4
rank 5=i02n05 slot=5
rank 6=i02n05 slot=6
rank 7=i02n05 slot=7
In this case the physical cores are 0-7 while the additional logical
processors with hyperthreading are 8-15.
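To double-check which processor IDs are hyperthread siblings of the same physical core, a quick query of the Linux sysfs topology files can confirm the 0-7 / 8-15 numbering (just a sketch; the sysfs paths are standard on Linux):

```shell
# For each logical CPU, print the set of logical CPUs that share its
# physical core (its hyperthread siblings). If HT pairs really are
# 0-7 / 8-15, cpu0 should list something like "0,8".
for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
    echo "$(basename "$cpu"): $(cat "$cpu"/topology/thread_siblings_list)"
done
```

This removes any guesswork about whether two ranks in the rankfile might actually land on sibling threads of one physical core.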
With the "top" command, I can see all 8 tasks running on 8 different
physical cores; I never saw two MPI tasks running on the same physical
core. Also, the program is not paging, as the problem size fits in memory.
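Besides top, the PSR column of ps shows the exact logical CPU each task was last scheduled on. Here I query the current shell as a stand-in for the HPL ranks (in practice, substitute the actual xhpl PIDs for $$):

```shell
# PSR = logical processor the process last ran on. On this node,
# 0-7 are the physical cores and 8-15 are their HT siblings.
# $$ (this shell) is only a stand-in; use the MPI task PIDs.
ps -o pid=,psr=,comm= -p $$
```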
Do you have any ideas how I can improve the performance so that it matches
with Intel MPI performance?
Any suggestions will be greatly appreciated.
Dr. Swamy N. Kandadai
IBM Senior Certified Executive IT Specialist
STG WW Modular Systems Benchmark Center
STG WW HPC and BI CoC Benchmark Center
Phone: (845) 433-8429 (8-293)  Fax: (845) 432-9789