Jeff:

I am running on a 2.66 GHz Nehalem node with turbo mode and hyperthreading enabled.
When I run LINPACK with Intel MPI, I get 82.68 GFlops without much trouble.

When I run with OpenMPI (I have OpenMPI 1.2.8, but my colleague was using 1.3.2), the best I have gotten so far is 80.22 GFlops, and I could never get close to what I am getting with Intel MPI. I am using the same MKL libraries with both OpenMPI and Intel MPI.
Here are my options with OpenMPI:

mpirun -n 8 --machinefile hf --mca rmaps_rank_file_path rankfile --mca coll_sm_info_num_procs 8 --mca btl self,sm -mca mpi_leave_pinned 1 ./xhpl_ompi
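As a possible alternative to the explicit rankfile (a sketch, not something from the original mail): Open MPI 1.2/1.3 can bind each rank to its own core itself via the mpi_paffinity_alone MCA parameter, which is worth comparing against the rankfile approach:

```shell
# Hypothetical alternative run: drop the rankfile and let Open MPI's
# built-in processor affinity bind one rank per core.
mpirun -n 8 --machinefile hf --mca btl self,sm \
       --mca mpi_paffinity_alone 1 --mca mpi_leave_pinned 1 ./xhpl_ompi
```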

Here is my rankfile:

cat rankfile
rank 0=i02n05 slot=0
rank 1=i02n05 slot=1
rank 2=i02n05 slot=2
rank 3=i02n05 slot=3
rank 4=i02n05 slot=4
rank 5=i02n05 slot=5
rank 6=i02n05 slot=6
rank 7=i02n05 slot=7
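One way to confirm that the slot numbers above really map to distinct physical cores (a sketch, assuming the standard Linux sysfs topology files):

```shell
# Print each logical CPU's physical core id; with hyperthreading,
# two logical CPUs sharing a core id are sibling hardware threads.
for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
  echo "$(basename "$cpu"): core $(cat "$cpu/topology/core_id" 2>/dev/null)"
done
```

On a node numbered as described here, cpu0-cpu7 should show eight different core ids, with cpu8-cpu15 repeating them.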

In this case the physical cores are 0-7, while the additional logical processors from hyperthreading are 8-15.
With the "top" command, I can see all 8 tasks running on 8 different physical cores; I did not see
2 MPI tasks running on the same physical core. The program is also not paging, since the problem size
fits in memory.
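Beyond watching "top", the kernel's affinity mask can be checked directly (a sketch, assuming Linux /proc; for an MPI rank you would substitute the rank's PID for "self"):

```shell
# Show the set of logical CPUs the process is allowed to run on.
# A correctly pinned rank should list exactly one CPU here.
grep Cpus_allowed_list /proc/self/status
```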

Do you have any ideas on how I can improve the performance so that it matches the Intel MPI performance?
Any suggestions would be greatly appreciated.

Thanks
Swamy Kandadai


Dr. Swamy N. Kandadai
IBM Senior Certified Executive IT Specialist
STG WW Modular Systems Benchmark Center
STG WW HPC and BI CoC Benchmark Center
Phone: (845) 433-8429 (8-293)  Fax: (845) 432-9789
swamy@us.ibm.com
http://w3.ibm.com/sales/systems/benchmarks