I am not an HPL expert, but this might help.
1. The rankfile mapper is available only from Open MPI 1.3 onward; if you are
using Open MPI 1.2.8, try -mca mpi_paffinity_alone 1 instead.
2. If you are using Open MPI 1.3, you don't need to pass mpi_leave_pinned 1,
since it is the default value.
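For point 1, the invocation might look roughly like this (a sketch only, not tested here; "hf" is the hostfile from your command line, and "./xhpl" stands in for whatever your HPL binary is called):

```shell
# Open MPI 1.2.x has no rankfile mapper, so ask the runtime to pin
# each rank to its own core via the paffinity MCA parameter instead.
mpirun -np 8 --machinefile hf \
       --mca mpi_paffinity_alone 1 \
       --mca btl self,sm \
       ./xhpl
```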
On Thu, Jul 2, 2009 at 4:47 PM, Swamy Kandadai <swamy_at_[hidden]> wrote:
> I am running on a 2.66 GHz Nehalem node. On this node, the turbo mode and
> hyperthreading are enabled.
> When I run LINPACK with Intel MPI, I get 82.68 GFlops without much trouble.
> When I ran with Open MPI (I have Open MPI 1.2.8, but my colleague was using
> 1.3.2), I used the same MKL libraries as with Intel MPI. But with Open MPI,
> the best I have gotten so far is 80.22 GFlops, and I could never get close
> to what I am getting with Intel MPI.
> Here are my options with Open MPI:
> mpirun -n 8 --machinefile hf --mca rmaps_rank_file_path rankfile --mca
> coll_sm_info_num_procs 8 --mca btl self,sm -mca mpi_leave_pinned 1
> Here is my rankfile:
> cat rankfile
> rank 0=i02n05 slot=0
> rank 1=i02n05 slot=1
> rank 2=i02n05 slot=2
> rank 3=i02n05 slot=3
> rank 4=i02n05 slot=4
> rank 5=i02n05 slot=5
> rank 6=i02n05 slot=6
> rank 7=i02n05 slot=7
> In this case the physical cores are 0-7 while the additional logical
> processors with hyperthreading are 8-15.
> With the "top" command, I could see all 8 tasks running on 8 different
> physical cores; I never saw 2 MPI tasks running on the same physical core.
> Also, the program is not paging, as the problem size fits in memory.
> Do you have any ideas on how I can improve the performance so that it
> matches Intel MPI?
> Any suggestions will be greatly appreciated.
> Swamy Kandadai
> Dr. Swamy N. Kandadai
> IBM Senior Certified Executive IT Specialist
> STG WW Modular Systems Benchmark Center
> STG WW HPC and BI CoC Benchmark Center
> Phone: (845) 433-8429 (8-293) Fax: (845) 432-9789
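Beyond "top", one way to double-check the pinning is to read the kernel's affinity mask for each rank (a Linux-only sketch; it assumes the HPL processes are named "xhpl" and is not specific to either MPI):

```shell
# Show the CPU affinity mask of each running xhpl rank; with proper
# pinning, every PID should report a single, distinct core.
for pid in $(pgrep xhpl); do
    grep Cpus_allowed_list /proc/$pid/status
done
# The same check works for any PID, e.g. the current shell:
grep Cpus_allowed_list /proc/self/status
```

If two ranks report the same core, or a rank reports a range like 0-15, the binding is not what you intended.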