Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Performance difference on OpenMPI, IntelMPI and ScaliMPI
From: Torgny Faxen (faxen_at_[hidden])
Date: 2009-08-05 11:32:00


Ralph,
I can't get "opal_paffinity_alone" to work (see below). There is, however, an
"mpi_paffinity_alone" parameter, which I tried without any improvement.

Setting
 -mca btl_openib_eager_limit 65536
did help, however: it gave a 15% improvement, so OpenMPI is now down to 326
seconds (from the previous 376). That is still a lot more than ScaliMPI's 214
seconds.
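
For concreteness, the full command I plan to try next would look roughly like
the one below (the mpirun path and binary are the same ones shown in the
quoted output further down in this thread; combining mpi_paffinity_alone with
the larger eager limit is only my assumption at this point, I have not timed
that exact combination):

  /software/mpi/openmpi/1.3b2/i101017/bin/mpirun -np 144 -npernode 8 \
      -mca mpi_paffinity_alone 1 \
      -mca btl_openib_eager_limit 65536 \
      /nobackup/rossby11/faxen/RCO_scobi/src_161.openmpi/rco2.24pe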

Looking at the profile data, my gut feeling is that performance suffers due to
the frequent calls to MPI_IPROBE. I will look at this and count the number of
calls, but there could easily be 10 times more calls to MPI_IPROBE than to
MPI_BSEND.
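
Rather than instrumenting the Fortran source, I will probably count the calls
with a small PMPI wrapper preloaded into the run. Below is a minimal sketch;
the file name and the LD_PRELOAD approach are only illustrative, and it
assumes (as the oprofile output suggests, with mpi_iprobe_f being a thin layer
in libmpi_f77 on top of the C library) that the Fortran bindings end up in the
C entry points, so intercepting the C symbols is enough:

  /* iprobe_count.c -- hypothetical helper, not part of the application.
   * Counts MPI_Iprobe and MPI_Bsend calls per rank via the PMPI profiling
   * interface and prints the totals at MPI_Finalize.
   * Build: mpicc -shared -fPIC -o libiprobe_count.so iprobe_count.c
   * Run:   LD_PRELOAD=./libiprobe_count.so mpirun ... rco2.24pe
   */
  #include <mpi.h>
  #include <stdio.h>

  static long n_iprobe = 0;
  static long n_bsend  = 0;

  int MPI_Iprobe(int source, int tag, MPI_Comm comm, int *flag, MPI_Status *status)
  {
      n_iprobe++;                 /* count, then forward to the real routine */
      return PMPI_Iprobe(source, tag, comm, flag, status);
  }

  int MPI_Bsend(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
  {
      n_bsend++;
      return PMPI_Bsend(buf, count, datatype, dest, tag, comm);
  }

  int MPI_Finalize(void)
  {
      int rank;
      PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
      fprintf(stderr, "rank %d: MPI_Iprobe=%ld  MPI_Bsend=%ld\n",
              rank, n_iprobe, n_bsend);
      return PMPI_Finalize();
  }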

/Torgny

n70 462% ompi_info --param all all | grep opal
                MCA opal: parameter "opal_signal" (current value: "6,7,8,11", data source: default value)
                MCA opal: parameter "opal_set_max_sys_limits" (current value: "0", data source: default value)
                MCA opal: parameter "opal_event_include" (current value: "poll", data source: default value)
n70 463% ompi_info --param all all | grep paffinity
                 MCA mpi: parameter "mpi_paffinity_alone" (current value: "0", data source: default value)
           MCA paffinity: parameter "paffinity_base_verbose" (current value: "0", data source: default value)
                          Verbosity level of the paffinity framework
           MCA paffinity: parameter "paffinity" (current value: <none>, data source: default value)
                          Default selection set of components for the paffinity framework (<none> means use all components that can be found)
           MCA paffinity: parameter "paffinity_linux_priority" (current value: "10", data source: default value)
                          Priority of the linux paffinity component
           MCA paffinity: information "paffinity_linux_plpa_version" (value: "1.2rc2", data source: default value)

Ralph Castain wrote:
> Okay, one problem is fairly clear. As Terry indicated, you have to
> tell us to bind or else you lose a lot of performance. Set -mca
> opal_paffinity_alone 1 on your cmd line and it should make a
> significant difference.
>
>
> On Wed, Aug 5, 2009 at 8:10 AM, Torgny Faxen <faxen_at_[hidden]> wrote:
>
> Ralph,
> I am running through a locally provided wrapper but it translates to:
> /software/mpi/openmpi/1.3b2/i101017/bin/mpirun -np 144 -npernode 8
> -mca mpi_show_mca_params env,file /nobackup/rossby11/faxen/RCO_scobi/src_161.openmpi/rco2.24pe
>
> a) Upgrade: this will take some time, since it has to go through
> the administrator and this is a production cluster.
> b) -mca .. see output below
> c) I used exactly the same optimization flags for all three
> versions (ScaliMPI, OpenMPI and IntelMPI) and this is Fortran so I
> am using mpif90 :-)
>
> Regards / Torgny
>
> [n70:30299] ess=env (environment)
> [n70:30299] orte_ess_jobid=482607105 (environment)
> [n70:30299] orte_ess_vpid=0 (environment)
> [n70:30299] mpi_yield_when_idle=0 (environment)
> [n70:30299] mpi_show_mca_params=env,file (environment)
>
>
> Ralph Castain wrote:
>
> Could you send us the mpirun cmd line? I wonder if you are
> missing some options that could help. Also, you might:
>
> (a) upgrade to 1.3.3 - it looks like you are using some kind
> of pre-release version
>
> (b) add -mca mpi_show_mca_params env,file - this will cause
> rank=0 to output what mca params it sees, and where they came from
>
> (c) check that you built a non-debug version, and remembered
> to compile your application with a -O3 flag - i.e., "mpicc -O3
> ...". Remember, OMPI does not automatically add optimization
> flags to mpicc!
>
> Thanks
> Ralph
>
>
> On Wed, Aug 5, 2009 at 7:15 AM, Torgny Faxen <faxen_at_[hidden]> wrote:
>
> Pasha,
> no collectives are being used.
>
> A simple grep in the code reveals the following MPI functions
> being used:
> MPI_Init
> MPI_wtime
> MPI_COMM_RANK
> MPI_COMM_SIZE
> MPI_BUFFER_ATTACH
> MPI_BSEND
> MPI_PACK
> MPI_UNPACK
> MPI_PROBE
> MPI_GET_COUNT
> MPI_RECV
> MPI_IPROBE
> MPI_FINALIZE
>
> where MPI_IPROBE is the clear winner in terms of number of
> calls.
>
> /Torgny
>
>
> Pavel Shamis (Pasha) wrote:
>
> Do you know if the application use some collective
> operations ?
>
> Thanks
>
> Pasha
>
> Torgny Faxen wrote:
>
> Hello,
> we are seeing a large difference in performance for some
> applications depending on what MPI is being used.
>
> Attached are performance numbers and oprofile output (first 30
> lines) from one out of 14 nodes, from one application run using
> OpenMPI, IntelMPI and Scali MPI respectively.
>
> Scali MPI is faster than the other two MPIs by a factor of
> 1.6 and 1.75:
>
> ScaliMPI: walltime for the whole application is 214 seconds
> OpenMPI: walltime for the whole application is 376 seconds
> Intel MPI: walltime for the whole application is 346 seconds.
>
> The application is running with the main send/receive
> commands being:
> MPI_Bsend
> MPI_Iprobe followed by MPI_Recv (in case there is a
> message). Quite often MPI_Iprobe is called just to
> check whether a certain message is pending.
>
> Any idea on tuning tips, performance analysis, or code
> modifications to improve the OpenMPI performance? A lot of
> time is being spent in "mca_btl_sm_component_progress",
> "btl_openib_component_progress" and other internal routines.
>
> The code is running on a cluster with 140 HP ProLiant
> DL160 G5 compute servers. Infiniband interconnect. Intel
> Xeon E5462 processors. The profiled application is using
> 144 cores on 18 nodes over Infiniband.
>
> Regards / Torgny
>
> =====================================================================================================================0
>
> OpenMPI 1.3b2
>
> =====================================================================================================================0
>
>
> Walltime: 376 seconds
>
> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> Profiling through timer interrupt
> samples   %        image name             app name     symbol name
> 668288    22.2113  mca_btl_sm.so          rco2.24pe    mca_btl_sm_component_progress
> 441828    14.6846  rco2.24pe              rco2.24pe    step_
> 335929    11.1650  libmlx4-rdmav2.so      rco2.24pe    (no symbols)
> 301446    10.0189  mca_btl_openib.so      rco2.24pe    btl_openib_component_progress
> 161033     5.3521  libopen-pal.so.0.0.0   rco2.24pe    opal_progress
> 157024     5.2189  libpthread-2.5.so      rco2.24pe    pthread_spin_lock
> 99526      3.3079  no-vmlinux             no-vmlinux   (no symbols)
> 93887      3.1204  mca_btl_sm.so          rco2.24pe    opal_using_threads
> 69979      2.3258  mca_pml_ob1.so         rco2.24pe    mca_pml_ob1_iprobe
> 58895      1.9574  mca_bml_r2.so          rco2.24pe    mca_bml_r2_progress
> 55095      1.8311  mca_pml_ob1.so         rco2.24pe    mca_pml_ob1_recv_request_match_wild
> 49286      1.6381  rco2.24pe              rco2.24pe    tracer_
> 41946      1.3941  libintlc.so.5          rco2.24pe    __intel_new_memcpy
> 40730      1.3537  rco2.24pe              rco2.24pe    scobi_
> 36586      1.2160  rco2.24pe              rco2.24pe    state_
> 20986      0.6975  rco2.24pe              rco2.24pe    diag_
> 19321      0.6422  libmpi.so.0.0.0        rco2.24pe    PMPI_Unpack
> 18552      0.6166  libmpi.so.0.0.0        rco2.24pe    PMPI_Iprobe
> 17323      0.5757  rco2.24pe              rco2.24pe    clinic_
> 16194      0.5382  rco2.24pe              rco2.24pe    k_epsi_
> 15330      0.5095  libmpi.so.0.0.0        rco2.24pe    PMPI_Comm_f2c
> 13778      0.4579  libmpi_f77.so.0.0.0    rco2.24pe    mpi_iprobe_f
> 13241      0.4401  rco2.24pe              rco2.24pe    s_recv_
> 12386      0.4117  rco2.24pe              rco2.24pe    growth_
> 11699      0.3888  rco2.24pe              rco2.24pe    testnrecv_
> 11268      0.3745  libmpi.so.0.0.0        rco2.24pe    mca_pml_base_recv_request_construct
> 10971      0.3646  libmpi.so.0.0.0        rco2.24pe    ompi_convertor_unpack
> 10034      0.3335  mca_pml_ob1.so         rco2.24pe    mca_pml_ob1_recv_request_match_specific
> 10003      0.3325  libimf.so              rco2.24pe    exp.L
> 9375       0.3116  rco2.24pe              rco2.24pe    subbasin_
> 8912       0.2962  libmpi_f77.so.0.0.0    rco2.24pe    mpi_unpack_f
>
>
>
>
> =====================================================================================================================0
>
> Intel MPI, version 3.2.0.011
>
>
> =====================================================================================================================0
>
>
> Walltime: 346 seconds
>
> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> Profiling through timer interrupt
> samples   %        image name             app name     symbol name
> 486712    17.7537  rco2                   rco2         step_
> 431941    15.7558  no-vmlinux             no-vmlinux   (no symbols)
> 212425     7.7486  libmpi.so.3.2          rco2         MPIDI_CH3U_Recvq_FU
> 188975     6.8932  libmpi.so.3.2          rco2         MPIDI_CH3I_RDSSM_Progress
> 172855     6.3052  libmpi.so.3.2          rco2         MPIDI_CH3I_read_progress
> 121472     4.4309  libmpi.so.3.2          rco2         MPIDI_CH3I_SHM_read_progress
> 64492      2.3525  libc-2.5.so            rco2         sched_yield
> 52372      1.9104  rco2                   rco2         tracer_
> 48621      1.7735  libmpi.so.3.2          rco2         .plt
> 45475      1.6588  libmpiif.so.3.2        rco2         pmpi_iprobe__
> 44082      1.6080  libmpi.so.3.2          rco2         MPID_Iprobe
> 42788      1.5608  libmpi.so.3.2          rco2         MPIDI_CH3_Stop_recv
> 42754      1.5595  libpthread-2.5.so      rco2         pthread_mutex_lock
> 42190      1.5390  libmpi.so.3.2          rco2         PMPI_Iprobe
> 41577      1.5166  rco2                   rco2         scobi_
> 40356      1.4721  libmpi.so.3.2          rco2         MPIDI_CH3_Start_recv
> 38582      1.4073  libdaplcma.so.1.0.2    rco2         (no symbols)
> 37545      1.3695  rco2                   rco2         state_
> 35597      1.2985  libc-2.5.so            rco2         free
> 34019      1.2409  libc-2.5.so            rco2         malloc
> 31841      1.1615  rco2                   rco2         s_recv_
> 30955      1.1291  libmpi.so.3.2          rco2         __I_MPI___intel_new_memcpy
> 27876      1.0168  libc-2.5.so            rco2         _int_malloc
> 26963      0.9835  rco2                   rco2         testnrecv_
> 23005      0.8391  libpthread-2.5.so      rco2         __pthread_mutex_unlock_usercnt
> 22290      0.8131  libmpi.so.3.2          rco2         MPID_Segment_manipulate
> 22086      0.8056  libmpi.so.3.2          rco2         MPIDI_CH3I_read_progress_expected
> 19146      0.6984  rco2                   rco2         diag_
> 18250      0.6657  rco2                   rco2         clinic_
>
> =====================================================================================================================0
>
> Scali MPI, version 3.13.10-59413
>
> =====================================================================================================================0
>
>
> Walltime: 214 seconds
>
> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> Profiling through timer interrupt
> samples   %        image name             app name     symbol name
> 484267    30.0664  rco2.24pe              rco2.24pe    step_
> 111949     6.9505  libmlx4-rdmav2.so      rco2.24pe    (no symbols)
> 73930      4.5900  libmpi.so              rco2.24pe    scafun_rq_handle_body
> 57846      3.5914  libmpi.so              rco2.24pe    invert_decode_header
> 55836      3.4667  libpthread-2.5.so      rco2.24pe    pthread_spin_lock
> 53703      3.3342  rco2.24pe              rco2.24pe    tracer_
> 40934      2.5414  rco2.24pe              rco2.24pe    scobi_
> 40244      2.4986  libmpi.so              rco2.24pe    scafun_request_probe_handler
> 37399      2.3220  rco2.24pe              rco2.24pe    state_
> 30455      1.8908  libmpi.so              rco2.24pe    invert_matchandprobe
> 29707      1.8444  no-vmlinux             no-vmlinux   (no symbols)
> 29147      1.8096  libmpi.so              rco2.24pe    FMPI_scafun_Iprobe
> 27969      1.7365  libmpi.so              rco2.24pe    decode_8_u_64
> 27475      1.7058  libmpi.so              rco2.24pe    scafun_rq_anysrc_fair_one
> 25966      1.6121  libmpi.so              rco2.24pe    scafun_uxq_probe
> 24380      1.5137  libc-2.5.so            rco2.24pe    memcpy
> 22615      1.4041  libmpi.so              rco2.24pe    .plt
> 21172      1.3145  rco2.24pe              rco2.24pe    diag_
> 20716      1.2862  libc-2.5.so            rco2.24pe    memset
> 18565      1.1526  libmpi.so              rco2.24pe    openib_wrapper_poll_cq
> 18192      1.1295  rco2.24pe              rco2.24pe    clinic_
> 17135      1.0638  libmpi.so              rco2.24pe    PMPI_Iprobe
> 16685      1.0359  rco2.24pe              rco2.24pe    k_epsi_
> 16236      1.0080  libmpi.so              rco2.24pe    PMPI_Unpack
> 15563      0.9662  libmpi.so              rco2.24pe    scafun_r_rq_append
> 14829      0.9207  libmpi.so              rco2.24pe    scafun_rq_test_finished
> 13349      0.8288  rco2.24pe              rco2.24pe    s_recv_
> 12490      0.7755  libmpi.so              rco2.24pe    flop_matchandprobe
> 12427      0.7715  libibverbs.so.1.0.0    rco2.24pe    (no symbols)
> 12272      0.7619  libmpi.so              rco2.24pe    scafun_rq_handle
> 12146      0.7541  rco2.24pe              rco2.24pe    growth_
> 10175      0.6317  libmpi.so              rco2.24pe    wrp2p_test_finished
> 9888       0.6139  libimf.so              rco2.24pe    exp.L
> 9179       0.5699  rco2.24pe              rco2.24pe    subbasin_
> 9082       0.5639  rco2.24pe              rco2.24pe    testnrecv_
> 8901       0.5526  libmpi.so              rco2.24pe    openib_wrapper_purge_requests
> 7425       0.4610  rco2.24pe              rco2.24pe    scobimain_
> 7378       0.4581  rco2.24pe              rco2.24pe    scobi_interface_
> 6530       0.4054  rco2.24pe              rco2.24pe    setvbc_
> 6471       0.4018  libfmpi.so             rco2.24pe    pmpi_iprobe
> 6341       0.3937  rco2.24pe              rco2.24pe    snap_
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
---------------------------------------------------------
   Torgny Faxén		
   National Supercomputer Center
   Linköping University	
   S-581 83 Linköping
   Sweden	
   Email: faxen_at_[hidden]
   Telephone: +46 13 285798 (office) +46 13 282535  (fax)
   http://www.nsc.liu.se
---------------------------------------------------------