Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] how to get mpirun to scale from 16 to 64 cores
From: Yuping Sun (yupingpaulasun_at_[hidden])
Date: 2014-06-16 23:23:08


Hi Ralph:

Is the following the correct command?

mpirun -np 32 --bysocket --bycore ~ysun/Codes/NASA/fun3d-12.3-66687/Mpi/FUN3D_90/nodet_mpi
 --time_timestep_loop --animation_freq -1

I ran the above command, but performance still did not improve. Would you give me a detailed command with options?
Thank you.

Best regards,

Yuping

--------------------------------------------
On Tue, 6/17/14, Ralph Castain <rhc_at_[hidden]> wrote:

 Subject: Re: [OMPI users] how to get mpirun to scale from 16 to 64 cores
 To: "Yuping Sun" <yupingpaulasun_at_[hidden]>, "Open MPI Users" <users_at_[hidden]>
 Date: Tuesday, June 17, 2014, 1:59 AM
 
 Well, for one, there is never any guarantee of linear scaling with the
 number of procs - that is very application dependent. You can actually
 see performance decrease as the number of procs grows if the application
 doesn't know how to exploit them.
 
 One thing that stands out is your mapping and binding options. Mapping
 bysocket means that you are putting neighboring ranks (i.e., ranks that
 differ by 1) on different sockets, which usually means different NUMA
 regions. This makes shared memory between those procs run poorly. If the
 application does a lot of messaging between ranks that differ by 1, then
 you would see poor scaling.
 
 So one thing you could do is change --bysocket to --bycore. Then, if your
 application isn't threaded, you could add --bind-to-core for better
 performance.
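
Read literally, the suggestion above replaces --bysocket with --bycore rather than adding --bycore next to it. A minimal sketch of the resulting command line, assuming the Open MPI 1.6 series flag spellings used in this thread; the executable path in APP is a placeholder, and --report-bindings is an extra option not mentioned in the thread, included only so the actual placement of ranks can be checked. The script prints the command for review instead of launching it.

```shell
#!/bin/sh
# Dry run: build the suggested mpirun command line and print it for review.
# Flag spellings are those of the Open MPI 1.6 series.
NP=32                                 # number of MPI ranks
APP="./nodet_mpi"                     # hypothetical path to the FUN3D executable
MAP_BIND="--bycore --bind-to-core"    # adjacent ranks on adjacent cores, one rank pinned per core
CMD="mpirun -np $NP $MAP_BIND --report-bindings $APP --time_timestep_loop --animation_freq -1"
echo "$CMD"
```

To launch for real, run the printed command directly. Binding each rank to a core only makes sense if the application is not threaded, as noted above.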
 
 On Jun 16, 2014, at 3:19 PM, Yuping Sun <yupingpaulasun_at_[hidden]> wrote:
 
 Dear All:
 
 I bought a 64-core workstation and installed NASA FUN3D with Open MPI
 1.6.5. Then I started test runs of FUN3D using 16, 32, and 48 cores.
 However, the performance of the FUN3D runs is bad. I got the data below.
 
 The run command is (using 32 cores as an example):
 
 mpiexec -np 32 --bysocket --bind-to-socket ~ysun/Codes/NASA/fun3d-12.3-66687/Mpi/FUN3D_90/nodet_mpi --time_timestep_loop --animation_freq -1 > screen.dump_bs30
 
 CPUs   time    iterations   time/it
 60     678s    30it         22.61s
 48     702s    30it         23.40s
 32     734s    30it         24.50s
 16     894s    30it         29.80s
 
 You can see that using 60 cores, FUN3D completes 30 iterations in 678
 seconds, roughly 22.61 seconds per iteration. Using 16 cores, FUN3D
 completes 30 iterations in 894 seconds, roughly 29.8 seconds per
 iteration.
 
 The data above show that FUN3D run with mpirun does not scale at all! I
 used to run FUN3D with mpirun on an 8-core workstation, and it scaled
 well. The same job run on a Linux cluster also scales well.
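
The timings in the table can be turned into speedup and parallel efficiency relative to the 16-core run, which makes the scaling loss easier to quantify. A minimal sketch, assuming a POSIX shell and awk are available; the function name scaling_report is made up for illustration, and the input values are the per-iteration times copied from the table.

```shell
#!/bin/sh
# Speedup and parallel efficiency relative to the first (smallest) run.
# Input lines: "<cores> <seconds per iteration>".
scaling_report() {
    awk 'NR == 1 { base_cores = $1; base_time = $2 }
         {
             speedup = base_time / $2      # measured gain over the baseline run
             ideal   = $1 / base_cores     # gain expected under linear scaling
             printf "%d cores: speedup %.2fx (ideal %.2fx), efficiency %.0f%%\n",
                    $1, speedup, ideal, 100 * speedup / ideal
         }'
}

# Per-iteration times from the table (cores, seconds per iteration).
scaling_report <<'EOF'
16 29.80
32 24.50
48 23.40
60 22.61
EOF
```

For these numbers the 60-core run comes out at about 1.32x against an ideal 3.75x, roughly 35% efficiency relative to 16 cores.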
 
 Would you all give me some advice on how to reduce the performance loss
 when I increase the number of cores, or how to run mpirun with the proper
 options to get linear scaling when going from 16 to 32 to 48 cores?
 
 Thank you.
 Yuping
 
 _______________________________________________
 users mailing list
 users_at_[hidden]
 Subscription:
 http://www.open-mpi.org/mailman/listinfo.cgi/users
 Link to this post:
 http://www.open-mpi.org/community/lists/users/2014/06/24654.php