
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] how to get mpirun to scale from 16 to 64 cores
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-06-17 00:03:40


No, that isn't correct. It should be:

> mpirun -np 32 --bycore --bind-to-core ~ysun/Codes/NASA/fun3d-12.3-66687/Mpi/FUN3D_90/nodet_mpi
> --time_timestep_loop --animation_freq -1

Again, there is no guarantee this will improve performance; the options that affect performance for a given application are highly application-specific.
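One way to check where the ranks actually land is Open MPI's --report-bindings option (available in the 1.6 series). A command sketch, reusing the invocation above — verify the option against `mpirun --help` on your installation first:

```shell
# Sketch only: --report-bindings makes each rank report the socket/core it
# was bound to (on stderr) before the application starts. The binary path
# and FUN3D flags are taken from this thread.
mpirun -np 32 --bycore --bind-to-core --report-bindings \
    ~ysun/Codes/NASA/fun3d-12.3-66687/Mpi/FUN3D_90/nodet_mpi \
    --time_timestep_loop --animation_freq -1
```

If the reported bindings show neighboring ranks on the same socket, the mapping is doing what Ralph describes below.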

On Jun 16, 2014, at 8:23 PM, Yuping Sun <yupingpaulasun_at_[hidden]> wrote:

> Hi Ralph:
>
> Is the following the correct command in your view:
>
> mpirun -np 32 --bysocket --bycore ~ysun/Codes/NASA/fun3d-12.3-66687/Mpi/FUN3D_90/nodet_mpi
> --time_timestep_loop --animation_freq -1
>
> I ran the above command, and performance still does not improve. Would you give me a detailed command with the right options?
> Thank you.
>
> Best regards,
>
> Yuping
>
>
> --------------------------------------------
> On Tue, 6/17/14, Ralph Castain <rhc_at_[hidden]> wrote:
>
> Subject: Re: [OMPI users] how to get mpirun to scale from 16 to 64 cores
> To: "Yuping Sun" <yupingpaulasun_at_[hidden]>, "Open MPI Users" <users_at_[hidden]>
> Date: Tuesday, June 17, 2014, 1:59 AM
>
> Well, for one, there is never any guarantee of linear scaling with the number of procs - that is very application dependent. You can actually see performance decrease with the number of procs if the application doesn't know how to exploit them.
>
> One thing that stands out is your mapping and binding options. Mapping bysocket means that you are putting neighboring ranks (i.e., ranks that differ by 1) on different sockets, which usually means different NUMA regions. This makes shared memory between those procs run poorly. If the application does a lot of messaging between ranks that differ by 1, then you would see poor scaling.
>
> So one thing you could do is change --bysocket to --bycore. Then, if your application isn't threaded, you could add --bind-to-core for better performance.
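The placement difference described above can be sketched numerically. This is not Open MPI code; it assumes a hypothetical machine with 2 sockets of 8 cores each and simply models which socket each rank lands on under the two mapping policies:

```python
# Illustrative sketch (not Open MPI source): rank-to-socket placement
# under --bysocket vs --bycore, for an assumed 2-socket x 8-core node.
CORES_PER_SOCKET = 8
NUM_SOCKETS = 2

def socket_of_rank_bysocket(rank):
    # --bysocket deals ranks round-robin across sockets, so neighboring
    # ranks (r and r+1) land on different sockets / NUMA regions.
    return rank % NUM_SOCKETS

def socket_of_rank_bycore(rank):
    # --bycore fills cores sequentially, so neighboring ranks share a
    # socket (and its shared-memory NUMA region) until it is full.
    return (rank % (NUM_SOCKETS * CORES_PER_SOCKET)) // CORES_PER_SOCKET

# Neighboring ranks 0 and 1:
print(socket_of_rank_bysocket(0), socket_of_rank_bysocket(1))  # → 0 1
print(socket_of_rank_bycore(0), socket_of_rank_bycore(1))      # → 0 0
```

Under --bysocket, ranks 0 and 1 end up on different sockets, while under --bycore they share one; if the application's heaviest traffic is between ranks that differ by 1, the --bycore layout keeps that traffic inside a single NUMA region.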
>
> On Jun 16, 2014, at 3:19 PM, Yuping Sun <yupingpaulasun_at_[hidden]>
> wrote:
> Dear All:
>
> I bought a 64-core workstation and installed NASA FUN3D with Open MPI 1.6.5. Then I started test runs of FUN3D using 16, 32, and 48 cores, but the performance is poor. I got the data below.
>
> The run command is (32 cores as an example):
>
> mpiexec -np 32 --bysocket --bind-to-socket ~ysun/Codes/NASA/fun3d-12.3-66687/Mpi/FUN3D_90/nodet_mpi --time_timestep_loop --animation_freq -1 > screen.dump_bs30
>
> CPUs   time   iterations   time/it
> 60     678s   30it         22.61s
> 48     702s   30it         23.40s
> 32     734s   30it         24.50s
> 16     894s   30it         29.80s
>
> You can see that with 60 cores, FUN3D completes 30 iterations in 678 seconds, roughly 22.61 seconds per iteration. With 16 cores, it completes 30 iterations in 894 seconds, roughly 29.8 seconds per iteration.
>
> The data above show that the FUN3D run under mpirun does not scale at all. I used to run FUN3D with mpirun on an 8-core workstation, and it scaled well. The same job also scales well on a Linux cluster.
>
> Would you all give me some advice on reducing the performance loss as I use more cores, or on how to run mpirun with the proper options to get closer to linear scaling when going from 16 to 32 to 48 cores?
>
> Thank you.
>
> Yuping
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> Subscription:
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/06/24654.php
>