Is the following the correct command?
mpirun -np 32 --bysocket --bycore ~ysun/Codes/NASA/fun3d-12.3-66687/Mpi/FUN3D_90/nodet_mpi
--time_timestep_loop --animation_freq -1
I ran the above command, but the performance still did not improve. Would you give me a detailed command with options?
On Tue, 6/17/14, Ralph Castain <rhc_at_[hidden]> wrote:
Subject: Re: [OMPI users] how to get mpirun to scale from 16 to 64 cores
To: "Yuping Sun" <yupingpaulasun_at_[hidden]>, "Open MPI Users" <users_at_[hidden]>
Date: Tuesday, June 17, 2014, 1:59 AM
Well, for one, there is never any guarantee of linear scaling with the number of procs - that is very application dependent. You can actually see performance decrease with the number of procs if the application doesn't know how to exploit them.
One thing that stands out is your mapping and binding options. Mapping bysocket means that you are putting neighboring ranks (i.e., ranks that differ by 1) on different sockets, which usually means different NUMA regions. This makes shared memory between those procs run poorly. If the application does a lot of messaging between ranks that differ by 1, then you would see poor scaling.
So one thing you could do is change --bysocket to --bycore. Then, if your application isn't threaded, you could --bind-to-core for better performance.
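To make the mapping argument concrete, here is a minimal Python sketch that models where ranks land under the two mapping policies and counts how many neighboring rank pairs end up on different sockets. The topology (4 sockets of 16 cores each) is an assumption chosen for illustration and is not stated in the thread. The placement functions are a simplified model, not Open MPI's actual mapper.

```python
# Hypothetical topology: 4 sockets x 16 cores (an assumption, not from the thread).
SOCKETS = 4
CORES_PER_SOCKET = 16


def socket_by_socket(rank: int) -> int:
    """Round-robin across sockets: rank 0 -> socket 0, rank 1 -> socket 1, ..."""
    return rank % SOCKETS


def socket_by_core(rank: int) -> int:
    """Fill all cores of one socket before moving on to the next socket."""
    return (rank // CORES_PER_SOCKET) % SOCKETS


def cross_socket_neighbors(placement, nprocs: int) -> int:
    """Count pairs (r, r+1) whose two ranks sit on different sockets."""
    return sum(placement(r) != placement(r + 1) for r in range(nprocs - 1))


nprocs = 32
by_socket_pairs = cross_socket_neighbors(socket_by_socket, nprocs)
by_core_pairs = cross_socket_neighbors(socket_by_core, nprocs)

print(f"by socket: {by_socket_pairs} of {nprocs - 1} neighbor pairs cross sockets")
print(f"by core:   {by_core_pairs} of {nprocs - 1} neighbor pairs cross sockets")
```

Under this model every neighboring pair crosses a socket boundary when mapping by socket, while mapping by core leaves only the pairs that straddle the end of a socket. Whether that matters in practice depends on how much the application communicates between ranks that differ by 1.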
On Jun 16, 2014, at 3:19 PM, Yuping Sun <yupingpaulasun_at_[hidden]> wrote:
I bought a 64 core workstation and installed NASA fun3d with open mpi 1.6.5. Then I started to test run fun3d using 16, 32, 48 cores. However, the performance of the fun3d run is bad. I got the data below:
The run command is (for 32 cores, as an example):
-np 32 --bysocket --bind-to-socket
--time_timestep_loop --animation_freq -1 >
You can see that using 60 cores, FUN3D completes 30 iterations in 678 seconds, roughly 22.6 seconds per iteration.
Using 16 cores, FUN3D completes 30 iterations in 894 seconds, roughly 29.8 seconds per iteration.
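The two timings quoted above can be turned into speedup and parallel efficiency figures. The following Python sketch does that arithmetic, taking the 16 core run as the baseline. The variable names are illustrative, and only the two data points given in the message are used.

```python
# Timings quoted in the message: 30 iterations per run.
iterations = 30
runs = {16: 894.0, 60: 678.0}  # cores -> wall-clock seconds

base_cores = 16
base_time = runs[base_cores]

results = {}
for cores, seconds in sorted(runs.items()):
    per_iter = seconds / iterations   # seconds per iteration
    speedup = base_time / seconds     # measured speedup over the 16 core run
    ideal = cores / base_cores        # speedup under perfect linear scaling
    efficiency = speedup / ideal      # fraction of ideal speedup achieved
    results[cores] = {
        "per_iter": per_iter,
        "speedup": speedup,
        "ideal": ideal,
        "efficiency": efficiency,
    }
    print(
        f"{cores:2d} cores: {per_iter:.1f} s/iter, "
        f"speedup {speedup:.2f}x (ideal {ideal:.2f}x), "
        f"efficiency {efficiency:.0%}"
    )
```

On these numbers, going from 16 to 60 cores gives a speedup of about 1.3x against an ideal of 3.75x, which is the scaling shortfall the message is describing.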
The data above shows that FUN3D run using mpirun does not scale at all! I used to run fun3d with mpirun on an 8 core WS, and it
same job to run on a linux cluster scales well.
Can you all give me some advice on how to reduce the performance loss as I increase the number of cores, or on how to run mpirun with the proper options to get linear scaling when going from 16 to 32 to 48 cores?