
Open MPI User's Mailing List Archives


Subject: [OMPI users] Bad parallel scaling using Code Saturne with openmpi
From: Dugenoux Albert (dugenouxa_at_[hidden])
Date: 2012-07-10 10:31:03


Hi.

I have recently built a cluster on a Dell PowerEdge server running Debian 6.0. The server is composed of 4 system boards, each with 2 hex-core processors, which gives 12 cores per system board. The boards are linked through a local Gigabit switch.

In order to run the CFD solver Code Saturne in parallel, I have configured the cluster with a PBS server and MOM on one system board and a MOM on each of the 3 other boards. This gives 48 cores spread over 4 nodes of 12 cores each. Code Saturne is compiled against Open MPI 1.6.

When I launch a simulation on 2 nodes with 12 cores each, the elapsed time is good and the network is not saturated. But when I launch the same simulation on 3 nodes with 8 cores each, the elapsed time is 5 times longer. In both cases I use 24 cores, and the network does not seem to be saturated.

I have tested several configurations, with the binaries either on the local file system or on NFS, but the results are the same. I have visited several forums (in particular http://www.open-mpi.org/community/lists/users/2009/08/10394.php) and read lots of threads, but as I am not an expert in clusters, I do not currently see what is wrong!

Is it a problem in the PBS configuration (I installed it from the Debian packages), a subtle Open MPI compilation option, or a bad network configuration?

Regards.

B. S.
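The two 24-core runs described in the post differ only in how the ranks are spread over the nodes. The following is a minimal Python sketch of that difference, not a diagnosis. It assumes a PBS/Torque nodefile that lists each host once per requested slot (as for a `nodes=N:ppn=P` request) and by-slot rank mapping, where ranks fill one host before moving to the next. The host names `node1` to `node4` and the 1D neighbour-exchange pattern used to count cross-node pairs are hypothetical illustrations; Code Saturne's real communication pattern depends on its mesh partitioning.

```python
# Sketch: model how 24 MPI ranks land on nodes for the two PBS requests in the post.
# Assumptions (not from the original post): the nodefile lists each host once per
# requested slot (ppn), and ranks are mapped "by slot", i.e. filled host by host in
# nodefile order. Host names node1..node4 are hypothetical.
from collections import Counter


def nodefile(nodes: int, ppn: int) -> list[str]:
    """Return a nodefile-style host list: each hypothetical host repeated ppn times."""
    return [f"node{n + 1}" for n in range(nodes) for _ in range(ppn)]


def map_by_slot(hosts: list[str], nprocs: int) -> list[str]:
    """Assign rank i to the i-th slot of the host list."""
    if nprocs > len(hosts):
        raise ValueError("more ranks than allocated slots")
    return hosts[:nprocs]


def cross_node_neighbours(placement: list[str]) -> int:
    """Count rank pairs (i, i+1) that sit on different hosts.

    Uses a hypothetical 1D chain of neighbour exchanges, for illustration only.
    """
    return sum(a != b for a, b in zip(placement, placement[1:]))


for nodes, ppn in [(2, 12), (3, 8)]:
    placement = map_by_slot(nodefile(nodes, ppn), nodes * ppn)
    per_host = dict(Counter(placement))
    print(f"nodes={nodes}:ppn={ppn} ranks per host={per_host} "
          f"cross-node neighbour pairs={cross_node_neighbours(placement)}")
```

Comparing this expected layout with the mapping Open MPI actually uses (for example via `mpirun --display-map`) would show whether the 3-node job really places 8 ranks on each node as intended.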