One thing to look at is the process distribution. Depending on the
application's communication pattern, the process distribution can have
a tremendous impact on the execution time. Imagine that the
application splits the processes into two equal groups based on rank,
and that processes only communicate within their own group. If such a
group ends up on a single node, it will use sm (shared memory) for
communication. Conversely, if its processes end up spread across the
nodes, they will use TCP (which has a higher latency and a lower
bandwidth than shared memory), and the overall performance will
suffer greatly.
By default, Open MPI uses the following strategy to distribute
processes: if a node has several processors, consecutive ranks are
started on the same node. In your case (2 nodes with 4 processors
each), ranks 0-3 will be started on the first host and ranks 4-7 on
the second one. I don't know what the default distribution is for
MPICH2 ...
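
To make the difference concrete, here is a small Python sketch that models the by-slot placement described above next to a round-robin by-node placement. It is only a toy model, not Open MPI's actual mapper: the function names are made up for illustration, and the layout (8 ranks, 2 nodes, 4 slots each) is the one from this thread.

```python
def map_by_slot(nranks, nnodes, slots_per_node):
    """Fill all slots of a node with consecutive ranks before moving on."""
    return [(rank // slots_per_node) % nnodes for rank in range(nranks)]


def map_by_node(nranks, nnodes):
    """Deal ranks out round-robin, one rank per node at a time."""
    return [rank % nnodes for rank in range(nranks)]


def groups_on_one_node(placement):
    """True if each half of the ranks (split by rank) sits on a single node."""
    half = len(placement) // 2
    halves = (placement[:half], placement[half:])
    return all(len(set(part)) == 1 for part in halves)


# placement[i] is the node index that rank i lands on.
by_slot = map_by_slot(8, 2, 4)
by_node = map_by_node(8, 2)

print("by slot:", by_slot)
print("by node:", by_node)
print("groups local (by slot):", groups_on_one_node(by_slot))
print("groups local (by node):", groups_on_one_node(by_node))
```

With the by-slot placement, each half of the ranks stays on one node and can communicate over shared memory; with the by-node placement, every group is split across both nodes.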
Anyway, there is an easy way to check whether the process distribution
is the root of your problem. Please run your application twice, once
passing the --bynode argument to mpirun and once passing --byslot,
and compare the two execution times.
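
As a rough sketch of that comparison, the Python snippet below builds the two mpirun command lines and provides a helper to time a run. The executable name `./my_app` is a placeholder (substitute your real command line, including any hostfile options you normally use), the helper names are made up for illustration, and 8 is the process count from this thread.

```python
import subprocess
import time


def mpirun_command(mapping_flag, nprocs, app_argv):
    """Build an mpirun command line using the given mapping flag."""
    return ["mpirun", mapping_flag, "-np", str(nprocs)] + list(app_argv)


def time_run(cmd):
    """Run a command to completion and return the wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


# Placeholder: replace with the real application and its arguments.
APP = ["./my_app"]

commands = {
    flag: mpirun_command(flag, 8, APP)
    for flag in ("--bynode", "--byslot")
}

for flag, cmd in commands.items():
    print(flag, "->", " ".join(cmd))
    # Uncomment on a machine where mpirun and the application are available:
    # print(flag, "took", time_run(cmd), "seconds")
```

If the two timings differ substantially, the process distribution is a likely cause of the slowdown.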
On Oct 8, 2008, at 9:10 AM, Sangamesh B wrote:
> Hi All,
> I wanted to switch from mpich2/mvapich2 to OpenMPI, as
> OpenMPI supports both ethernet and infiniband. Before doing that I
> tested an application 'GROMACS' to compare the performance of MPICH2
> & OpenMPI. Both have been compiled with GNU compilers.
> After this benchmark, I came to know that OpenMPI is slower than
> This benchmark is run on a AMD dual core, dual opteron processor.
> Both have compiled with default configurations.
> The job is run on 2 nodes - 8 cores.
> OpenMPI - 25 m 39 s.
> MPICH2 - 15 m 53 s.
> Any comments ..?