
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] OpenMPI much slower than Mpich2
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-09-17 10:23:05

Sorry for the delay in replying; my INBOX has become a disaster
recently. More below.

On Sep 14, 2009, at 5:08 AM, Sam Verboven wrote:

> Dear All,
> I'm having the following problem. If I execute the exact same
> application with both Open MPI and MPICH2, the former takes more than
> twice as long. When we compared the Ganglia output, we could see that
> Open MPI generates more than 60 percent System CPU, whereas MPICH2 only
> has about 5 percent, the remaining cycles all going to User CPU. This
> roughly explains the slowdown: when using Open MPI, we lose more than
> half the cycles to operating system overhead. We would very much like
> to know why our Open MPI installation incurs such a dramatic overhead.
> The only reason I can think of is that we use bridged network
> interfaces on the cluster. Open MPI would not run properly until we
> set the MCA parameter to use the br0 interface instead of the
> physical eth0. MPICH2 does not require any extra parameters.

What did Open MPI do when you did not specify br0?

I assume that br0 bridges some other devices, like eth0 and eth1? If
so, what happens if you use "--mca btl_tcp_if_include eth0,eth1"
instead of br0?
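The same setting can also be kept out of the mpirun command line by putting it in a per-user MCA parameter file. A minimal sketch, assuming the bridge's members really are eth0 and eth1 (`brctl show br0` lists them) and that `$HOME/.openmpi/mca-params.conf` is readable on every node, for example through a shared home directory:

```
# $HOME/.openmpi/mca-params.conf
# Restrict the TCP BTL to the bridge's physical members
# (assumed here to be eth0 and eth1; adjust to the real devices)
btl_tcp_if_include = eth0,eth1
```

A value passed with --mca on the mpirun command line overrides this file, so drop "--mca btl_tcp_if_include br0" from the command when trying it.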

> The calculations themselves are done in Fortran. The operating
> system is Ubuntu 9.04, we have 14 dual quad-core nodes, and both
> Open MPI and MPICH2 are compiled from source without any configure
> options.
> Full Open MPI command:
> mpirun.openmpi --mca btl_tcp_if_include br0 \
>     --prefix /usr/shares/mpi/openmpi -hostfile hostfile -np 224 \
>     /home/arickx/bin/Linux/F_me_Kl1l2_3cl_mpi_2
> Full MPICH2 command:
> mpiexec.mpich2 -machinefile machinefile -np 113 \
>     /home/arickx/bin/Linux/F_me_Kl1l2_3cl_mpi_2

I notice that you're running nearly twice as many processes with Open
MPI (224) as with MPICH2 (113). Does increasing the number of processes
increase the problem size, or have some other effect on overall run-time?
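The two process counts can be set against the hardware described in the original message. A small POSIX shell sketch of that arithmetic, assuming "dual quad core" means two quad-core CPUs per node (8 cores per node):

```shell
#!/bin/sh
# Hardware from the post; the 2 x 4 cores-per-node reading is an assumption
nodes=14
sockets_per_node=2
cores_per_socket=4
cores=$((nodes * sockets_per_node * cores_per_socket))

# Process counts taken from the two command lines in the post
for np in 224 113; do
    extra=$((np - cores))
    echo "np=$np cores=$cores processes_beyond_cores=$extra"
done
```

Under that assumption the MPICH2 run has roughly one process per core, while the Open MPI run places two processes on every core, so comparing runs with the same process count would isolate the MPI library as the only variable.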

Jeff Squyres