Sorry for the delay in replying; my INBOX has become a disaster
recently. More below.
On Sep 14, 2009, at 5:08 AM, Sam Verboven wrote:
> Dear All,
> I'm having the following problem. If I execute the exact same
> application using both openmpi and mpich2, the former takes more than
> 2 times as long. When we compared the ganglia output we could see that
> openmpi generates more than 60 percent System CPU whereas mpich2 only
> has about 5 percent, the remaining cycles all going to User CPU. This roughly
> explains the slowdown: when using openmpi we lose more than half the
> cycles to operating system overhead. We would very much like to know
> why our openmpi implementation incurs such a dramatic overhead.
> The only reason I could think of myself is the fact that we use
> bridged network interfaces on the cluster. Openmpi would not run
> properly until we specified the mca command to use the br0 interface
> instead of the physical eth0. Mpich2 does not require any extra
What did Open MPI do when you did not specify the use of br0?
I assume that br0 is a combination of some other devices, like eth0
and eth1? If so, what happens if you use "--mca btl_tcp_if_include
eth0,eth1" instead of br0?
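For example, something like this should work, assuming the rest of
your command stays the same (adjust the hostfile, process count, and
application name to your setup -- "your_app" here is just a
placeholder):

  mpirun.openmpi --mca btl_tcp_if_include eth0,eth1 --prefix \
      /usr/shares/mpi/openmpi -hostfile hostfile -np 224 ./your_app

Also, running "brctl show" on one of the nodes should list which
physical interfaces are enslaved to br0, if you're not sure what it
bridges.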
> The calculations themselves are done using fortran. The operating
> system is ubuntu 9.04, we have 14 dual quad core nodes and both
> openmpi and mpich2 are compiled from source without any configure
> Full command OpenMPI:
> mpirun.openmpi --mca btl_tcp_if_include br0 --prefix
> /usr/shares/mpi/openmpi -hostfile hostfile -np 224
> Full command Mpich2:
> mpiexec.mpich2 -machinefile machinefile -np 113
I notice that you're running almost 2x the number of processes with
Open MPI as with MPICH2 (224 vs. 113) -- does increasing the number
of processes increase the problem size, or have some other effect on
overall run time?