Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Factor of 10 loss in performance with 1.3.x
From: Eugene Loh (Eugene.Loh_at_[hidden])
Date: 2009-04-06 17:04:16


Steve Kargl wrote:

>I recently upgraded OpenMPI from 1.2.9 to 1.3 and then 1.3.1.
>One of my colleagues reported a dramatic drop in performance
>with one of his applications. My investigation shows a factor
>of 10 drop in communication over the memory bus. I've placed
>a figure that illustrates the problem at
>
>http://troutmask.apl.washington.edu/~kargl/ompi_cmp.jpg
>
>The legend in the figure has 'ver. 1.2.9 11 <--> 18'. This
>means communication between node 11 and node 18 over GigE
>ethernet in my cluster. 'ver. 1.2.9 20 <--> 20' means
>communication between processes on node 20 where node 20 has
>8 processors. The image clearly shows
>
Not so clearly in my mind since I have trouble discriminating between
the colors and the overlapping lines and so on. But I'll take your word
for it that the plot illustrates the point you are reporting.

It appears that you used to have slightly better than 1-usec latency
(which is reasonable), but that it then rose by just over 10x with 1.3.
I did some work on sm (the shared-memory BTL), but that first appears in
1.3.2. The huge sm latencies are, so far as I know, inconsistent with
everyone else's experience with 1.3. Is there any chance you could
rebuild all three versions and confirm that the observed difference can
actually be attributed to differences in the OMPI source code? And/or
run with "--mca btl self,sm" to make sure that the on-node message
passing is indeed using sm?
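
As a rough sketch of that second check, something like the following could be used to run the same on-node test against more than one install with only the self and sm BTLs enabled. The install prefixes follow the --prefix values in the configure lines quoted below; ./pingpong is a hypothetical stand-in for whatever benchmark produced the plot, and the process count of 2 is an assumption.

```shell
#!/bin/sh
# Hypothetical benchmark binary; substitute the program used for the plot.
BENCH=./pingpong

# Print the mpirun command line for one Open MPI install.
# $1 is the version suffix of the install prefix (e.g. 1.2.9).
# "--mca btl self,sm" restricts Open MPI to the self and shared-memory
# BTLs, so the job cannot silently fall back to TCP on a single node.
build_cmd() {
    printf '%s\n' "/usr/local/openmpi-$1/bin/mpirun --mca btl self,sm -np 2 $BENCH"
}

# Emit one command per install to compare.
for v in 1.2.9 1.3.1; do
    build_cmd "$v"
done
```

The script only prints the commands; piping its output to sh on node 20 would execute them. If a run aborts because sm is not usable, that in itself would suggest the earlier 1.3.x numbers were not taken over shared memory.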

>that communication over
>GigE is consistent among the versions of OpenMPI. However, some
>change in going from 1.2.9 to 1.3.x is causing a drop in
>communication between processes on a single node.
>
>Things to note. Nodes 11, 18, and 20 are essentially idle
>before and after a test. configure was run with the same set
>of options except with 1.3 and 1.3.1 I needed to disable ipv6:
>
> ./configure --prefix=/usr/local/openmpi-1.2.9 \
> --enable-orterun-prefix-by-default --enable-static \
> --disable-shared
>
> ./configure --prefix=/usr/local/openmpi-1.3.1 \
> --enable-orterun-prefix-by-default --enable-static \
> --disable-shared --disable-ipv6
>
> ./configure --prefix=/usr/local/openmpi-1.3.1 \
> --enable-orterun-prefix-by-default --enable-static \
> --disable-shared --disable-ipv6
>
>The operating system is FreeBSD 8.0 where nodes 18 and 20
>are quad-core, dual-cpu opteron based systems and node 11
>is a dual-core, dual-cpu opteron based system. For additional
>information, I've placed the output of ompi_info at
>
>http://troutmask.apl.washington.edu/~kargl/ompi_info-1.2.9
>http://troutmask.apl.washington.edu/~kargl/ompi_info-1.3.0
>http://troutmask.apl.washington.edu/~kargl/ompi_info-1.3.1
>
>Any hints on tuning 1.3.1 would be appreciated.
>
>