On Mon, Apr 06, 2009 at 02:04:16PM -0700, Eugene Loh wrote:
> Steve Kargl wrote:
> >I recently upgraded OpenMPI from 1.2.9 to 1.3 and then 1.3.1.
> >One of my colleagues reported a dramatic drop in performance
> >with one of his applications. My investigation shows a factor
> >of 10 drop in communication over the memory bus. I've placed
> >a figure that illustrates the problem at
> >The legend in the figure has 'ver. 1.2.9 11 <--> 18'. This
> >means communication between node 11 and node 18 over GigE
> >in my cluster. 'ver. 1.2.9 20 <--> 20' means
> >communication between processes on node 20 where node 20 has
> >8 processors. The image clearly shows
> Not so clearly in my mind since I have trouble discriminating between
> the colors and the overlapping lines and so on. But I'll take your word
> for it that the plot illustrates the point you are reporting.
OK. I've removed the GigE results in the graph and plotted with
points as well as lines. You'll see a red line by itself. The
green and blue lines overlap. The original data is now
> It appears that you used to have just better than 1-usec latency (which
> is reasonable), but then it skyrocketed just over 10x with 1.3. I did
> some sm work, but that first appears in 1.3.2.
According to netpipe, I have
Sync Time: 0.000018241
Now starting main loop
Sync Time: 0.000001811
So, the latency has indeed gone up, by roughly a factor of 10
(about 18.2 usec versus 1.8 usec).
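The ratio between the two Sync Time values can be checked with a
one-liner. This is only a sketch: the function name `ratio` is made up
for illustration, and it assumes a POSIX shell with awk available.

```shell
# Hypothetical helper: print the ratio of two timings (first / second),
# rounded to one decimal place.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# The two Sync Time values reported by netpipe above, in seconds.
ratio 0.000018241 0.000001811
```

The result is close to the factor of 10 seen in the plot.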
> The huge sm latencies are, so far as I know, inconsistent with
> everyone else's experience with 1.3. Is there any chance you
> could rebuild all three versions and really confirm that the
> observed difference can actually be attributed to differences
> in the OMPI source code? And/or run with "--mca btl
> self,sm" to make sure that the on-node message passing is indeed using sm?
The command lines I used are
/usr/local/openmpi-1.2.9/bin/mpicc -o z -O -static GetOpt.c netmpi.c
/usr/local/openmpi-1.2.9/bin/mpiexec -machinefile mf_ompi_2 -n 2 ./z
/usr/local/openmpi-1.3.1/bin/mpicc -o z -O -static GetOpt.c netmpi.c
/usr/local/openmpi-1.3.1/bin/mpiexec --mca btl self,sm -machinefile \
mf_ompi_2 -n 2 ./z
There is no change in the results as can be seen at
The machinefile contains the single line 'node20.cimu.org slots=2'.
I can rebuild 1.2.9 and 1.3.1. Are there any particular configure
options that I should enable/disable?