Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Fwd: [OMPI bugs] [Open MPI] #1250: Performance problem on SM
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-07-23 09:39:20


Short version: I'm seeing a large performance drop between r18850 and
the SVN HEAD.

Longer version:

FWIW, I ran the tests on 3 versions on a woodcrest-class x86_64
machine running RHEL4U4:

* Trunk HEAD (r18997)
* r18973 --> had to patch the cpu64* thingy in openib btl to get it to
compile
* r18850

I ran both osu_latency and NetPIPE 3.7.1. In both r18997 and r18973,
the latency for short sends over sm is *significantly* higher than
in r18850 (e.g., about 0.92-0.93 usec vs. 0.67 usec for 1-byte
messages in NetPIPE). Detailed results below.

================================================================
r18997

[6:27] svbu-mpi052:~/svn/ompi-tests/NetPIPE-3.7.1 % mpirun --mca
mpi_paffinity_alone 1 -np 2 --mca btl sm,self NPmpi
0: svbu-mpi052
1: svbu-mpi052
Now starting the main loop
   0: 1 bytes 85423 times --> 8.23 Mbps in 0.93 usec
   1: 2 bytes 107852 times --> 16.46 Mbps in 0.93 usec
   2: 3 bytes 107874 times --> 24.65 Mbps in 0.93 usec
   3: 4 bytes 71801 times --> 30.36 Mbps in 1.01 usec
   4: 6 bytes 74610 times --> 45.27 Mbps in 1.01 usec
   5: 8 bytes 49448 times --> 60.59 Mbps in 1.01 usec
   6: 12 bytes 62044 times --> 90.72 Mbps in 1.01 usec
   7: 13 bytes 41287 times --> 98.58 Mbps in 1.01 usec
   8: 16 bytes 45872 times --> 120.81 Mbps in 1.01 usec
   9: 19 bytes 55670 times --> 143.78 Mbps in 1.01 usec
  10: 21 bytes 62644 times --> 156.63 Mbps in 1.02 usec
  11: 24 bytes 65172 times --> 177.63 Mbps in 1.03 usec
  12: 27 bytes 68714 times --> 187.21 Mbps in 1.10 usec
  13: 29 bytes 40392 times --> 201.05 Mbps in 1.10 usec
  14: 32 bytes 43868 times --> 220.92 Mbps in 1.11 usec
  15: 35 bytes 48072 times --> 255.73 Mbps in 1.04 usec
  16: 45 bytes 54725 times --> 308.90 Mbps in 1.11 usec
  17: 48 bytes 59983 times --> 329.04 Mbps in 1.11 usec
  18: 51 bytes 61772 times --> 348.53 Mbps in 1.12 usec
  19: 61 bytes 35126 times --> 408.86 Mbps in 1.14 usec
  20: 64 bytes 43206 times --> 453.67 Mbps in 1.08 usec
  21: 67 bytes 47907 times --> 487.77 Mbps in 1.05 usec
  22: 93 bytes 51271 times --> 561.32 Mbps in 1.26 usec
  23: 96 bytes 52741 times --> 595.08 Mbps in 1.23 usec
  24: 99 bytes 55012 times --> 617.64 Mbps in 1.22 usec
  25: 125 bytes 29735 times --> 736.44 Mbps in 1.29 usec
  26: 128 bytes 38301 times --> 779.33 Mbps in 1.25 usec
  27: 131 bytes 40525 times --> 818.32 Mbps in 1.22 usec
  28: 189 bytes 42501 times --> 1007.67 Mbps in 1.43 usec
  29: 192 bytes 46588 times --> 1084.13 Mbps in 1.35 usec
  30: 195 bytes 49725 times --> 1128.97 Mbps in 1.32 usec
  31: 253 bytes 26462 times --> 1257.97 Mbps in 1.53 usec
  32: 256 bytes 32457 times --> 1304.17 Mbps in 1.50 usec
  33: 259 bytes 33647 times --> 1354.14 Mbps in 1.46 usec
  34: 381 bytes 34925 times --> 1616.43 Mbps in 1.80 usec
  35: 384 bytes 37072 times --> 1676.92 Mbps in 1.75 usec
  36: 387 bytes 38308 times --> 1724.50 Mbps in 1.71 usec
  37: 509 bytes 19921 times --> 1908.30 Mbps in 2.03 usec
  38: 512 bytes 24521 times --> 2013.16 Mbps in 1.94 usec
  39: 515 bytes 25869 times --> 2038.18 Mbps in 1.93 usec
  40: 765 bytes 26188 times --> 2474.81 Mbps in 2.36 usec
  41: 768 bytes 28268 times --> 2513.00 Mbps in 2.33 usec
  42: 771 bytes 28648 times --> 2531.45 Mbps in 2.32 usec
  43: 1021 bytes 14512 times --> 2831.70 Mbps in 2.75 usec
  44: 1024 bytes 18158 times --> 2853.94 Mbps in 2.74 usec
  45: 1027 bytes 18300 times --> 2872.58 Mbps in 2.73 usec
  46: 1533 bytes 18420 times --> 3298.65 Mbps in 3.55 usec
  47: 1536 bytes 18802 times --> 3320.86 Mbps in 3.53 usec
  48: 1539 bytes 18910 times --> 3351.99 Mbps in 3.50 usec
  49: 2045 bytes 9571 times --> 3599.21 Mbps in 4.33 usec
  50: 2048 bytes 11528 times --> 3640.91 Mbps in 4.29 usec
  51: 2051 bytes 11662 times --> 3638.62 Mbps in 4.30 usec
  52: 3069 bytes 11654 times --> 3905.17 Mbps in 6.00 usec
  53: 3072 bytes 11118 times --> 3917.67 Mbps in 5.98 usec
  54: 3075 bytes 11149 times --> 3973.53 Mbps in 5.90 usec
  55: 4093 bytes 5662 times --> 4450.80 Mbps in 7.02 usec
  56: 4096 bytes 7124 times --> 4445.17 Mbps in 7.03 usec
  57: 4099 bytes 7115 times --> 4412.88 Mbps in 7.09 usec
  58: 6141 bytes 7064 times --> 4962.74 Mbps in 9.44 usec
  59: 6144 bytes 7061 times --> 4941.94 Mbps in 9.49 usec
  60: 6147 bytes 7030 times --> 4938.46 Mbps in 9.50 usec
  61: 8189 bytes 3515 times --> 5263.65 Mbps in 11.87 usec
  62: 8192 bytes 4211 times --> 5249.31 Mbps in 11.91 usec
  63: 8195 bytes 4200 times --> 5202.08 Mbps in 12.02 usec
  64: 12285 bytes 4162 times --> 6380.89 Mbps in 14.69 usec
  65: 12288 bytes 4538 times --> 6385.27 Mbps in 14.68 usec
  66: 12291 bytes 4541 times --> 6335.05 Mbps in 14.80 usec
  67: 16381 bytes 2253 times --> 6535.76 Mbps in 19.12 usec
  68: 16384 bytes 2614 times --> 6537.24 Mbps in 19.12 usec
  69: 16387 bytes 2615 times --> 6514.52 Mbps in 19.19 usec
  70: 24573 bytes 2606 times --> 6870.51 Mbps in 27.29 usec
  71: 24576 bytes 2443 times --> 6866.57 Mbps in 27.31 usec
  72: 24579 bytes 2441 times --> 6864.32 Mbps in 27.32 usec
  73: 32765 bytes 1220 times --> 7124.85 Mbps in 35.09 usec
  74: 32768 bytes 1425 times --> 7120.30 Mbps in 35.11 usec
  75: 32771 bytes 1424 times --> 7127.15 Mbps in 35.08 usec
  76: 49149 bytes 1425 times --> 8313.31 Mbps in 45.11 usec
  77: 49152 bytes 1478 times --> 8312.58 Mbps in 45.11 usec
  78: 49155 bytes 1477 times --> 8309.34 Mbps in 45.13 usec
  79: 65533 bytes 738 times --> 8219.82 Mbps in 60.83 usec
  80: 65536 bytes 822 times --> 8209.24 Mbps in 60.91 usec
  81: 65539 bytes 820 times --> 8216.00 Mbps in 60.86 usec
  82: 98301 bytes 821 times --> 8698.24 Mbps in 86.22 usec
  83: 98304 bytes 773 times --> 8695.03 Mbps in 86.26 usec
  84: 98307 bytes 772 times --> 8696.95 Mbps in 86.24 usec
  85: 131069 bytes 386 times --> 8916.50 Mbps in 112.15 usec
  86: 131072 bytes 445 times --> 8917.29 Mbps in 112.14 usec
  87: 131075 bytes 445 times --> 8916.62 Mbps in 112.15 usec
  88: 196605 bytes 445 times --> 9205.17 Mbps in 162.95 usec
  89: 196608 bytes 409 times --> 9195.75 Mbps in 163.12 usec
  90: 196611 bytes 408 times --> 9203.02 Mbps in 162.99 usec
  91: 262141 bytes 204 times --> 9338.32 Mbps in 214.17 usec
  92: 262144 bytes 233 times --> 9350.57 Mbps in 213.89 usec
  93: 262147 bytes 233 times --> 9336.72 Mbps in 214.21 usec
  94: 393213 bytes 233 times --> 9480.21 Mbps in 316.45 usec
  95: 393216 bytes 210 times --> 9476.10 Mbps in 316.59 usec
  96: 393219 bytes 210 times --> 9471.25 Mbps in 316.75 usec
  97: 524285 bytes 105 times --> 9523.20 Mbps in 420.02 usec
  98: 524288 bytes 119 times --> 9519.53 Mbps in 420.19 usec
  99: 524291 bytes 118 times --> 9523.09 Mbps in 420.03 usec
100: 786429 bytes 119 times --> 9555.83 Mbps in 627.89 usec
101: 786432 bytes 106 times --> 9542.67 Mbps in 628.75 usec
102: 786435 bytes 106 times --> 9554.47 Mbps in 627.98 usec
103: 1048573 bytes 53 times --> 9527.96 Mbps in 839.63 usec
104: 1048576 bytes 59 times --> 9530.63 Mbps in 839.40 usec
105: 1048579 bytes 59 times --> 9500.65 Mbps in 842.05 usec
106: 1572861 bytes 59 times --> 9389.53 Mbps in 1278.02 usec
107: 1572864 bytes 52 times --> 9396.87 Mbps in 1277.02 usec
108: 1572867 bytes 52 times --> 9375.01 Mbps in 1280.00 usec
109: 2097149 bytes 26 times --> 9271.33 Mbps in 1725.75 usec
110: 2097152 bytes 28 times --> 9273.64 Mbps in 1725.32 usec
111: 2097155 bytes 28 times --> 9281.42 Mbps in 1723.88 usec
112: 3145725 bytes 29 times --> 9109.93 Mbps in 2634.48 usec
113: 3145728 bytes 25 times --> 9128.80 Mbps in 2629.04 usec
114: 3145731 bytes 25 times --> 9099.66 Mbps in 2637.46 usec
115: 4194301 bytes 12 times --> 8840.19 Mbps in 3619.83 usec
116: 4194304 bytes 13 times --> 8847.10 Mbps in 3617.00 usec
117: 4194307 bytes 13 times --> 8827.22 Mbps in 3625.15 usec
118: 6291453 bytes 13 times --> 8351.40 Mbps in 5747.54 usec
119: 6291456 bytes 11 times --> 8345.46 Mbps in 5751.63 usec
120: 6291459 bytes 11 times --> 8343.42 Mbps in 5753.04 usec
121: 8388605 bytes 5 times --> 8166.28 Mbps in 7837.10 usec
122: 8388608 bytes 6 times --> 8166.91 Mbps in 7836.50 usec
123: 8388611 bytes 6 times --> 8162.67 Mbps in 7840.57 usec
[6:29] svbu-mpi052:~/svn/ompi-tests/NetPIPE-3.7.1 % cd ../osu/
[6:29] svbu-mpi052:~/svn/ompi-tests/osu % mpirun --mca
mpi_paffinity_alone 1 -np 2 --mca btl sm,self osu_latency
# OSU MPI Latency Test (Version 2.1)
# Size Latency (us)
0 0.85
1 0.91
2 0.91
4 0.99
8 0.99
16 0.99
32 1.08
64 1.08
128 1.25
256 1.49
512 1.92
1024 2.71
2048 4.40
4096 6.85
8192 11.48
16384 19.25
32768 35.25
65536 61.03
131072 113.15
262144 215.54
524288 428.19
1048576 880.72
2097152 1839.12
4194304 3934.90
[6:29] svbu-mpi052:~/svn/ompi-tests/osu %

================================================================
r18973

[6:36] svbu-mpi052:~/svn/ompi-tests/NetPIPE-3.7.1 % mpirun --mca
mpi_paffinity_alone 1 -np 2 --mca btl sm,self NPmpi
1: svbu-mpi052
0: svbu-mpi052
Now starting the main loop
   0: 1 bytes 84392 times --> 8.29 Mbps in 0.92 usec
   1: 2 bytes 108626 times --> 16.58 Mbps in 0.92 usec
   2: 3 bytes 108657 times --> 24.91 Mbps in 0.92 usec
   3: 4 bytes 72561 times --> 30.33 Mbps in 1.01 usec
   4: 6 bytes 74529 times --> 45.51 Mbps in 1.01 usec
   5: 8 bytes 49709 times --> 60.76 Mbps in 1.00 usec
   6: 12 bytes 62222 times --> 90.84 Mbps in 1.01 usec
   7: 13 bytes 41344 times --> 98.58 Mbps in 1.01 usec
   8: 16 bytes 45875 times --> 121.19 Mbps in 1.01 usec
   9: 19 bytes 55845 times --> 143.43 Mbps in 1.01 usec
  10: 21 bytes 62491 times --> 156.66 Mbps in 1.02 usec
  11: 24 bytes 65185 times --> 177.87 Mbps in 1.03 usec
  12: 27 bytes 68806 times --> 187.63 Mbps in 1.10 usec
  13: 29 bytes 40482 times --> 202.10 Mbps in 1.09 usec
  14: 32 bytes 44096 times --> 222.11 Mbps in 1.10 usec
  15: 35 bytes 48331 times --> 255.12 Mbps in 1.05 usec
  16: 45 bytes 54593 times --> 308.42 Mbps in 1.11 usec
  17: 48 bytes 59888 times --> 330.10 Mbps in 1.11 usec
  18: 51 bytes 61970 times --> 348.31 Mbps in 1.12 usec
  19: 61 bytes 35104 times --> 409.39 Mbps in 1.14 usec
  20: 64 bytes 43261 times --> 451.69 Mbps in 1.08 usec
  21: 67 bytes 47698 times --> 489.98 Mbps in 1.04 usec
  22: 93 bytes 51504 times --> 565.69 Mbps in 1.25 usec
  23: 96 bytes 53150 times --> 598.55 Mbps in 1.22 usec
  24: 99 bytes 55333 times --> 623.24 Mbps in 1.21 usec
  25: 125 bytes 30005 times --> 735.91 Mbps in 1.30 usec
  26: 128 bytes 38274 times --> 781.32 Mbps in 1.25 usec
  27: 131 bytes 40628 times --> 828.90 Mbps in 1.21 usec
  28: 189 bytes 43050 times --> 1018.02 Mbps in 1.42 usec
  29: 192 bytes 47066 times --> 1069.01 Mbps in 1.37 usec
  30: 195 bytes 49032 times --> 1122.18 Mbps in 1.33 usec
  31: 253 bytes 26303 times --> 1259.95 Mbps in 1.53 usec
  32: 256 bytes 32508 times --> 1307.53 Mbps in 1.49 usec
  33: 259 bytes 33734 times --> 1357.47 Mbps in 1.46 usec
  34: 381 bytes 35011 times --> 1617.08 Mbps in 1.80 usec
  35: 384 bytes 37087 times --> 1675.72 Mbps in 1.75 usec
  36: 387 bytes 38280 times --> 1722.27 Mbps in 1.71 usec
  37: 509 bytes 19895 times --> 1913.58 Mbps in 2.03 usec
  38: 512 bytes 24589 times --> 1967.08 Mbps in 1.99 usec
  39: 515 bytes 25276 times --> 2041.10 Mbps in 1.93 usec
  40: 765 bytes 26226 times --> 2448.96 Mbps in 2.38 usec
  41: 768 bytes 27973 times --> 2503.60 Mbps in 2.34 usec
  42: 771 bytes 28541 times --> 2541.12 Mbps in 2.31 usec
  43: 1021 bytes 14567 times --> 2845.46 Mbps in 2.74 usec
  44: 1024 bytes 18246 times --> 2854.45 Mbps in 2.74 usec
  45: 1027 bytes 18304 times --> 2939.64 Mbps in 2.67 usec
  46: 1533 bytes 18850 times --> 3291.70 Mbps in 3.55 usec
  47: 1536 bytes 18762 times --> 3310.45 Mbps in 3.54 usec
  48: 1539 bytes 18851 times --> 3386.68 Mbps in 3.47 usec
  49: 2045 bytes 9670 times --> 3635.22 Mbps in 4.29 usec
  50: 2048 bytes 11644 times --> 3646.70 Mbps in 4.28 usec
  51: 2051 bytes 11680 times --> 3640.09 Mbps in 4.30 usec
  52: 3069 bytes 11659 times --> 3926.68 Mbps in 5.96 usec
  53: 3072 bytes 11180 times --> 3962.33 Mbps in 5.92 usec
  54: 3075 bytes 11276 times --> 3978.54 Mbps in 5.90 usec
  55: 4093 bytes 5669 times --> 4398.66 Mbps in 7.10 usec
  56: 4096 bytes 7041 times --> 4429.95 Mbps in 7.05 usec
  57: 4099 bytes 7091 times --> 4378.99 Mbps in 7.14 usec
  58: 6141 bytes 7009 times --> 5001.17 Mbps in 9.37 usec
  59: 6144 bytes 7116 times --> 4984.01 Mbps in 9.41 usec
  60: 6147 bytes 7090 times --> 5015.48 Mbps in 9.35 usec
  61: 8189 bytes 3570 times --> 5286.90 Mbps in 11.82 usec
  62: 8192 bytes 4230 times --> 5222.58 Mbps in 11.97 usec
  63: 8195 bytes 4179 times --> 5261.91 Mbps in 11.88 usec
  64: 12285 bytes 4210 times --> 6370.90 Mbps in 14.71 usec
  65: 12288 bytes 4531 times --> 6376.57 Mbps in 14.70 usec
  66: 12291 bytes 4535 times --> 6349.10 Mbps in 14.77 usec
  67: 16381 bytes 2258 times --> 6521.57 Mbps in 19.16 usec
  68: 16384 bytes 2608 times --> 6520.25 Mbps in 19.17 usec
  69: 16387 bytes 2608 times --> 6504.81 Mbps in 19.22 usec
  70: 24573 bytes 2602 times --> 6867.93 Mbps in 27.30 usec
  71: 24576 bytes 2442 times --> 6869.27 Mbps in 27.30 usec
  72: 24579 bytes 2442 times --> 6864.04 Mbps in 27.32 usec
  73: 32765 bytes 1220 times --> 7118.03 Mbps in 35.12 usec
  74: 32768 bytes 1423 times --> 7117.77 Mbps in 35.12 usec
  75: 32771 bytes 1423 times --> 7120.85 Mbps in 35.11 usec
  76: 49149 bytes 1424 times --> 8324.26 Mbps in 45.05 usec
  77: 49152 bytes 1479 times --> 8328.77 Mbps in 45.02 usec
  78: 49155 bytes 1480 times --> 8320.47 Mbps in 45.07 usec
  79: 65533 bytes 739 times --> 8214.38 Mbps in 60.87 usec
  80: 65536 bytes 821 times --> 8219.87 Mbps in 60.83 usec
  81: 65539 bytes 822 times --> 8232.40 Mbps in 60.74 usec
  82: 98301 bytes 823 times --> 8717.21 Mbps in 86.03 usec
  83: 98304 bytes 774 times --> 8716.08 Mbps in 86.05 usec
  84: 98307 bytes 774 times --> 8714.26 Mbps in 86.07 usec
  85: 131069 bytes 387 times --> 8921.59 Mbps in 112.09 usec
  86: 131072 bytes 446 times --> 8935.37 Mbps in 111.91 usec
  87: 131075 bytes 446 times --> 8925.47 Mbps in 112.04 usec
  88: 196605 bytes 446 times --> 9195.80 Mbps in 163.12 usec
  89: 196608 bytes 408 times --> 9197.41 Mbps in 163.09 usec
  90: 196611 bytes 408 times --> 9204.33 Mbps in 162.97 usec
  91: 262141 bytes 204 times --> 9344.95 Mbps in 214.02 usec
  92: 262144 bytes 233 times --> 9347.58 Mbps in 213.96 usec
  93: 262147 bytes 233 times --> 9340.56 Mbps in 214.12 usec
  94: 393213 bytes 233 times --> 9473.27 Mbps in 316.68 usec
  95: 393216 bytes 210 times --> 9486.24 Mbps in 316.25 usec
  96: 393219 bytes 210 times --> 9500.26 Mbps in 315.78 usec
  97: 524285 bytes 105 times --> 9538.88 Mbps in 419.33 usec
  98: 524288 bytes 119 times --> 9543.40 Mbps in 419.14 usec
  99: 524291 bytes 119 times --> 9534.73 Mbps in 419.52 usec
100: 786429 bytes 119 times --> 9574.15 Mbps in 626.69 usec
101: 786432 bytes 106 times --> 9565.70 Mbps in 627.24 usec
102: 786435 bytes 106 times --> 9544.50 Mbps in 628.64 usec
103: 1048573 bytes 53 times --> 9530.85 Mbps in 839.38 usec
104: 1048576 bytes 59 times --> 9525.24 Mbps in 839.87 usec
105: 1048579 bytes 59 times --> 9511.86 Mbps in 841.06 usec
106: 1572861 bytes 59 times --> 9391.40 Mbps in 1277.76 usec
107: 1572864 bytes 52 times --> 9395.54 Mbps in 1277.20 usec
108: 1572867 bytes 52 times --> 9386.02 Mbps in 1278.50 usec
109: 2097149 bytes 26 times --> 9298.48 Mbps in 1720.71 usec
110: 2097152 bytes 29 times --> 9313.43 Mbps in 1717.95 usec
111: 2097155 bytes 29 times --> 9293.49 Mbps in 1721.64 usec
112: 3145725 bytes 29 times --> 9126.67 Mbps in 2629.65 usec
113: 3145728 bytes 25 times --> 9113.76 Mbps in 2633.38 usec
114: 3145731 bytes 25 times --> 9079.90 Mbps in 2643.20 usec
115: 4194301 bytes 12 times --> 8810.57 Mbps in 3632.00 usec
116: 4194304 bytes 13 times --> 8821.99 Mbps in 3627.30 usec
117: 4194307 bytes 13 times --> 8801.17 Mbps in 3635.88 usec
118: 6291453 bytes 13 times --> 8337.50 Mbps in 5757.12 usec
119: 6291456 bytes 11 times --> 8332.94 Mbps in 5760.27 usec
120: 6291459 bytes 11 times --> 8346.25 Mbps in 5751.09 usec
121: 8388605 bytes 5 times --> 8159.20 Mbps in 7843.90 usec
122: 8388608 bytes 6 times --> 8166.83 Mbps in 7836.58 usec
123: 8388611 bytes 6 times --> 8161.26 Mbps in 7841.92 usec
[6:37] svbu-mpi052:~/svn/ompi-tests/NetPIPE-3.7.1 % cd ../osu/
[6:37] svbu-mpi052:~/svn/ompi-tests/osu % mpirun --mca
mpi_paffinity_alone 1 -np 2 --mca btl sm,self osu_latency
# OSU MPI Latency Test (Version 2.1)
# Size Latency (us)
0 0.85
1 0.91
2 0.91
4 0.99
8 0.99
16 0.99
32 1.09
64 1.07
128 1.25
256 1.49
512 1.97
1024 2.69
2048 4.29
4096 6.83
8192 11.41
16384 19.69
32768 35.27
65536 61.06
131072 112.51
262144 215.47
524288 429.60
1048576 882.89
2097152 1836.45
4194304 3943.47
[6:37] svbu-mpi052:~/svn/ompi-tests/osu %

================================================================
r18850
[6:31] svbu-mpi052:~/svn/ompi-tests/NetPIPE-3.7.1 % mpirun --mca
mpi_paffinity_alone 1 -np 2 --mca btl sm,self NPmpi
0: svbu-mpi052
1: svbu-mpi052
Now starting the main loop
   0: 1 bytes 116185 times --> 11.32 Mbps in 0.67 usec
   1: 2 bytes 148348 times --> 22.58 Mbps in 0.68 usec
   2: 3 bytes 147969 times --> 33.88 Mbps in 0.68 usec
   3: 4 bytes 98695 times --> 40.58 Mbps in 0.75 usec
   4: 6 bytes 99737 times --> 60.85 Mbps in 0.75 usec
   5: 8 bytes 66464 times --> 81.13 Mbps in 0.75 usec
   6: 12 bytes 83076 times --> 121.58 Mbps in 0.75 usec
   7: 13 bytes 55334 times --> 131.83 Mbps in 0.75 usec
   8: 16 bytes 61344 times --> 161.81 Mbps in 0.75 usec
   9: 19 bytes 74561 times --> 190.93 Mbps in 0.76 usec
  10: 21 bytes 83186 times --> 207.97 Mbps in 0.77 usec
  11: 24 bytes 86535 times --> 235.30 Mbps in 0.78 usec
  12: 27 bytes 91024 times --> 241.36 Mbps in 0.85 usec
  13: 29 bytes 52074 times --> 260.24 Mbps in 0.85 usec
  14: 32 bytes 56782 times --> 286.57 Mbps in 0.85 usec
  15: 35 bytes 62357 times --> 341.55 Mbps in 0.78 usec
  16: 45 bytes 73090 times --> 400.53 Mbps in 0.86 usec
  17: 48 bytes 77776 times --> 425.94 Mbps in 0.86 usec
  18: 51 bytes 79963 times --> 449.27 Mbps in 0.87 usec
  19: 61 bytes 45280 times --> 520.58 Mbps in 0.89 usec
  20: 64 bytes 55011 times --> 589.77 Mbps in 0.83 usec
  21: 67 bytes 62279 times --> 651.96 Mbps in 0.78 usec
  22: 93 bytes 68530 times --> 706.75 Mbps in 1.00 usec
  23: 96 bytes 66405 times --> 756.56 Mbps in 0.97 usec
  24: 99 bytes 69940 times --> 786.11 Mbps in 0.96 usec
  25: 125 bytes 37846 times --> 917.31 Mbps in 1.04 usec
  26: 128 bytes 47708 times --> 991.21 Mbps in 0.99 usec
  27: 131 bytes 51542 times --> 1030.40 Mbps in 0.97 usec
  28: 189 bytes 53515 times --> 1228.14 Mbps in 1.17 usec
  29: 192 bytes 56781 times --> 1317.94 Mbps in 1.11 usec
  30: 195 bytes 60449 times --> 1372.28 Mbps in 1.08 usec
  31: 253 bytes 32165 times --> 1506.60 Mbps in 1.28 usec
  32: 256 bytes 38871 times --> 1590.08 Mbps in 1.23 usec
  33: 259 bytes 41024 times --> 1657.90 Mbps in 1.19 usec
  34: 381 bytes 42760 times --> 1894.98 Mbps in 1.53 usec
  35: 384 bytes 43460 times --> 1958.92 Mbps in 1.50 usec
  36: 387 bytes 44750 times --> 2029.44 Mbps in 1.45 usec
  37: 509 bytes 23444 times --> 2176.96 Mbps in 1.78 usec
  38: 512 bytes 27974 times --> 2268.97 Mbps in 1.72 usec
  39: 515 bytes 29156 times --> 2340.62 Mbps in 1.68 usec
  40: 765 bytes 30074 times --> 2698.17 Mbps in 2.16 usec
  41: 768 bytes 30819 times --> 2778.48 Mbps in 2.11 usec
  42: 771 bytes 31674 times --> 2847.11 Mbps in 2.07 usec
  43: 1021 bytes 16322 times --> 3039.90 Mbps in 2.56 usec
  44: 1024 bytes 19493 times --> 3161.06 Mbps in 2.47 usec
  45: 1027 bytes 20270 times --> 3221.90 Mbps in 2.43 usec
  46: 1533 bytes 20660 times --> 3455.95 Mbps in 3.38 usec
  47: 1536 bytes 19698 times --> 3580.63 Mbps in 3.27 usec
  48: 1539 bytes 20389 times --> 3623.40 Mbps in 3.24 usec
  49: 2045 bytes 10346 times --> 3751.80 Mbps in 4.16 usec
  50: 2048 bytes 12017 times --> 3833.40 Mbps in 4.08 usec
  51: 2051 bytes 12278 times --> 3813.67 Mbps in 4.10 usec
  52: 3069 bytes 12215 times --> 3997.25 Mbps in 5.86 usec
  53: 3072 bytes 11381 times --> 4058.18 Mbps in 5.78 usec
  54: 3075 bytes 11548 times --> 4102.09 Mbps in 5.72 usec
  55: 4093 bytes 5845 times --> 4726.24 Mbps in 6.61 usec
  56: 4096 bytes 7565 times --> 4679.74 Mbps in 6.68 usec
  57: 4099 bytes 7491 times --> 4649.50 Mbps in 6.73 usec
  58: 6141 bytes 7442 times --> 5072.39 Mbps in 9.24 usec
  59: 6144 bytes 7217 times --> 5064.70 Mbps in 9.26 usec
  60: 6147 bytes 7204 times --> 5067.07 Mbps in 9.26 usec
  61: 8189 bytes 3606 times --> 5387.85 Mbps in 11.60 usec
  62: 8192 bytes 4311 times --> 5393.87 Mbps in 11.59 usec
  63: 8195 bytes 4316 times --> 5301.81 Mbps in 11.79 usec
  64: 12285 bytes 4242 times --> 6568.81 Mbps in 14.27 usec
  65: 12288 bytes 4672 times --> 6561.90 Mbps in 14.29 usec
  66: 12291 bytes 4666 times --> 6548.01 Mbps in 14.32 usec
  67: 16381 bytes 2329 times --> 6662.43 Mbps in 18.76 usec
  68: 16384 bytes 2665 times --> 6655.18 Mbps in 18.78 usec
  69: 16387 bytes 2662 times --> 6634.79 Mbps in 18.84 usec
  70: 24573 bytes 2654 times --> 6937.26 Mbps in 27.02 usec
  71: 24576 bytes 2466 times --> 6937.41 Mbps in 27.03 usec
  72: 24579 bytes 2466 times --> 6931.40 Mbps in 27.05 usec
  73: 32765 bytes 1232 times --> 7218.55 Mbps in 34.63 usec
  74: 32768 bytes 1443 times --> 7213.85 Mbps in 34.66 usec
  75: 32771 bytes 1442 times --> 7218.89 Mbps in 34.63 usec
  76: 49149 bytes 1443 times --> 8387.79 Mbps in 44.71 usec
  77: 49152 bytes 1491 times --> 8385.50 Mbps in 44.72 usec
  78: 49155 bytes 1490 times --> 8390.79 Mbps in 44.69 usec
  79: 65533 bytes 745 times --> 8261.32 Mbps in 60.52 usec
  80: 65536 bytes 826 times --> 8260.34 Mbps in 60.53 usec
  81: 65539 bytes 826 times --> 8265.33 Mbps in 60.50 usec
  82: 98301 bytes 826 times --> 8747.13 Mbps in 85.74 usec
  83: 98304 bytes 777 times --> 8746.72 Mbps in 85.75 usec
  84: 98307 bytes 777 times --> 8733.81 Mbps in 85.88 usec
  85: 131069 bytes 388 times --> 8956.71 Mbps in 111.65 usec
  86: 131072 bytes 447 times --> 8967.16 Mbps in 111.52 usec
  87: 131075 bytes 448 times --> 8960.56 Mbps in 111.60 usec
  88: 196605 bytes 448 times --> 9247.58 Mbps in 162.20 usec
  89: 196608 bytes 411 times --> 9234.30 Mbps in 162.44 usec
  90: 196611 bytes 410 times --> 9231.32 Mbps in 162.49 usec
  91: 262141 bytes 205 times --> 9365.98 Mbps in 213.54 usec
  92: 262144 bytes 234 times --> 9368.25 Mbps in 213.49 usec
  93: 262147 bytes 234 times --> 9363.09 Mbps in 213.61 usec
  94: 393213 bytes 234 times --> 9512.63 Mbps in 315.37 usec
  95: 393216 bytes 211 times --> 9497.01 Mbps in 315.89 usec
  96: 393219 bytes 211 times --> 9510.80 Mbps in 315.43 usec
  97: 524285 bytes 105 times --> 9553.55 Mbps in 418.69 usec
  98: 524288 bytes 119 times --> 9561.59 Mbps in 418.34 usec
  99: 524291 bytes 119 times --> 9551.86 Mbps in 418.77 usec
100: 786429 bytes 119 times --> 9582.63 Mbps in 626.13 usec
101: 786432 bytes 106 times --> 9576.72 Mbps in 626.52 usec
102: 786435 bytes 106 times --> 9584.78 Mbps in 625.99 usec
103: 1048573 bytes 53 times --> 9545.32 Mbps in 838.10 usec
104: 1048576 bytes 59 times --> 9532.37 Mbps in 839.25 usec
105: 1048579 bytes 59 times --> 9542.90 Mbps in 838.32 usec
106: 1572861 bytes 59 times --> 9434.44 Mbps in 1271.93 usec
107: 1572864 bytes 52 times --> 9400.64 Mbps in 1276.51 usec
108: 1572867 bytes 52 times --> 9409.24 Mbps in 1275.34 usec
109: 2097149 bytes 26 times --> 9305.75 Mbps in 1719.36 usec
110: 2097152 bytes 29 times --> 9314.56 Mbps in 1717.74 usec
111: 2097155 bytes 29 times --> 9278.43 Mbps in 1724.43 usec
112: 3145725 bytes 28 times --> 9065.15 Mbps in 2647.50 usec
113: 3145728 bytes 25 times --> 9095.10 Mbps in 2638.78 usec
114: 3145731 bytes 25 times --> 9073.88 Mbps in 2644.96 usec
115: 4194301 bytes 12 times --> 8772.63 Mbps in 3647.70 usec
116: 4194304 bytes 13 times --> 8768.32 Mbps in 3649.50 usec
117: 4194307 bytes 13 times --> 8771.37 Mbps in 3648.24 usec
118: 6291453 bytes 13 times --> 8321.22 Mbps in 5768.38 usec
119: 6291456 bytes 11 times --> 8320.00 Mbps in 5769.23 usec
120: 6291459 bytes 11 times --> 8335.25 Mbps in 5758.68 usec
121: 8388605 bytes 5 times --> 8167.02 Mbps in 7836.39 usec
122: 8388608 bytes 6 times --> 8165.44 Mbps in 7837.91 usec
123: 8388611 bytes 6 times --> 8162.24 Mbps in 7840.99 usec
[6:32] svbu-mpi052:~/svn/ompi-tests/NetPIPE-3.7.1 % cd ../osu/
[6:32] svbu-mpi052:~/svn/ompi-tests/osu % mpirun --mca
mpi_paffinity_alone 1 -np 2 --mca btl sm,self osu_latency
# OSU MPI Latency Test (Version 2.1)
# Size Latency (us)
0 0.65
1 0.69
2 0.69
4 0.76
8 0.76
16 0.76
32 0.85
64 0.83
128 1.03
256 1.25
512 1.73
1024 2.47
2048 4.18
4096 6.53
8192 11.23
16384 18.91
32768 34.97
65536 60.80
131072 112.09
262144 215.15
524288 427.97
1048576 880.90
2097152 1840.40
4194304 3945.23
[6:33] svbu-mpi052:~/svn/ompi-tests/osu %
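
To put a number on the difference, here is a minimal sketch (not part of the original runs) that compares the osu_latency figures for r18850 and r18997 pasted above. The dictionary names and the `slowdown` helper are made up for illustration; the values are copied from the two osu_latency outputs, small message sizes only.

```python
# osu_latency results in microseconds, copied from the runs above
# (message size in bytes -> latency). Only sizes up to 1024 bytes are listed.
r18850 = {0: 0.65, 1: 0.69, 2: 0.69, 4: 0.76, 8: 0.76, 16: 0.76,
          32: 0.85, 64: 0.83, 128: 1.03, 256: 1.25, 512: 1.73, 1024: 2.47}
r18997 = {0: 0.85, 1: 0.91, 2: 0.91, 4: 0.99, 8: 0.99, 16: 0.99,
          32: 1.08, 64: 1.08, 128: 1.25, 256: 1.49, 512: 1.92, 1024: 2.71}


def slowdown(old, new):
    """Return {size: (delta in usec, percent increase)} for sizes in both runs."""
    return {
        size: (new[size] - old[size],
               100.0 * (new[size] - old[size]) / old[size])
        for size in sorted(old)
        if size in new
    }


if __name__ == "__main__":
    # Hypothetical helper output: one line per message size.
    for size, (delta, pct) in slowdown(r18850, r18997).items():
        print(f"{size:>5} bytes: +{delta:.2f} usec ({pct:+.1f}%)")
```

For these sizes the added latency stays between roughly 0.19 and 0.25 usec, which is about 30% at 0 bytes and shrinks as a percentage as messages grow. That pattern would be consistent with a fixed per-message cost, though these numbers alone do not show where it comes from.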

On Jul 23, 2008, at 7:24 AM, Lenny Verkhovsky wrote:

> Sorry Terry, :).
>
> ---------- Forwarded message ----------
> From: Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]>
> Date: Jul 23, 2008 2:22 PM
> Subject: Re: [OMPI devel] [OMPI bugs] [Open MPI] #1250: Performance
> problem on SM
> To: Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]>
>
>
>
> On 7/23/08, Terry Dontje <Terry.Dontje_at_[hidden]> wrote:
>
> I didn't see any attached results in the email.
>
> --td
> Lenny Verkhovsky wrote:
>
> I rechecked it on the same node, still no degradation,
>
> see results attached.
>
>
> On 7/22/08, *Open MPI* <bugs_at_[hidden]
> <mailto:bugs_at_open-mpi.org>> wrote:
>
> #1250: Performance problem on SM
> --------------------+-------------------------------------------------------
>  Reporter: bosilca  |      Owner: bosilca
>      Type: defect   |     Status: assigned
>  Priority: blocker  |  Milestone: Open MPI 1.3
>   Version:          | Resolution:
>  Keywords:          |
> --------------------+-------------------------------------------------------
>
>
> Comment(by tdd):
>
> Hmmm, Lennyve, isn't your mpirun above going across nodes rather than
> running on the same node? I am running NetPIPE on a single node.
>
>
> --
> Ticket URL:
> <https://svn.open-mpi.org/trac/ompi/ticket/1250#comment:20>
>
> Open MPI <http://www.open-mpi.org/>
>
>
> _______________________________________________
> bugs mailing list
> bugs_at_[hidden] <mailto:bugs_at_[hidden]>
> http://www.open-mpi.org/mailman/listinfo.cgi/bugs
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
>
>
> <NPmpi.log>

-- 
Jeff Squyres
Cisco Systems