Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Fwd: [OMPI bugs] [Open MPI] #1250: Performance problem on SM
From: Terry Dontje (Terry.Dontje_at_[hidden])
Date: 2008-07-23 10:37:13


This seems to work for me too. Interestingly, my experiments show
that if you run on RH5.1 you don't need to set
mpi_yield_when_idle to 0.

--td
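
To put a number on the effect of mpi_yield_when_idle discussed here, the short-message osu_latency figures quoted further down in this thread can be compared directly. The following is a minimal sketch, not something any poster ran: the dictionary names and the percent_change helper are made up for illustration, and the values are copied by hand from the quoted runs (sizes 0 to 4 bytes only, since the yield-0 run is truncated after that).

```python
# osu_latency results in microseconds, copied from the runs quoted in this thread.
# Keys are message sizes in bytes. Names below are illustrative only.
r18850_default = {0: 0.65, 1: 0.69, 2: 0.69, 4: 0.76}
r18997_default = {0: 0.85, 1: 0.91, 2: 0.91, 4: 0.99}
head_yield_0 = {0: 0.64, 1: 0.67, 2: 0.67, 4: 0.74}


def percent_change(baseline, other):
    """Return {size: percent change of other relative to baseline}."""
    return {
        size: round(100.0 * (other[size] - baseline[size]) / baseline[size], 1)
        for size in baseline
    }


regression = percent_change(r18850_default, r18997_default)
with_yield_0 = percent_change(r18850_default, head_yield_0)

for size in sorted(r18850_default):
    print(f"{size:>2} bytes: r18997 default {regression[size]:+.1f}%, "
          f"HEAD with yield 0 {with_yield_0[size]:+.1f}%")
```

On these figures the default-settings r18997 run is roughly 30% slower than r18850 for the smallest messages, while the HEAD run with mpi_yield_when_idle set to 0 lands within a few percent of r18850.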

Jeff Squyres wrote:
> Doh! I guess we still don't have that calculation right yet; I
> thought we had fixed that...
>
> [7:12] svbu-mpi052:~/svn/ompi-tests/NetPIPE-3.7.1 % mpirun --mca
> mpi_paffinity_alone 1 -np 2 --mca btl sm,self --mca
> mpi_yield_when_idle 0 NPmpi
> 0: svbu-mpi052
> 1: svbu-mpi052
> Now starting the main loop
> 0: 1 bytes 131689 times --> 11.22 Mbps in 0.68 usec
> 1: 2 bytes 147026 times --> 22.54 Mbps in 0.68 usec
> 2: 3 bytes 147741 times --> 33.65 Mbps in 0.68 usec
> ...
>
> [7:12] svbu-mpi052:~/svn/ompi-tests/osu % mpirun --mca
> mpi_paffinity_alone 1 -np 2 --mca btl sm,self --mca
> mpi_yield_when_idle 0 osu_latency
> # OSU MPI Latency Test (Version 2.1)
> # Size Latency (us)
> 0 0.64
> 1 0.67
> 2 0.67
> 4 0.74
> ...
>
> I'll check with Ralph.
>
>
>
> On Jul 23, 2008, at 10:01 AM, George Bosilca wrote:
>
>> Can you try the HEAD with mpi_yield_when_idle set to 0, please?
>>
>> Thanks,
>> george.
>>
>>
>> On Jul 23, 2008, at 3:39 PM, Jeff Squyres wrote:
>>
>>> Short version: I'm seeing a large performance drop between r18850
>>> and the SVN HEAD.
>>>
>>> Longer version:
>>>
>>> FWIW, I ran the tests on 3 versions on a woodcrest-class x86_64
>>> machine running RHEL4U4:
>>>
>>> * Trunk HEAD (r18997)
>>> * r18973 --> had to patch the cpu64* thingy in openib btl to get it
>>> to compile
>>> * r18850
>>>
>>> I ran both osu_latency and NetPIPE 3.7.1. In the r18997 and r18973,
>>> the latency for short sends over sm is *significantly* higher than
>>> that of r18850. Detailed results below.
>>>
>>> ================================================================
>>> r18997
>>>
>>> [6:27] svbu-mpi052:~/svn/ompi-tests/NetPIPE-3.7.1 % mpirun --mca
>>> mpi_paffinity_alone 1 -np 2 --mca btl sm,self NPmpi
>>> 0: svbu-mpi052
>>> 1: svbu-mpi052
>>> Now starting the main loop
>>> 0: 1 bytes 85423 times --> 8.23 Mbps in 0.93 usec
>>> 1: 2 bytes 107852 times --> 16.46 Mbps in 0.93 usec
>>> 2: 3 bytes 107874 times --> 24.65 Mbps in 0.93 usec
>>> 3: 4 bytes 71801 times --> 30.36 Mbps in 1.01 usec
>>> 4: 6 bytes 74610 times --> 45.27 Mbps in 1.01 usec
>>> 5: 8 bytes 49448 times --> 60.59 Mbps in 1.01 usec
>>> 6: 12 bytes 62044 times --> 90.72 Mbps in 1.01 usec
>>> 7: 13 bytes 41287 times --> 98.58 Mbps in 1.01 usec
>>> 8: 16 bytes 45872 times --> 120.81 Mbps in 1.01 usec
>>> 9: 19 bytes 55670 times --> 143.78 Mbps in 1.01 usec
>>> 10: 21 bytes 62644 times --> 156.63 Mbps in 1.02 usec
>>> 11: 24 bytes 65172 times --> 177.63 Mbps in 1.03 usec
>>> 12: 27 bytes 68714 times --> 187.21 Mbps in 1.10 usec
>>> 13: 29 bytes 40392 times --> 201.05 Mbps in 1.10 usec
>>> 14: 32 bytes 43868 times --> 220.92 Mbps in 1.11 usec
>>> 15: 35 bytes 48072 times --> 255.73 Mbps in 1.04 usec
>>> 16: 45 bytes 54725 times --> 308.90 Mbps in 1.11 usec
>>> 17: 48 bytes 59983 times --> 329.04 Mbps in 1.11 usec
>>> 18: 51 bytes 61772 times --> 348.53 Mbps in 1.12 usec
>>> 19: 61 bytes 35126 times --> 408.86 Mbps in 1.14 usec
>>> 20: 64 bytes 43206 times --> 453.67 Mbps in 1.08 usec
>>> 21: 67 bytes 47907 times --> 487.77 Mbps in 1.05 usec
>>> 22: 93 bytes 51271 times --> 561.32 Mbps in 1.26 usec
>>> 23: 96 bytes 52741 times --> 595.08 Mbps in 1.23 usec
>>> 24: 99 bytes 55012 times --> 617.64 Mbps in 1.22 usec
>>> 25: 125 bytes 29735 times --> 736.44 Mbps in 1.29 usec
>>> 26: 128 bytes 38301 times --> 779.33 Mbps in 1.25 usec
>>> 27: 131 bytes 40525 times --> 818.32 Mbps in 1.22 usec
>>> 28: 189 bytes 42501 times --> 1007.67 Mbps in 1.43 usec
>>> 29: 192 bytes 46588 times --> 1084.13 Mbps in 1.35 usec
>>> 30: 195 bytes 49725 times --> 1128.97 Mbps in 1.32 usec
>>> 31: 253 bytes 26462 times --> 1257.97 Mbps in 1.53 usec
>>> 32: 256 bytes 32457 times --> 1304.17 Mbps in 1.50 usec
>>> 33: 259 bytes 33647 times --> 1354.14 Mbps in 1.46 usec
>>> 34: 381 bytes 34925 times --> 1616.43 Mbps in 1.80 usec
>>> 35: 384 bytes 37072 times --> 1676.92 Mbps in 1.75 usec
>>> 36: 387 bytes 38308 times --> 1724.50 Mbps in 1.71 usec
>>> 37: 509 bytes 19921 times --> 1908.30 Mbps in 2.03 usec
>>> 38: 512 bytes 24521 times --> 2013.16 Mbps in 1.94 usec
>>> 39: 515 bytes 25869 times --> 2038.18 Mbps in 1.93 usec
>>> 40: 765 bytes 26188 times --> 2474.81 Mbps in 2.36 usec
>>> 41: 768 bytes 28268 times --> 2513.00 Mbps in 2.33 usec
>>> 42: 771 bytes 28648 times --> 2531.45 Mbps in 2.32 usec
>>> 43: 1021 bytes 14512 times --> 2831.70 Mbps in 2.75 usec
>>> 44: 1024 bytes 18158 times --> 2853.94 Mbps in 2.74 usec
>>> 45: 1027 bytes 18300 times --> 2872.58 Mbps in 2.73 usec
>>> 46: 1533 bytes 18420 times --> 3298.65 Mbps in 3.55 usec
>>> 47: 1536 bytes 18802 times --> 3320.86 Mbps in 3.53 usec
>>> 48: 1539 bytes 18910 times --> 3351.99 Mbps in 3.50 usec
>>> 49: 2045 bytes 9571 times --> 3599.21 Mbps in 4.33 usec
>>> 50: 2048 bytes 11528 times --> 3640.91 Mbps in 4.29 usec
>>> 51: 2051 bytes 11662 times --> 3638.62 Mbps in 4.30 usec
>>> 52: 3069 bytes 11654 times --> 3905.17 Mbps in 6.00 usec
>>> 53: 3072 bytes 11118 times --> 3917.67 Mbps in 5.98 usec
>>> 54: 3075 bytes 11149 times --> 3973.53 Mbps in 5.90 usec
>>> 55: 4093 bytes 5662 times --> 4450.80 Mbps in 7.02 usec
>>> 56: 4096 bytes 7124 times --> 4445.17 Mbps in 7.03 usec
>>> 57: 4099 bytes 7115 times --> 4412.88 Mbps in 7.09 usec
>>> 58: 6141 bytes 7064 times --> 4962.74 Mbps in 9.44 usec
>>> 59: 6144 bytes 7061 times --> 4941.94 Mbps in 9.49 usec
>>> 60: 6147 bytes 7030 times --> 4938.46 Mbps in 9.50 usec
>>> 61: 8189 bytes 3515 times --> 5263.65 Mbps in 11.87 usec
>>> 62: 8192 bytes 4211 times --> 5249.31 Mbps in 11.91 usec
>>> 63: 8195 bytes 4200 times --> 5202.08 Mbps in 12.02 usec
>>> 64: 12285 bytes 4162 times --> 6380.89 Mbps in 14.69 usec
>>> 65: 12288 bytes 4538 times --> 6385.27 Mbps in 14.68 usec
>>> 66: 12291 bytes 4541 times --> 6335.05 Mbps in 14.80 usec
>>> 67: 16381 bytes 2253 times --> 6535.76 Mbps in 19.12 usec
>>> 68: 16384 bytes 2614 times --> 6537.24 Mbps in 19.12 usec
>>> 69: 16387 bytes 2615 times --> 6514.52 Mbps in 19.19 usec
>>> 70: 24573 bytes 2606 times --> 6870.51 Mbps in 27.29 usec
>>> 71: 24576 bytes 2443 times --> 6866.57 Mbps in 27.31 usec
>>> 72: 24579 bytes 2441 times --> 6864.32 Mbps in 27.32 usec
>>> 73: 32765 bytes 1220 times --> 7124.85 Mbps in 35.09 usec
>>> 74: 32768 bytes 1425 times --> 7120.30 Mbps in 35.11 usec
>>> 75: 32771 bytes 1424 times --> 7127.15 Mbps in 35.08 usec
>>> 76: 49149 bytes 1425 times --> 8313.31 Mbps in 45.11 usec
>>> 77: 49152 bytes 1478 times --> 8312.58 Mbps in 45.11 usec
>>> 78: 49155 bytes 1477 times --> 8309.34 Mbps in 45.13 usec
>>> 79: 65533 bytes 738 times --> 8219.82 Mbps in 60.83 usec
>>> 80: 65536 bytes 822 times --> 8209.24 Mbps in 60.91 usec
>>> 81: 65539 bytes 820 times --> 8216.00 Mbps in 60.86 usec
>>> 82: 98301 bytes 821 times --> 8698.24 Mbps in 86.22 usec
>>> 83: 98304 bytes 773 times --> 8695.03 Mbps in 86.26 usec
>>> 84: 98307 bytes 772 times --> 8696.95 Mbps in 86.24 usec
>>> 85: 131069 bytes 386 times --> 8916.50 Mbps in 112.15 usec
>>> 86: 131072 bytes 445 times --> 8917.29 Mbps in 112.14 usec
>>> 87: 131075 bytes 445 times --> 8916.62 Mbps in 112.15 usec
>>> 88: 196605 bytes 445 times --> 9205.17 Mbps in 162.95 usec
>>> 89: 196608 bytes 409 times --> 9195.75 Mbps in 163.12 usec
>>> 90: 196611 bytes 408 times --> 9203.02 Mbps in 162.99 usec
>>> 91: 262141 bytes 204 times --> 9338.32 Mbps in 214.17 usec
>>> 92: 262144 bytes 233 times --> 9350.57 Mbps in 213.89 usec
>>> 93: 262147 bytes 233 times --> 9336.72 Mbps in 214.21 usec
>>> 94: 393213 bytes 233 times --> 9480.21 Mbps in 316.45 usec
>>> 95: 393216 bytes 210 times --> 9476.10 Mbps in 316.59 usec
>>> 96: 393219 bytes 210 times --> 9471.25 Mbps in 316.75 usec
>>> 97: 524285 bytes 105 times --> 9523.20 Mbps in 420.02 usec
>>> 98: 524288 bytes 119 times --> 9519.53 Mbps in 420.19 usec
>>> 99: 524291 bytes 118 times --> 9523.09 Mbps in 420.03 usec
>>> 100: 786429 bytes 119 times --> 9555.83 Mbps in 627.89 usec
>>> 101: 786432 bytes 106 times --> 9542.67 Mbps in 628.75 usec
>>> 102: 786435 bytes 106 times --> 9554.47 Mbps in 627.98 usec
>>> 103: 1048573 bytes 53 times --> 9527.96 Mbps in 839.63 usec
>>> 104: 1048576 bytes 59 times --> 9530.63 Mbps in 839.40 usec
>>> 105: 1048579 bytes 59 times --> 9500.65 Mbps in 842.05 usec
>>> 106: 1572861 bytes 59 times --> 9389.53 Mbps in 1278.02 usec
>>> 107: 1572864 bytes 52 times --> 9396.87 Mbps in 1277.02 usec
>>> 108: 1572867 bytes 52 times --> 9375.01 Mbps in 1280.00 usec
>>> 109: 2097149 bytes 26 times --> 9271.33 Mbps in 1725.75 usec
>>> 110: 2097152 bytes 28 times --> 9273.64 Mbps in 1725.32 usec
>>> 111: 2097155 bytes 28 times --> 9281.42 Mbps in 1723.88 usec
>>> 112: 3145725 bytes 29 times --> 9109.93 Mbps in 2634.48 usec
>>> 113: 3145728 bytes 25 times --> 9128.80 Mbps in 2629.04 usec
>>> 114: 3145731 bytes 25 times --> 9099.66 Mbps in 2637.46 usec
>>> 115: 4194301 bytes 12 times --> 8840.19 Mbps in 3619.83 usec
>>> 116: 4194304 bytes 13 times --> 8847.10 Mbps in 3617.00 usec
>>> 117: 4194307 bytes 13 times --> 8827.22 Mbps in 3625.15 usec
>>> 118: 6291453 bytes 13 times --> 8351.40 Mbps in 5747.54 usec
>>> 119: 6291456 bytes 11 times --> 8345.46 Mbps in 5751.63 usec
>>> 120: 6291459 bytes 11 times --> 8343.42 Mbps in 5753.04 usec
>>> 121: 8388605 bytes 5 times --> 8166.28 Mbps in 7837.10 usec
>>> 122: 8388608 bytes 6 times --> 8166.91 Mbps in 7836.50 usec
>>> 123: 8388611 bytes 6 times --> 8162.67 Mbps in 7840.57 usec
>>> [6:29] svbu-mpi052:~/svn/ompi-tests/NetPIPE-3.7.1 % cd ../osu/
>>> [6:29] svbu-mpi052:~/svn/ompi-tests/osu % mpirun --mca
>>> mpi_paffinity_alone 1 -np 2 --mca btl sm,self osu_latency
>>> # OSU MPI Latency Test (Version 2.1)
>>> # Size Latency (us)
>>> 0 0.85
>>> 1 0.91
>>> 2 0.91
>>> 4 0.99
>>> 8 0.99
>>> 16 0.99
>>> 32 1.08
>>> 64 1.08
>>> 128 1.25
>>> 256 1.49
>>> 512 1.92
>>> 1024 2.71
>>> 2048 4.40
>>> 4096 6.85
>>> 8192 11.48
>>> 16384 19.25
>>> 32768 35.25
>>> 65536 61.03
>>> 131072 113.15
>>> 262144 215.54
>>> 524288 428.19
>>> 1048576 880.72
>>> 2097152 1839.12
>>> 4194304 3934.90
>>> [6:29] svbu-mpi052:~/svn/ompi-tests/osu %
>>>
>>> ================================================================
>>> r18973
>>>
>>> [6:36] svbu-mpi052:~/svn/ompi-tests/NetPIPE-3.7.1 % mpirun --mca
>>> mpi_paffinity_alone 1 -np 2 --mca btl sm,self NPmpi
>>> 1: svbu-mpi052
>>> 0: svbu-mpi052
>>> Now starting the main loop
>>> 0: 1 bytes 84392 times --> 8.29 Mbps in 0.92 usec
>>> 1: 2 bytes 108626 times --> 16.58 Mbps in 0.92 usec
>>> 2: 3 bytes 108657 times --> 24.91 Mbps in 0.92 usec
>>> 3: 4 bytes 72561 times --> 30.33 Mbps in 1.01 usec
>>> 4: 6 bytes 74529 times --> 45.51 Mbps in 1.01 usec
>>> 5: 8 bytes 49709 times --> 60.76 Mbps in 1.00 usec
>>> 6: 12 bytes 62222 times --> 90.84 Mbps in 1.01 usec
>>> 7: 13 bytes 41344 times --> 98.58 Mbps in 1.01 usec
>>> 8: 16 bytes 45875 times --> 121.19 Mbps in 1.01 usec
>>> 9: 19 bytes 55845 times --> 143.43 Mbps in 1.01 usec
>>> 10: 21 bytes 62491 times --> 156.66 Mbps in 1.02 usec
>>> 11: 24 bytes 65185 times --> 177.87 Mbps in 1.03 usec
>>> 12: 27 bytes 68806 times --> 187.63 Mbps in 1.10 usec
>>> 13: 29 bytes 40482 times --> 202.10 Mbps in 1.09 usec
>>> 14: 32 bytes 44096 times --> 222.11 Mbps in 1.10 usec
>>> 15: 35 bytes 48331 times --> 255.12 Mbps in 1.05 usec
>>> 16: 45 bytes 54593 times --> 308.42 Mbps in 1.11 usec
>>> 17: 48 bytes 59888 times --> 330.10 Mbps in 1.11 usec
>>> 18: 51 bytes 61970 times --> 348.31 Mbps in 1.12 usec
>>> 19: 61 bytes 35104 times --> 409.39 Mbps in 1.14 usec
>>> 20: 64 bytes 43261 times --> 451.69 Mbps in 1.08 usec
>>> 21: 67 bytes 47698 times --> 489.98 Mbps in 1.04 usec
>>> 22: 93 bytes 51504 times --> 565.69 Mbps in 1.25 usec
>>> 23: 96 bytes 53150 times --> 598.55 Mbps in 1.22 usec
>>> 24: 99 bytes 55333 times --> 623.24 Mbps in 1.21 usec
>>> 25: 125 bytes 30005 times --> 735.91 Mbps in 1.30 usec
>>> 26: 128 bytes 38274 times --> 781.32 Mbps in 1.25 usec
>>> 27: 131 bytes 40628 times --> 828.90 Mbps in 1.21 usec
>>> 28: 189 bytes 43050 times --> 1018.02 Mbps in 1.42 usec
>>> 29: 192 bytes 47066 times --> 1069.01 Mbps in 1.37 usec
>>> 30: 195 bytes 49032 times --> 1122.18 Mbps in 1.33 usec
>>> 31: 253 bytes 26303 times --> 1259.95 Mbps in 1.53 usec
>>> 32: 256 bytes 32508 times --> 1307.53 Mbps in 1.49 usec
>>> 33: 259 bytes 33734 times --> 1357.47 Mbps in 1.46 usec
>>> 34: 381 bytes 35011 times --> 1617.08 Mbps in 1.80 usec
>>> 35: 384 bytes 37087 times --> 1675.72 Mbps in 1.75 usec
>>> 36: 387 bytes 38280 times --> 1722.27 Mbps in 1.71 usec
>>> 37: 509 bytes 19895 times --> 1913.58 Mbps in 2.03 usec
>>> 38: 512 bytes 24589 times --> 1967.08 Mbps in 1.99 usec
>>> 39: 515 bytes 25276 times --> 2041.10 Mbps in 1.93 usec
>>> 40: 765 bytes 26226 times --> 2448.96 Mbps in 2.38 usec
>>> 41: 768 bytes 27973 times --> 2503.60 Mbps in 2.34 usec
>>> 42: 771 bytes 28541 times --> 2541.12 Mbps in 2.31 usec
>>> 43: 1021 bytes 14567 times --> 2845.46 Mbps in 2.74 usec
>>> 44: 1024 bytes 18246 times --> 2854.45 Mbps in 2.74 usec
>>> 45: 1027 bytes 18304 times --> 2939.64 Mbps in 2.67 usec
>>> 46: 1533 bytes 18850 times --> 3291.70 Mbps in 3.55 usec
>>> 47: 1536 bytes 18762 times --> 3310.45 Mbps in 3.54 usec
>>> 48: 1539 bytes 18851 times --> 3386.68 Mbps in 3.47 usec
>>> 49: 2045 bytes 9670 times --> 3635.22 Mbps in 4.29 usec
>>> 50: 2048 bytes 11644 times --> 3646.70 Mbps in 4.28 usec
>>> 51: 2051 bytes 11680 times --> 3640.09 Mbps in 4.30 usec
>>> 52: 3069 bytes 11659 times --> 3926.68 Mbps in 5.96 usec
>>> 53: 3072 bytes 11180 times --> 3962.33 Mbps in 5.92 usec
>>> 54: 3075 bytes 11276 times --> 3978.54 Mbps in 5.90 usec
>>> 55: 4093 bytes 5669 times --> 4398.66 Mbps in 7.10 usec
>>> 56: 4096 bytes 7041 times --> 4429.95 Mbps in 7.05 usec
>>> 57: 4099 bytes 7091 times --> 4378.99 Mbps in 7.14 usec
>>> 58: 6141 bytes 7009 times --> 5001.17 Mbps in 9.37 usec
>>> 59: 6144 bytes 7116 times --> 4984.01 Mbps in 9.41 usec
>>> 60: 6147 bytes 7090 times --> 5015.48 Mbps in 9.35 usec
>>> 61: 8189 bytes 3570 times --> 5286.90 Mbps in 11.82 usec
>>> 62: 8192 bytes 4230 times --> 5222.58 Mbps in 11.97 usec
>>> 63: 8195 bytes 4179 times --> 5261.91 Mbps in 11.88 usec
>>> 64: 12285 bytes 4210 times --> 6370.90 Mbps in 14.71 usec
>>> 65: 12288 bytes 4531 times --> 6376.57 Mbps in 14.70 usec
>>> 66: 12291 bytes 4535 times --> 6349.10 Mbps in 14.77 usec
>>> 67: 16381 bytes 2258 times --> 6521.57 Mbps in 19.16 usec
>>> 68: 16384 bytes 2608 times --> 6520.25 Mbps in 19.17 usec
>>> 69: 16387 bytes 2608 times --> 6504.81 Mbps in 19.22 usec
>>> 70: 24573 bytes 2602 times --> 6867.93 Mbps in 27.30 usec
>>> 71: 24576 bytes 2442 times --> 6869.27 Mbps in 27.30 usec
>>> 72: 24579 bytes 2442 times --> 6864.04 Mbps in 27.32 usec
>>> 73: 32765 bytes 1220 times --> 7118.03 Mbps in 35.12 usec
>>> 74: 32768 bytes 1423 times --> 7117.77 Mbps in 35.12 usec
>>> 75: 32771 bytes 1423 times --> 7120.85 Mbps in 35.11 usec
>>> 76: 49149 bytes 1424 times --> 8324.26 Mbps in 45.05 usec
>>> 77: 49152 bytes 1479 times --> 8328.77 Mbps in 45.02 usec
>>> 78: 49155 bytes 1480 times --> 8320.47 Mbps in 45.07 usec
>>> 79: 65533 bytes 739 times --> 8214.38 Mbps in 60.87 usec
>>> 80: 65536 bytes 821 times --> 8219.87 Mbps in 60.83 usec
>>> 81: 65539 bytes 822 times --> 8232.40 Mbps in 60.74 usec
>>> 82: 98301 bytes 823 times --> 8717.21 Mbps in 86.03 usec
>>> 83: 98304 bytes 774 times --> 8716.08 Mbps in 86.05 usec
>>> 84: 98307 bytes 774 times --> 8714.26 Mbps in 86.07 usec
>>> 85: 131069 bytes 387 times --> 8921.59 Mbps in 112.09 usec
>>> 86: 131072 bytes 446 times --> 8935.37 Mbps in 111.91 usec
>>> 87: 131075 bytes 446 times --> 8925.47 Mbps in 112.04 usec
>>> 88: 196605 bytes 446 times --> 9195.80 Mbps in 163.12 usec
>>> 89: 196608 bytes 408 times --> 9197.41 Mbps in 163.09 usec
>>> 90: 196611 bytes 408 times --> 9204.33 Mbps in 162.97 usec
>>> 91: 262141 bytes 204 times --> 9344.95 Mbps in 214.02 usec
>>> 92: 262144 bytes 233 times --> 9347.58 Mbps in 213.96 usec
>>> 93: 262147 bytes 233 times --> 9340.56 Mbps in 214.12 usec
>>> 94: 393213 bytes 233 times --> 9473.27 Mbps in 316.68 usec
>>> 95: 393216 bytes 210 times --> 9486.24 Mbps in 316.25 usec
>>> 96: 393219 bytes 210 times --> 9500.26 Mbps in 315.78 usec
>>> 97: 524285 bytes 105 times --> 9538.88 Mbps in 419.33 usec
>>> 98: 524288 bytes 119 times --> 9543.40 Mbps in 419.14 usec
>>> 99: 524291 bytes 119 times --> 9534.73 Mbps in 419.52 usec
>>> 100: 786429 bytes 119 times --> 9574.15 Mbps in 626.69 usec
>>> 101: 786432 bytes 106 times --> 9565.70 Mbps in 627.24 usec
>>> 102: 786435 bytes 106 times --> 9544.50 Mbps in 628.64 usec
>>> 103: 1048573 bytes 53 times --> 9530.85 Mbps in 839.38 usec
>>> 104: 1048576 bytes 59 times --> 9525.24 Mbps in 839.87 usec
>>> 105: 1048579 bytes 59 times --> 9511.86 Mbps in 841.06 usec
>>> 106: 1572861 bytes 59 times --> 9391.40 Mbps in 1277.76 usec
>>> 107: 1572864 bytes 52 times --> 9395.54 Mbps in 1277.20 usec
>>> 108: 1572867 bytes 52 times --> 9386.02 Mbps in 1278.50 usec
>>> 109: 2097149 bytes 26 times --> 9298.48 Mbps in 1720.71 usec
>>> 110: 2097152 bytes 29 times --> 9313.43 Mbps in 1717.95 usec
>>> 111: 2097155 bytes 29 times --> 9293.49 Mbps in 1721.64 usec
>>> 112: 3145725 bytes 29 times --> 9126.67 Mbps in 2629.65 usec
>>> 113: 3145728 bytes 25 times --> 9113.76 Mbps in 2633.38 usec
>>> 114: 3145731 bytes 25 times --> 9079.90 Mbps in 2643.20 usec
>>> 115: 4194301 bytes 12 times --> 8810.57 Mbps in 3632.00 usec
>>> 116: 4194304 bytes 13 times --> 8821.99 Mbps in 3627.30 usec
>>> 117: 4194307 bytes 13 times --> 8801.17 Mbps in 3635.88 usec
>>> 118: 6291453 bytes 13 times --> 8337.50 Mbps in 5757.12 usec
>>> 119: 6291456 bytes 11 times --> 8332.94 Mbps in 5760.27 usec
>>> 120: 6291459 bytes 11 times --> 8346.25 Mbps in 5751.09 usec
>>> 121: 8388605 bytes 5 times --> 8159.20 Mbps in 7843.90 usec
>>> 122: 8388608 bytes 6 times --> 8166.83 Mbps in 7836.58 usec
>>> 123: 8388611 bytes 6 times --> 8161.26 Mbps in 7841.92 usec
>>> [6:37] svbu-mpi052:~/svn/ompi-tests/NetPIPE-3.7.1 % cd ../osu/
>>> [6:37] svbu-mpi052:~/svn/ompi-tests/osu % mpirun --mca
>>> mpi_paffinity_alone 1 -np 2 --mca btl sm,self osu_latency
>>> # OSU MPI Latency Test (Version 2.1)
>>> # Size Latency (us)
>>> 0 0.85
>>> 1 0.91
>>> 2 0.91
>>> 4 0.99
>>> 8 0.99
>>> 16 0.99
>>> 32 1.09
>>> 64 1.07
>>> 128 1.25
>>> 256 1.49
>>> 512 1.97
>>> 1024 2.69
>>> 2048 4.29
>>> 4096 6.83
>>> 8192 11.41
>>> 16384 19.69
>>> 32768 35.27
>>> 65536 61.06
>>> 131072 112.51
>>> 262144 215.47
>>> 524288 429.60
>>> 1048576 882.89
>>> 2097152 1836.45
>>> 4194304 3943.47
>>> [6:37] svbu-mpi052:~/svn/ompi-tests/osu %
>>>
>>> ================================================================
>>> r18850
>>> [6:31] svbu-mpi052:~/svn/ompi-tests/NetPIPE-3.7.1 % mpirun --mca
>>> mpi_paffinity_alone 1 -np 2 --mca btl sm,self NPmpi
>>> 0: svbu-mpi052
>>> 1: svbu-mpi052
>>> Now starting the main loop
>>> 0: 1 bytes 116185 times --> 11.32 Mbps in 0.67 usec
>>> 1: 2 bytes 148348 times --> 22.58 Mbps in 0.68 usec
>>> 2: 3 bytes 147969 times --> 33.88 Mbps in 0.68 usec
>>> 3: 4 bytes 98695 times --> 40.58 Mbps in 0.75 usec
>>> 4: 6 bytes 99737 times --> 60.85 Mbps in 0.75 usec
>>> 5: 8 bytes 66464 times --> 81.13 Mbps in 0.75 usec
>>> 6: 12 bytes 83076 times --> 121.58 Mbps in 0.75 usec
>>> 7: 13 bytes 55334 times --> 131.83 Mbps in 0.75 usec
>>> 8: 16 bytes 61344 times --> 161.81 Mbps in 0.75 usec
>>> 9: 19 bytes 74561 times --> 190.93 Mbps in 0.76 usec
>>> 10: 21 bytes 83186 times --> 207.97 Mbps in 0.77 usec
>>> 11: 24 bytes 86535 times --> 235.30 Mbps in 0.78 usec
>>> 12: 27 bytes 91024 times --> 241.36 Mbps in 0.85 usec
>>> 13: 29 bytes 52074 times --> 260.24 Mbps in 0.85 usec
>>> 14: 32 bytes 56782 times --> 286.57 Mbps in 0.85 usec
>>> 15: 35 bytes 62357 times --> 341.55 Mbps in 0.78 usec
>>> 16: 45 bytes 73090 times --> 400.53 Mbps in 0.86 usec
>>> 17: 48 bytes 77776 times --> 425.94 Mbps in 0.86 usec
>>> 18: 51 bytes 79963 times --> 449.27 Mbps in 0.87 usec
>>> 19: 61 bytes 45280 times --> 520.58 Mbps in 0.89 usec
>>> 20: 64 bytes 55011 times --> 589.77 Mbps in 0.83 usec
>>> 21: 67 bytes 62279 times --> 651.96 Mbps in 0.78 usec
>>> 22: 93 bytes 68530 times --> 706.75 Mbps in 1.00 usec
>>> 23: 96 bytes 66405 times --> 756.56 Mbps in 0.97 usec
>>> 24: 99 bytes 69940 times --> 786.11 Mbps in 0.96 usec
>>> 25: 125 bytes 37846 times --> 917.31 Mbps in 1.04 usec
>>> 26: 128 bytes 47708 times --> 991.21 Mbps in 0.99 usec
>>> 27: 131 bytes 51542 times --> 1030.40 Mbps in 0.97 usec
>>> 28: 189 bytes 53515 times --> 1228.14 Mbps in 1.17 usec
>>> 29: 192 bytes 56781 times --> 1317.94 Mbps in 1.11 usec
>>> 30: 195 bytes 60449 times --> 1372.28 Mbps in 1.08 usec
>>> 31: 253 bytes 32165 times --> 1506.60 Mbps in 1.28 usec
>>> 32: 256 bytes 38871 times --> 1590.08 Mbps in 1.23 usec
>>> 33: 259 bytes 41024 times --> 1657.90 Mbps in 1.19 usec
>>> 34: 381 bytes 42760 times --> 1894.98 Mbps in 1.53 usec
>>> 35: 384 bytes 43460 times --> 1958.92 Mbps in 1.50 usec
>>> 36: 387 bytes 44750 times --> 2029.44 Mbps in 1.45 usec
>>> 37: 509 bytes 23444 times --> 2176.96 Mbps in 1.78 usec
>>> 38: 512 bytes 27974 times --> 2268.97 Mbps in 1.72 usec
>>> 39: 515 bytes 29156 times --> 2340.62 Mbps in 1.68 usec
>>> 40: 765 bytes 30074 times --> 2698.17 Mbps in 2.16 usec
>>> 41: 768 bytes 30819 times --> 2778.48 Mbps in 2.11 usec
>>> 42: 771 bytes 31674 times --> 2847.11 Mbps in 2.07 usec
>>> 43: 1021 bytes 16322 times --> 3039.90 Mbps in 2.56 usec
>>> 44: 1024 bytes 19493 times --> 3161.06 Mbps in 2.47 usec
>>> 45: 1027 bytes 20270 times --> 3221.90 Mbps in 2.43 usec
>>> 46: 1533 bytes 20660 times --> 3455.95 Mbps in 3.38 usec
>>> 47: 1536 bytes 19698 times --> 3580.63 Mbps in 3.27 usec
>>> 48: 1539 bytes 20389 times --> 3623.40 Mbps in 3.24 usec
>>> 49: 2045 bytes 10346 times --> 3751.80 Mbps in 4.16 usec
>>> 50: 2048 bytes 12017 times --> 3833.40 Mbps in 4.08 usec
>>> 51: 2051 bytes 12278 times --> 3813.67 Mbps in 4.10 usec
>>> 52: 3069 bytes 12215 times --> 3997.25 Mbps in 5.86 usec
>>> 53: 3072 bytes 11381 times --> 4058.18 Mbps in 5.78 usec
>>> 54: 3075 bytes 11548 times --> 4102.09 Mbps in 5.72 usec
>>> 55: 4093 bytes 5845 times --> 4726.24 Mbps in 6.61 usec
>>> 56: 4096 bytes 7565 times --> 4679.74 Mbps in 6.68 usec
>>> 57: 4099 bytes 7491 times --> 4649.50 Mbps in 6.73 usec
>>> 58: 6141 bytes 7442 times --> 5072.39 Mbps in 9.24 usec
>>> 59: 6144 bytes 7217 times --> 5064.70 Mbps in 9.26 usec
>>> 60: 6147 bytes 7204 times --> 5067.07 Mbps in 9.26 usec
>>> 61: 8189 bytes 3606 times --> 5387.85 Mbps in 11.60 usec
>>> 62: 8192 bytes 4311 times --> 5393.87 Mbps in 11.59 usec
>>> 63: 8195 bytes 4316 times --> 5301.81 Mbps in 11.79 usec
>>> 64: 12285 bytes 4242 times --> 6568.81 Mbps in 14.27 usec
>>> 65: 12288 bytes 4672 times --> 6561.90 Mbps in 14.29 usec
>>> 66: 12291 bytes 4666 times --> 6548.01 Mbps in 14.32 usec
>>> 67: 16381 bytes 2329 times --> 6662.43 Mbps in 18.76 usec
>>> 68: 16384 bytes 2665 times --> 6655.18 Mbps in 18.78 usec
>>> 69: 16387 bytes 2662 times --> 6634.79 Mbps in 18.84 usec
>>> 70: 24573 bytes 2654 times --> 6937.26 Mbps in 27.02 usec
>>> 71: 24576 bytes 2466 times --> 6937.41 Mbps in 27.03 usec
>>> 72: 24579 bytes 2466 times --> 6931.40 Mbps in 27.05 usec
>>> 73: 32765 bytes 1232 times --> 7218.55 Mbps in 34.63 usec
>>> 74: 32768 bytes 1443 times --> 7213.85 Mbps in 34.66 usec
>>> 75: 32771 bytes 1442 times --> 7218.89 Mbps in 34.63 usec
>>> 76: 49149 bytes 1443 times --> 8387.79 Mbps in 44.71 usec
>>> 77: 49152 bytes 1491 times --> 8385.50 Mbps in 44.72 usec
>>> 78: 49155 bytes 1490 times --> 8390.79 Mbps in 44.69 usec
>>> 79: 65533 bytes 745 times --> 8261.32 Mbps in 60.52 usec
>>> 80: 65536 bytes 826 times --> 8260.34 Mbps in 60.53 usec
>>> 81: 65539 bytes 826 times --> 8265.33 Mbps in 60.50 usec
>>> 82: 98301 bytes 826 times --> 8747.13 Mbps in 85.74 usec
>>> 83: 98304 bytes 777 times --> 8746.72 Mbps in 85.75 usec
>>> 84: 98307 bytes 777 times --> 8733.81 Mbps in 85.88 usec
>>> 85: 131069 bytes 388 times --> 8956.71 Mbps in 111.65 usec
>>> 86: 131072 bytes 447 times --> 8967.16 Mbps in 111.52 usec
>>> 87: 131075 bytes 448 times --> 8960.56 Mbps in 111.60 usec
>>> 88: 196605 bytes 448 times --> 9247.58 Mbps in 162.20 usec
>>> 89: 196608 bytes 411 times --> 9234.30 Mbps in 162.44 usec
>>> 90: 196611 bytes 410 times --> 9231.32 Mbps in 162.49 usec
>>> 91: 262141 bytes 205 times --> 9365.98 Mbps in 213.54 usec
>>> 92: 262144 bytes 234 times --> 9368.25 Mbps in 213.49 usec
>>> 93: 262147 bytes 234 times --> 9363.09 Mbps in 213.61 usec
>>> 94: 393213 bytes 234 times --> 9512.63 Mbps in 315.37 usec
>>> 95: 393216 bytes 211 times --> 9497.01 Mbps in 315.89 usec
>>> 96: 393219 bytes 211 times --> 9510.80 Mbps in 315.43 usec
>>> 97: 524285 bytes 105 times --> 9553.55 Mbps in 418.69 usec
>>> 98: 524288 bytes 119 times --> 9561.59 Mbps in 418.34 usec
>>> 99: 524291 bytes 119 times --> 9551.86 Mbps in 418.77 usec
>>> 100: 786429 bytes 119 times --> 9582.63 Mbps in 626.13 usec
>>> 101: 786432 bytes 106 times --> 9576.72 Mbps in 626.52 usec
>>> 102: 786435 bytes 106 times --> 9584.78 Mbps in 625.99 usec
>>> 103: 1048573 bytes 53 times --> 9545.32 Mbps in 838.10 usec
>>> 104: 1048576 bytes 59 times --> 9532.37 Mbps in 839.25 usec
>>> 105: 1048579 bytes 59 times --> 9542.90 Mbps in 838.32 usec
>>> 106: 1572861 bytes 59 times --> 9434.44 Mbps in 1271.93 usec
>>> 107: 1572864 bytes 52 times --> 9400.64 Mbps in 1276.51 usec
>>> 108: 1572867 bytes 52 times --> 9409.24 Mbps in 1275.34 usec
>>> 109: 2097149 bytes 26 times --> 9305.75 Mbps in 1719.36 usec
>>> 110: 2097152 bytes 29 times --> 9314.56 Mbps in 1717.74 usec
>>> 111: 2097155 bytes 29 times --> 9278.43 Mbps in 1724.43 usec
>>> 112: 3145725 bytes 28 times --> 9065.15 Mbps in 2647.50 usec
>>> 113: 3145728 bytes 25 times --> 9095.10 Mbps in 2638.78 usec
>>> 114: 3145731 bytes 25 times --> 9073.88 Mbps in 2644.96 usec
>>> 115: 4194301 bytes 12 times --> 8772.63 Mbps in 3647.70 usec
>>> 116: 4194304 bytes 13 times --> 8768.32 Mbps in 3649.50 usec
>>> 117: 4194307 bytes 13 times --> 8771.37 Mbps in 3648.24 usec
>>> 118: 6291453 bytes 13 times --> 8321.22 Mbps in 5768.38 usec
>>> 119: 6291456 bytes 11 times --> 8320.00 Mbps in 5769.23 usec
>>> 120: 6291459 bytes 11 times --> 8335.25 Mbps in 5758.68 usec
>>> 121: 8388605 bytes 5 times --> 8167.02 Mbps in 7836.39 usec
>>> 122: 8388608 bytes 6 times --> 8165.44 Mbps in 7837.91 usec
>>> 123: 8388611 bytes 6 times --> 8162.24 Mbps in 7840.99 usec
>>> [6:32] svbu-mpi052:~/svn/ompi-tests/NetPIPE-3.7.1 % cd ../osu/
>>> [6:32] svbu-mpi052:~/svn/ompi-tests/osu % mpirun --mca
>>> mpi_paffinity_alone 1 -np 2 --mca btl sm,self osu_latency
>>> # OSU MPI Latency Test (Version 2.1)
>>> # Size Latency (us)
>>> 0 0.65
>>> 1 0.69
>>> 2 0.69
>>> 4 0.76
>>> 8 0.76
>>> 16 0.76
>>> 32 0.85
>>> 64 0.83
>>> 128 1.03
>>> 256 1.25
>>> 512 1.73
>>> 1024 2.47
>>> 2048 4.18
>>> 4096 6.53
>>> 8192 11.23
>>> 16384 18.91
>>> 32768 34.97
>>> 65536 60.80
>>> 131072 112.09
>>> 262144 215.15
>>> 524288 427.97
>>> 1048576 880.90
>>> 2097152 1840.40
>>> 4194304 3945.23
>>> [6:33] svbu-mpi052:~/svn/ompi-tests/osu %
>>>
>>>
>>>
>>> On Jul 23, 2008, at 7:24 AM, Lenny Verkhovsky wrote:
>>>
>>>> Sorry Terry, :).
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]>
>>>> Date: Jul 23, 2008 2:22 PM
>>>> Subject: Re: [OMPI devel] [OMPI bugs] [Open MPI] #1250: Performance
>>>> problem on SM
>>>> To: Lenny Berkhovsky <lenny.verkhovsky_at_[hidden]>
>>>>
>>>>
>>>>
>>>> On 7/23/08, Terry Dontje <Terry.Dontje_at_[hidden]> wrote:
>>>> I didn't see any attached results in the email.
>>>>
>>>> --td
>>>> Lenny Verkhovsky wrote:
>>>>
>>>> I rechecked it on the same node, still no degradation,
>>>>
>>>> see results attached.
>>>>
>>>>
>>>> On 7/22/08, Open MPI <bugs_at_[hidden]> wrote:
>>>>
>>>> #1250: Performance problem on SM
>>>> --------------------+-------------------------------------------------------
>>>>
>>>> Reporter: bosilca | Owner: bosilca
>>>> Type: defect | Status: assigned
>>>> Priority: blocker | Milestone: Open MPI 1.3
>>>> Version: | Resolution:
>>>> Keywords: |
>>>> --------------------+-------------------------------------------------------
>>>>
>>>>
>>>>
>>>> Comment(by tdd):
>>>>
>>>> Hmmm, Lennyve, isn't your mpirun above going across nodes and not on the
>>>> same node? I am running netpipe on a single node.
>>>>
>>>>
>>>> --
>>>> Ticket URL:
>>>> <https://svn.open-mpi.org/trac/ompi/ticket/1250#comment:20>
>>>>
>>>> Open MPI <http://www.open-mpi.org/>
>>>>
>>>>
>>>> _______________________________________________
>>>> bugs mailing list
>>>> bugs_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/bugs
>>>>
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>
>>>>
>>>>
>>>>
>>>> <NPmpi.log>
>>>
>>>
>>> --
>>> Jeff Squyres
>>> Cisco Systems
>>>
>>
>
>
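
A note on reading the NetPIPE rows quoted above: each row gives a message size in bytes, a repeat count, a throughput in Mbps and a time in usec. The numbers appear consistent with throughput = (bytes * 8) / time, with 1 Mbps taken as 2**20 bits per second. That is an inference from the quoted figures, not something stated in the thread or checked against NetPIPE's source. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def netpipe_mbps(nbytes, usec):
    """Throughput in Mbps, ASSUMING 1 Mbps = 2**20 bits per second."""
    bits = nbytes * 8
    seconds = usec * 1e-6
    return bits / seconds / 2**20


# (bytes, reported Mbps, reported usec), copied from the r18997 NetPIPE run above.
rows = [
    (65536, 8209.24, 60.91),
    (1048576, 9530.63, 839.40),
    (8388608, 8166.91, 7836.50),
]

for nbytes, reported, usec in rows:
    computed = netpipe_mbps(nbytes, usec)
    rel_err = abs(computed - reported) / reported
    print(f"{nbytes:>8} bytes: reported {reported:.2f} Mbps, "
          f"computed {computed:.2f} Mbps, relative error {rel_err:.2e}")
```

Because the printed times are rounded to two decimals, the match is only approximate for the very small messages, where 0.01 usec is more than 1% of the measured time.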