Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Mixed Mellanox and Qlogic problems
From: David Warren (warren_at_[hidden])
Date: 2011-07-27 19:11:23


OK, I finally was able to get on and run some OFED tests. It looks to me
like I must have something configured wrong with the QLogic cards, but I
have no idea what.
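
For reference, each test below was run as the usual server/client pair,
roughly like this (no extra options, so the sizes and iteration counts are
just the ibv_*_pingpong defaults):

   # on the target node (n15 or n5 below), start the listener:
   ibv_rc_pingpong
   # on the other node, point the client at the server by hostname:
   ibv_rc_pingpong n15

The same pattern applies to ibv_srq_pingpong and ibv_uc_pingpong.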

Mellanox to QLogic:
  ibv_rc_pingpong n15
   local address: LID 0x0006, QPN 0x240049, PSN 0x87f83a, GID ::
   remote address: LID 0x000d, QPN 0x00b7cb, PSN 0xcc9dee, GID ::
8192000 bytes in 0.01 seconds = 4565.38 Mbit/sec
1000 iters in 0.01 seconds = 14.35 usec/iter

ibv_srq_pingpong n15
   local address: LID 0x0006, QPN 0x280049, PSN 0xf83e06, GID ::
  ...
8192000 bytes in 0.01 seconds = 9829.91 Mbit/sec
1000 iters in 0.01 seconds = 6.67 usec/iter

ibv_uc_pingpong n15
   local address: LID 0x0006, QPN 0x680049, PSN 0x7b33d2, GID ::
   remote address: LID 0x000d, QPN 0x00b7ed, PSN 0x7fafaa, GID ::
8192000 bytes in 0.02 seconds = 4080.19 Mbit/sec
1000 iters in 0.02 seconds = 16.06 usec/iter

QLogic to QLogic:

ibv_rc_pingpong n15
   local address: LID 0x000b, QPN 0x00afb7, PSN 0x3f08df, GID ::
   remote address: LID 0x000d, QPN 0x00b7ef, PSN 0xd15096, GID ::
8192000 bytes in 0.02 seconds = 3223.13 Mbit/sec
1000 iters in 0.02 seconds = 20.33 usec/iter

ibv_srq_pingpong n15
   local address: LID 0x000b, QPN 0x00afb9, PSN 0x9cdde3, GID ::
  ...
8192000 bytes in 0.01 seconds = 9018.30 Mbit/sec
1000 iters in 0.01 seconds = 7.27 usec/iter

ibv_uc_pingpong n15
   local address: LID 0x000b, QPN 0x00afd9, PSN 0x98cfa0, GID ::
   remote address: LID 0x000d, QPN 0x00b811, PSN 0x0a0d6e, GID ::
8192000 bytes in 0.02 seconds = 3318.28 Mbit/sec
1000 iters in 0.02 seconds = 19.75 usec/iter

Mellanox to Mellanox:

ibv_rc_pingpong n5
   local address: LID 0x0009, QPN 0x240049, PSN 0xd72119, GID ::
   remote address: LID 0x0006, QPN 0x6c0049, PSN 0xc1909e, GID ::
8192000 bytes in 0.01 seconds = 7121.93 Mbit/sec
1000 iters in 0.01 seconds = 9.20 usec/iter

ibv_srq_pingpong n5
   local address: LID 0x0009, QPN 0x280049, PSN 0x78f4f7, GID ::
...
8192000 bytes in 0.00 seconds = 24619.08 Mbit/sec
1000 iters in 0.00 seconds = 2.66 usec/iter

ibv_uc_pingpong n5
   local address: LID 0x0009, QPN 0x680049, PSN 0x4002ea, GID ::
   remote address: LID 0x0006, QPN 0x300049, PSN 0x29abf0, GID ::
8192000 bytes in 0.01 seconds = 7176.52 Mbit/sec
1000 iters in 0.01 seconds = 9.13 usec/iter
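
For what it's worth, the only configuration check I know to run on the cards
is to compare the port attributes across the qib and mlx4 nodes, along these
lines (just a sketch; the grep pattern and any device-specific options may
need adjusting):

   # compare port state, MTU, link width and speed on each node:
   ibv_devinfo -v | grep -E 'hca_id|state|active_mtu|active_width|active_speed'
   # the same information as reported by the IB stack:
   ibstat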

On 07/17/11 05:49, Jeff Squyres wrote:
> Interesting.
>
> Try with the native OFED benchmarks -- i.e., get MPI out of the way and see if the raw/native performance of the network between the devices reflects the same dichotomy.
>
> (e.g., ibv_rc_pingpong)
>
>
> On Jul 15, 2011, at 7:58 PM, David Warren wrote:
>
>
>> All OFED 1.4 and kernel 2.6.32 (that's what I can get to today)
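>>
>> (For reference, each OSU test was launched with two ranks in the usual way,
>> something like
>>
>>    mpirun -np 2 -host nodeA,nodeB ./osu_latency
>>
>> where nodeA/nodeB stand in for the actual hostnames.)
>>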
>> qib to qib:
>>
>> # OSU MPI Latency Test v3.3
>> # Size Latency (us)
>> 0 0.29
>> 1 0.32
>> 2 0.31
>> 4 0.32
>> 8 0.32
>> 16 0.35
>> 32 0.35
>> 64 0.47
>> 128 0.47
>> 256 0.50
>> 512 0.53
>> 1024 0.66
>> 2048 0.88
>> 4096 1.24
>> 8192 1.89
>> 16384 3.94
>> 32768 5.94
>> 65536 9.79
>> 131072 18.93
>> 262144 37.36
>> 524288 71.90
>> 1048576 189.62
>> 2097152 478.55
>> 4194304 1148.80
>>
>> # OSU MPI Bandwidth Test v3.3
>> # Size Bandwidth (MB/s)
>> 1 2.48
>> 2 5.00
>> 4 10.04
>> 8 20.02
>> 16 33.22
>> 32 67.32
>> 64 134.65
>> 128 260.30
>> 256 486.44
>> 512 860.77
>> 1024 1385.54
>> 2048 1940.68
>> 4096 2231.20
>> 8192 2343.30
>> 16384 2944.99
>> 32768 3213.77
>> 65536 3174.85
>> 131072 3220.07
>> 262144 3259.48
>> 524288 3277.05
>> 1048576 3283.97
>> 2097152 3288.91
>> 4194304 3291.84
>>
>> # OSU MPI Bi-Directional Bandwidth Test v3.3
>> # Size Bi-Bandwidth (MB/s)
>> 1 3.10
>> 2 6.21
>> 4 13.08
>> 8 26.91
>> 16 41.00
>> 32 78.17
>> 64 161.13
>> 128 312.08
>> 256 588.18
>> 512 968.32
>> 1024 1683.42
>> 2048 2513.86
>> 4096 2948.11
>> 8192 2918.39
>> 16384 3370.28
>> 32768 3543.99
>> 65536 4159.99
>> 131072 4709.73
>> 262144 4733.31
>> 524288 4795.44
>> 1048576 4753.69
>> 2097152 4786.11
>> 4194304 4779.40
>>
>> mlx4 to mlx4:
>> # OSU MPI Latency Test v3.3
>> # Size Latency (us)
>> 0 1.62
>> 1 1.66
>> 2 1.67
>> 4 1.66
>> 8 1.70
>> 16 1.71
>> 32 1.75
>> 64 1.91
>> 128 3.11
>> 256 3.32
>> 512 3.66
>> 1024 4.46
>> 2048 5.57
>> 4096 6.62
>> 8192 8.95
>> 16384 11.07
>> 32768 15.94
>> 65536 25.57
>> 131072 44.93
>> 262144 83.58
>> 524288 160.85
>> 1048576 315.47
>> 2097152 624.68
>> 4194304 1247.17
>>
>> # OSU MPI Bandwidth Test v3.3
>> # Size Bandwidth (MB/s)
>> 1 1.80
>> 2 4.21
>> 4 8.79
>> 8 18.14
>> 16 35.79
>> 32 68.58
>> 64 132.72
>> 128 221.89
>> 256 399.62
>> 512 724.13
>> 1024 1267.36
>> 2048 1959.22
>> 4096 2354.26
>> 8192 2519.50
>> 16384 3225.44
>> 32768 3227.86
>> 65536 3350.76
>> 131072 3369.86
>> 262144 3378.76
>> 524288 3384.02
>> 1048576 3386.60
>> 2097152 3387.97
>> 4194304 3388.66
>>
>> # OSU MPI Bi-Directional Bandwidth Test v3.3
>> # Size Bi-Bandwidth (MB/s)
>> 1 1.70
>> 2 3.86
>> 4 10.42
>> 8 20.99
>> 16 41.22
>> 32 79.17
>> 64 151.25
>> 128 277.64
>> 256 495.44
>> 512 843.44
>> 1024 162.53
>> 2048 2427.23
>> 4096 2989.63
>> 8192 3587.58
>> 16384 5391.08
>> 32768 6051.56
>> 65536 6314.33
>> 131072 6439.04
>> 262144 6506.51
>> 524288 6539.51
>> 1048576 6558.34
>> 2097152 6567.24
>> 4194304 6555.76
>>
>> mixed:
>> # OSU MPI Latency Test v3.3
>> # Size Latency (us)
>> 0 3.81
>> 1 3.88
>> 2 3.86
>> 4 3.85
>> 8 3.92
>> 16 3.93
>> 32 3.93
>> 64 4.02
>> 128 4.60
>> 256 4.80
>> 512 5.14
>> 1024 5.94
>> 2048 7.26
>> 4096 8.50
>> 8192 10.98
>> 16384 19.92
>> 32768 26.35
>> 65536 39.93
>> 131072 64.45
>> 262144 106.93
>> 524288 191.89
>> 1048576 358.31
>> 2097152 694.25
>> 4194304 1429.56
>>
>> # OSU MPI Bandwidth Test v3.3
>> # Size Bandwidth (MB/s)
>> 1 0.64
>> 2 1.39
>> 4 2.76
>> 8 5.58
>> 16 11.03
>> 32 22.17
>> 64 43.70
>> 128 100.49
>> 256 179.83
>> 512 305.87
>> 1024 544.68
>> 2048 838.22
>> 4096 1187.74
>> 8192 1542.07
>> 16384 1260.93
>> 32768 1708.54
>> 65536 2180.45
>> 131072 2482.28
>> 262144 2624.89
>> 524288 2680.55
>> 1048576 2728.58
>> never gets past here
>>
>> # OSU MPI Bi-Directional Bandwidth Test v3.3
>> # Size Bi-Bandwidth (MB/s)
>> 1 0.41
>> 2 0.83
>> 4 1.68
>> 8 3.37
>> 16 6.71
>> 32 13.37
>> 64 26.64
>> 128 63.47
>> 256 113.23
>> 512 202.92
>> 1024 362.48
>> 2048 578.53
>> 4096 830.31
>> 8192 1143.16
>> 16384 1303.02
>> 32768 1913.07
>> 65536 2463.83
>> 131072 2793.83
>> 262144 2918.32
>> 524288 2987.92
>> 1048576 3033.31
>> never gets past here
>>
>>
>>
>> On 07/15/11 09:03, Jeff Squyres wrote:
>>
>>> I don't think too many people have done combined QLogic + Mellanox runs, so this probably isn't a well-explored space.
>>>
>>> Can you run some microbenchmarks to see what kind of latency / bandwidth you're getting between nodes of the same type and nodes of different types?
>>>
>>> On Jul 14, 2011, at 8:21 PM, David Warren wrote:
>>>
>>>
>>>
>>>> On my test runs (a WRF run just long enough to go beyond the spinup influence):
>>>> On just 6 of the old mlx4 machines I get about 00:05:30 runtime.
>>>> On 3 mlx4 and 3 qib nodes I get an average of 00:06:20.
>>>> So the slowdown is about 11+%.
>>>> When this is a full run, 11% becomes a very long time. This has held for some longer tests as well, before I went to OFED 1.6.
>>>>
>>>> On 07/14/11 05:55, Jeff Squyres wrote:
>>>>
>>>>
>>>>> On Jul 13, 2011, at 7:46 PM, David Warren wrote:
>>>>>
>>>>>> I finally got access to the systems again (the original ones are part of our real-time system). I thought I would try one other test I had set up first: I went to OFED 1.6 and it started running with no errors, so it must have been an OFED bug. Now I just have the speed problem. Does anyone have a way to make the mixture of mlx4 and QLogic cards work together without slowing down?
>>>>>>
>>>>> What do you mean by "slowing down"?
>>>>>
>>>> <warren.vcf>
>>>>
>>>>
>>>
>>>
>> <warren.vcf>
>>
>
>