
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Can 2 IB HCAs give twice the bandwidth?
From: Mike Dubman (mike.ompi_at_[hidden])
Date: 2008-10-22 02:53:13


Using 2 HCAs on the same PCI-Express bus (or 2 ports of the same HCA)
will not improve performance; PCI-Express is the bottleneck.
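A back-of-envelope calculation supports this claim, assuming a PCIe 1.x x8 host slot (common for ConnectX-era systems; the exact slot generation isn't stated in the thread). Both PCIe 1.x and InfiniBand DDR use 8b/10b encoding, so one 4x DDR HCA already matches the slot's data rate:

```python
# Nominal link-rate arithmetic: why two DDR HCAs behind one PCIe 1.x x8
# slot cannot deliver double bandwidth. These are raw data rates after
# 8b/10b encoding; real throughput is lower due to protocol overhead.

PCIE1_GTS_PER_LANE = 2.5           # PCIe 1.x: 2.5 GT/s per lane
ENCODING = 8 / 10                  # 8b/10b encoding on both fabrics

pcie_x8_gbps = PCIE1_GTS_PER_LANE * ENCODING * 8    # 16 Gbit/s per direction
ib_ddr_4x_gbps = 5.0 * ENCODING * 4                 # DDR 4x: 5 GT/s/lane

print(f"PCIe 1.x x8 slot : {pcie_x8_gbps:.0f} Gbit/s per direction")
print(f"One 4x DDR HCA   : {ib_ddr_4x_gbps:.0f} Gbit/s")
# One HCA already saturates the slot; a second HCA on the same bus
# only adds contention, not bandwidth.
```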

On Mon, Oct 20, 2008 at 2:28 AM, Mostyn Lewis <Mostyn.Lewis_at_[hidden]> wrote:

> Well, here's what I see with the IMB PingPong test using two ConnectX DDR
> cards
> in each of 2 machines. I'm just quoting the last line at 10 repetitions of
> the 4194304 bytes.
>
> Scali_MPI_Connect-5.6.4-59151: (multi rail setup in /etc/dat.conf)
>  #bytes  #repetitions   t[usec]   Mbytes/sec
> 4194304            10   2198.24      1819.63
>
> mvapich2-1.2rc2: (MV2_NUM_HCAS=2 MV2_NUM_PORTS=1)
>  #bytes  #repetitions   t[usec]   Mbytes/sec
> 4194304            10   2427.24      1647.96
>
> OpenMPI SVN 19772:
>  #bytes  #repetitions   t[usec]   Mbytes/sec
> 4194304            10   3676.35      1088.03
>
> Repeatable within bounds.
>
> This is OFED-1.3.1 and I peered at
> /sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_packets
> and
> /sys/class/infiniband/mlx4_1/ports/1/counters/port_rcv_packets
> on one of the 2 machines and looked at what happened for Scali
> and OpenMPI.
>
> Scali packets:
> HCA 0 port1 = 115116625 - 114903198 = 213427
> HCA 1 port1 = 78099566 - 77886143 = 213423
> --------------------------------------------
> 426850
> OpenMPI packets:
> HCA 0 port1 = 115233624 - 115116625 = 116999
> HCA 1 port1 = 78216425 - 78099566 = 116859
> --------------------------------------------
> 233858
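The counter deltas above can be reproduced directly; the before/after values are snapshots of `port_rcv_packets` copied from the post:

```python
# Reproduce the packet-counter arithmetic: per-HCA receive-packet deltas
# from /sys/class/infiniband/mlx4_{0,1}/ports/1/counters/port_rcv_packets,
# sampled before and after each benchmark run.

def delta(after, before):
    return after - before

# Scali run: both HCAs received an almost identical number of packets.
scali = delta(115116625, 114903198) + delta(78099566, 77886143)

# Open MPI run: again evenly balanced across the two HCAs.
ompi = delta(115233624, 115116625) + delta(78216425, 78099566)

print(f"Scali total rcv packets  : {scali}")
print(f"OpenMPI total rcv packets: {ompi}")
# Both MPIs clearly drive both HCAs; the bandwidth gap must come from
# how the transfers are scheduled, not from one rail sitting idle.
```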
>
> Scali is set up so that data larger than 8192 bytes is striped
> across the 2 HCAs using 8192 bytes per HCA in a round robin fashion.
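The striping scheme described above can be sketched as follows. This is a minimal illustration of round-robin striping, not Scali's actual implementation; the chunk size of 8192 bytes is the one quoted in the post:

```python
# Minimal sketch of round-robin message striping across multiple rails:
# deal fixed-size chunks to each HCA in turn until the message is spent.

STRIPE = 8192  # bytes per chunk, per the Scali configuration described

def stripe(msg_len, n_rails=2, chunk=STRIPE):
    """Return the number of bytes each rail carries for one message."""
    per_rail = [0] * n_rails
    offset, rail = 0, 0
    while offset < msg_len:
        size = min(chunk, msg_len - offset)
        per_rail[rail] += size
        offset += size
        rail = (rail + 1) % n_rails
    return per_rail

print(stripe(4194304))  # [2097152, 2097152]: a 4 MB message splits evenly
```

With 512 equal chunks, the two rails stay perfectly balanced, which matches the near-identical per-HCA packet counts seen in the Scali run.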
>
> So, it seems that OpenMPI is using both ports but strangely ends
> up with a Mbytes/sec rate that is worse than a single HCA alone.
> If I use --mca btl_openib_if_exclude mlx4_1:1, we get
>  #bytes  #repetitions   t[usec]   Mbytes/sec
> 4194304            10   3080.59      1298.45
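For comparison runs like this, the openib BTL can also be pinned to a single device with the companion `btl_openib_if_include` parameter. A sketch of a full command line, where the hostnames and benchmark path are placeholders:

```shell
# Restrict Open MPI's openib BTL to HCA 0, port 1 only
# (node names and IMB binary location are hypothetical):
mpirun -np 2 --host node1,node2 \
    --mca btl openib,self \
    --mca btl_openib_if_include mlx4_0:1 \
    ./IMB-MPI1 PingPong
```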
>
> So, what's taking so long? Is this a threading question?
>
> DM
>
>
> On Sun, 19 Oct 2008, Jeff Squyres wrote:
>
>> On Oct 18, 2008, at 9:19 PM, Mostyn Lewis wrote:
>>
>>> Can OpenMPI do like Scali and MVAPICH2 and utilize 2 IB HCAs per machine
>>> to approach double the bandwidth on simple tests such as IMB PingPong?
>>
>> Yes. OMPI will automatically (and aggressively) use as many active ports
>> as you have. So you shouldn't need to list devices+ports -- OMPI will
>> simply use all ports that it finds in the active state. If your ports are
>> on physically separate IB networks, then each IB network will require a
>> different subnet ID so that OMPI can compute reachability properly.
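Jeff's requirement of distinct subnet IDs means each physically separate fabric needs its own subnet manager configured with a distinct prefix. A hedged sketch assuming OFED's OpenSM, where the config-file path and the `subnet_prefix` option name may vary by OpenSM version:

```shell
# On the SM node for the second, separate fabric, assign a non-default
# subnet prefix (0xfe80000000000000 is the IB default; pick a distinct one).
# Path and option name are assumptions for an OFED-1.3-era opensm:
echo "subnet_prefix 0xfe80000000000001" >> /etc/opensm/opensm.conf
opensm -B    # start opensm as a daemon on this fabric's port
```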
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>