Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Can 2 IB HCAs give twice the bandwidth?
From: Mostyn Lewis (Mostyn.Lewis_at_[hidden])
Date: 2008-10-19 20:28:17


Well, here's what I see with the IMB PingPong test using two ConnectX DDR cards
in each of 2 machines. I'm quoting just the last line: 10 repetitions at
4194304 bytes.

Scali_MPI_Connect-5.6.4-59151: (multi rail setup in /etc/dat.conf)
        #bytes #repetitions t[usec] Mbytes/sec
       4194304 10 2198.24 1819.63
mvapich2-1.2rc2: (MV2_NUM_HCAS=2 MV2_NUM_PORTS=1)
        #bytes #repetitions t[usec] Mbytes/sec
       4194304 10 2427.24 1647.96
OpenMPI SVN 19772:
        #bytes #repetitions t[usec] Mbytes/sec
       4194304 10 3676.35 1088.03
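(For reference, the Mbytes/sec column is just the message size divided by t[usec], assuming IMB's convention of 1 Mbyte = 2^20 bytes; a quick sanity check of the three rows:)

```python
# Sanity check: reproduce IMB's Mbytes/sec column from the t[usec] column.
# Assumes IMB's convention of 1 Mbyte = 2**20 bytes.
SIZE = 4194304  # bytes per message

def imb_mbytes_per_sec(t_usec):
    # bytes/usec * 10**6 = bytes/sec; divide by 2**20 for Mbytes/sec
    return SIZE / t_usec * 1e6 / 2**20

for name, t in [("Scali", 2198.24), ("mvapich2", 2427.24), ("OpenMPI", 3676.35)]:
    print(f"{name}: {imb_mbytes_per_sec(t):.2f} Mbytes/sec")
```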

Repeatable within bounds.

This is OFED-1.3.1 and I peered at
/sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_packets
and
/sys/class/infiniband/mlx4_1/ports/1/counters/port_rcv_packets
on one of the 2 machines and looked at what happened for Scali
and OpenMPI.

Scali packets:
HCA 0 port1 = 115116625 - 114903198 = 213427
HCA 1 port1 = 78099566 - 77886143 = 213423
--------------------------------------------
                                       426850
OpenMPI packets:
HCA 0 port1 = 115233624 - 115116625 = 116999
HCA 1 port1 = 78216425 - 78099566 = 116859
--------------------------------------------
                                       233858
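(The per-HCA counts are just before/after deltas of those sysfs counters; a small sketch of how they can be collected around a run, using the OFED sysfs paths shown above:)

```python
# Sketch: snapshot the per-port receive-packet counters before and after a
# run and print the per-HCA deltas, as computed by hand above.
COUNTERS = {
    "mlx4_0": "/sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_packets",
    "mlx4_1": "/sys/class/infiniband/mlx4_1/ports/1/counters/port_rcv_packets",
}

def snapshot():
    # Requires an OFED system with the sysfs paths above present.
    return {hca: int(open(path).read()) for hca, path in COUNTERS.items()}

def deltas(before, after):
    return {hca: after[hca] - before[hca] for hca in before}

# Example with the OpenMPI numbers from above:
before = {"mlx4_0": 115116625, "mlx4_1": 78099566}
after = {"mlx4_0": 115233624, "mlx4_1": 78216425}
d = deltas(before, after)
print(d, "total:", sum(d.values()))
```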

Scali is set up so that data larger than 8192 bytes is striped
across the 2 HCAs using 8192 bytes per HCA in a round robin fashion.
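(That policy can be sketched as a toy model, not Scali's actual code: messages larger than 8192 bytes are cut into 8192-byte chunks and dealt out to the rails in turn:)

```python
# Toy model of round-robin striping across 2 HCAs with an 8192-byte stripe.
STRIPE = 8192
NUM_RAILS = 2

def stripe_message(data):
    """Return a list of (rail, chunk) assignments for one message."""
    if len(data) <= STRIPE:
        return [(0, data)]  # small messages go over a single rail
    return [((i // STRIPE) % NUM_RAILS, data[i:i + STRIPE])
            for i in range(0, len(data), STRIPE)]

chunks = stripe_message(b"x" * 4194304)
per_rail = [sum(len(c) for r, c in chunks if r == rail)
            for rail in range(NUM_RAILS)]
print(per_rail)  # each rail carries half of the 4 MB payload
```

With a 4194304-byte message this yields 512 chunks, 256 per rail, which matches the near-equal per-HCA packet split seen in the Scali counters.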

So, it seems that OpenMPI is using both ports but strangely ends
up with a Mbytes/sec rate that is worse than using a single HCA alone.
If I use --mca btl_openib_if_exclude mlx4_1:1, I get
        #bytes #repetitions t[usec] Mbytes/sec
       4194304 10 3080.59 1298.45

So, what's taking so long? Is this a threading question?

DM

On Sun, 19 Oct 2008, Jeff Squyres wrote:

> On Oct 18, 2008, at 9:19 PM, Mostyn Lewis wrote:
>
>> Can OpenMPI do like Scali and MVAPICH2 and utilize 2 IB HCAs per machine
>> to approach double the bandwidth on simple tests such as IMB PingPong?
>
>
> Yes. OMPI will automatically (and aggressively) use as many active ports as
> you have. So you shouldn't need to list devices+ports -- OMPI will simply
> use all ports that it finds in the active state. If your ports are on
> physically separate IB networks, then each IB network will require a
> different subnet ID so that OMPI can compute reachability properly.
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users