Well, here's what I see with the IMB PingPong test using two ConnectX DDR cards
in each of two machines. I'm quoting only the last line of each run, i.e.
4194304 bytes at 10 repetitions.
Scali_MPI_Connect-5.6.4-59151: (multi rail setup in /etc/dat.conf)
   #bytes  #repetitions   t[usec]  Mbytes/sec
  4194304            10   2198.24     1819.63
mvapich2-1.2rc2: (MV2_NUM_HCAS=2 MV2_NUM_PORTS=1)
   #bytes  #repetitions   t[usec]  Mbytes/sec
  4194304            10   2427.24     1647.96
OpenMPI SVN 19772:
   #bytes  #repetitions   t[usec]  Mbytes/sec
  4194304            10   3676.35     1088.03
Repeatable within bounds.
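As a sanity check on the figures above, the Mbytes/sec column can be
recomputed from the t[usec] column. This is only a minimal Python sketch; it
assumes IMB counts 1 Mbyte as 2**20 bytes (my assumption, though it agrees
with the quoted numbers to within rounding), and the function name is made up
for illustration.

```python
# Recompute the IMB PingPong bandwidth column from the timing column.
# Assumption: IMB reports Mbytes as 2**20 bytes.
MSG_BYTES = 4194304

def mbytes_per_sec(nbytes, t_usec):
    """Bandwidth in Mbytes/sec for nbytes moved in t_usec microseconds."""
    return (nbytes / 2**20) / (t_usec * 1e-6)

# t[usec] values quoted above for the 4194304-byte line.
timings_usec = {
    "Scali MPI Connect": 2198.24,
    "MVAPICH2": 2427.24,
    "OpenMPI": 3676.35,
}

for name, t in timings_usec.items():
    print(f"{name:18s} {mbytes_per_sec(MSG_BYTES, t):8.2f} Mbytes/sec")
```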
This is OFED-1.3.1. On one of the 2 machines I read the counters
/sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_packets
and
/sys/class/infiniband/mlx4_1/ports/1/counters/port_rcv_packets
before and after each run to see what happened for Scali
and OpenMPI.
Scali packets:
HCA 0 port1 = 115116625 - 114903198 = 213427
HCA 1 port1 = 78099566 - 77886143 = 213423
--------------------------------------------
426850
OpenMPI packets:
HCA 0 port1 = 115233624 - 115116625 = 116999
HCA 1 port1 = 78216425 - 78099566 = 116859
--------------------------------------------
233858
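The bookkeeping behind those deltas can be sketched in Python as below. The
sysfs paths are the ones quoted above; the helper names are hypothetical, and
read_counters() will only work on a host that actually has these HCAs, so the
example feeds in the Scali snapshot values instead of reading them live.

```python
from pathlib import Path

# sysfs counters quoted above; present only on a host with these HCAs.
COUNTERS = {
    "HCA 0 port1": Path("/sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_packets"),
    "HCA 1 port1": Path("/sys/class/infiniband/mlx4_1/ports/1/counters/port_rcv_packets"),
}

def read_counters(paths=COUNTERS):
    """Snapshot each counter as an int (hypothetical helper)."""
    return {name: int(p.read_text().strip()) for name, p in paths.items()}

def deltas(before, after):
    """Per-port packet counts between two snapshots, plus the total."""
    per_port = {name: after[name] - before[name] for name in before}
    return per_port, sum(per_port.values())

# Snapshots around the Scali run, taken from the figures above.
before = {"HCA 0 port1": 114903198, "HCA 1 port1": 77886143}
after = {"HCA 0 port1": 115116625, "HCA 1 port1": 78099566}

per_port, total = deltas(before, after)
print(per_port, total)
```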
Scali is set up so that data larger than 8192 bytes is striped
across the 2 HCAs using 8192 bytes per HCA in a round robin fashion.
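To make the striping rule concrete, here is a rough Python sketch of how a
message would be divided under that scheme. It only illustrates the rule as
described, not Scali's actual implementation; the function name is made up,
and sending messages of 8192 bytes or less entirely over the first HCA is my
assumption.

```python
CHUNK = 8192    # bytes handed to one HCA per turn, as described above
NUM_HCAS = 2

def stripe(nbytes, chunk=CHUNK, num_hcas=NUM_HCAS):
    """Return the number of bytes each HCA carries under round-robin striping."""
    per_hca = [0] * num_hcas
    if nbytes <= chunk:
        # Assumption: small messages are not striped and use the first HCA.
        per_hca[0] = nbytes
        return per_hca
    offset = 0
    turn = 0
    while offset < nbytes:
        size = min(chunk, nbytes - offset)   # last chunk may be short
        per_hca[turn % num_hcas] += size
        offset += size
        turn += 1
    return per_hca

print(stripe(4194304))
```

For the 4194304-byte message this gives an even split between the two HCAs,
which is in line with the nearly equal packet counts seen for Scali.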
So, it seems that OpenMPI is using both ports but strangely ends
up with a Mbytes/sec rate that is worse than with a single HCA.
If I use --mca btl_openib_if_exclude mlx4_1:1, we get
   #bytes  #repetitions   t[usec]  Mbytes/sec
  4194304            10   3080.59     1298.45
So, what's taking so long? Is this a threading question?
DM
On Sun, 19 Oct 2008, Jeff Squyres wrote:
> On Oct 18, 2008, at 9:19 PM, Mostyn Lewis wrote:
>
>> Can OpenMPI do like Scali and MVAPICH2 and utilize 2 IB HCAs per machine
>> to approach double the bandwidth on simple tests such as IMB PingPong?
>
>
> Yes. OMPI will automatically (and aggressively) use as many active ports as
> you have. So you shouldn't need to list devices+ports -- OMPI will simply
> use all ports that it finds in the active state. If your ports are on
> physically separate IB networks, then each IB network will require a
> different subnet ID so that OMPI can compute reachability properly.
>
> --
> Jeff Squyres
> Cisco Systems