
Open MPI User's Mailing List Archives


From: Scott Atchley (atchley_at_[hidden])
Date: 2007-01-18 08:41:00


On Jan 18, 2007, at 8:11 AM, Peter Kjellstrom wrote:

>> with Lustre, which is about 55% of the
>> theoretical 20 Gb/s advertised speed.
>
> I think this should be calculated against 16 Gbps, not 20 Gbps.

What is the advertised speed of an IB DDR card?

http://mellanox.com/products/hca_cards.php
http://www.voltaire.com/Products/Server_Products/Voltaire_HCA_4X0

>> The ~900 MB/s (7.2 Gb/s)
>> mentioned above is, of course, ~72% of advertised speed. If any IB
>> folks have any better numbers, please correct me.
>
> Using MPI (over a non-idle multi-level switch) I get 940 * 10^6 Bytes/s
> which is 94% of peak for that IB 4x SDR.

That is 7.5 Gb/s. That card is sold as a 10 Gb/s card. See the links above.
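
The disagreement here comes down to which denominator is used. A minimal sketch of the arithmetic, assuming the 940 * 10^6 Bytes/s figure quoted above and the usual 8b/10b encoding on a 4x SDR link (the variable names are illustrative only):

```python
# Convert the measured MPI bandwidth into Gb/s and compare it against both
# the data rate and the advertised signalling rate of a 4x SDR IB link.
measured_bytes_per_s = 940e6            # figure quoted in the thread
advertised_gbps = 10.0                  # 4x SDR signalling rate, as sold
data_gbps = advertised_gbps * 8 / 10    # 8b/10b encoding leaves 8 Gb/s for data

measured_gbps = measured_bytes_per_s * 8 / 1e9
pct_of_data = 100 * measured_gbps / data_gbps
pct_of_advertised = 100 * measured_gbps / advertised_gbps

print(f"{measured_gbps:.2f} Gb/s measured")
print(f"{pct_of_data:.0f}% of the {data_gbps:.0f} Gb/s data rate")
print(f"{pct_of_advertised:.0f}% of the advertised {advertised_gbps:.0f} Gb/s")
```

The same measurement reads as roughly 94% against the post-encoding data rate and roughly 75% against the number printed on the box.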

>> The data throughput limit for 8x PCIe is ~12 Gb/s. The theoretical
>> limit is 16 Gb/s, but each PCIe packet has a whopping 20 byte
>> overhead. If the adapter uses 64 byte packets, then you see 1/3 of
>> the throughput go to overhead.
>
> AFAIK the data field of a PCI Express packet is 0-4096 bytes and the
> header a bit more than 20 bytes (including things such as start/stop
> frame bytes, LCRC/ECRC..). This gives a maximum speed over 4x PCIe of
> 993.3 * 10^6 Bytes/s (8 Gbps after coding minus header waste for a full
> 4096 byte payload).
>
> In short, the SDR IB equipment I have seen has easily reached 90%+
> while PCI-express on the platforms I've tried has been limited to ~75%.
> Current IB DDR HCAs are probably limited by (at least) PCI-express 8x.
>
> /Peter

Not all motherboards/chipsets can do more than 64 bytes per packet. Some
can, some cannot. Realistically, most PCIe 4x cards are limited to less
than 950 MB/s (7.6 Gb/s).
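
To see how much the payload size matters, here is a rough sketch of the per-packet overhead arithmetic. It assumes first-generation PCIe (250 MB/s of data per lane after 8b/10b coding) and the flat 20-byte per-packet overhead quoted earlier in the thread. The function name is made up for illustration, and real links lose a bit more to flow control and acknowledgements.

```python
def pcie_gen1_throughput_mbytes(lanes, payload_bytes, overhead_bytes=20):
    """Estimate usable PCIe throughput in MB/s (10^6 bytes/s).

    Assumes 250 MB/s of data per lane after 8b/10b coding and a fixed
    per-packet overhead; both are simplifications.
    """
    raw_mbytes = lanes * 250.0
    efficiency = payload_bytes / (payload_bytes + overhead_bytes)
    return raw_mbytes * efficiency


# Compare common payload sizes on a 4x link.
for payload in (64, 128, 256, 4096):
    rate = pcie_gen1_throughput_mbytes(4, payload)
    print(f"4x PCIe, {payload:4d}-byte payload: {rate:6.1f} MB/s")
```

Under these assumptions a 4x link with 64-byte payloads lands near 760 MB/s, and an 8x link with 64-byte payloads lands near the ~12 Gb/s figure mentioned above. Only large payloads get close to the 1000 MB/s data rate.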

You keep lowering the bar for the users. :-) The consumer buys X and
expects to get close to X. They are surprised when you tell them that
the "real" rate is Y where Y is 20-40% less than X.

The problem is that cards have two "ends", the host side and the
network side. Focusing on one side while ignoring the other is asking
for confused/upset customers. Mismatching the fabric and the host
connection, such as pairing a DDR fabric with an 8x PCIe slot, limits
the traffic to the slower of the two.
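
As a back-of-the-envelope illustration of the mismatch, the sketch below takes the smaller of the two sides. It assumes a 4x DDR link (20 Gb/s signalling, 8b/10b coding), a first-generation 8x PCIe slot (2 Gb/s of data per lane), and 64-byte PCIe payloads with 20 bytes of overhead. The helper name is hypothetical.

```python
def bottleneck_gbps(fabric_gbps, host_gbps):
    """The end-to-end rate cannot exceed the slower of the two sides."""
    return min(fabric_gbps, host_gbps)


ddr_4x_data = 20.0 * 8 / 10                       # 16 Gb/s after 8b/10b coding
pcie_8x_data = 8 * 2.0                            # 16 Gb/s after 8b/10b coding
pcie_8x_usable = pcie_8x_data * 64 / (64 + 20)    # 64-byte payloads, 20-byte overhead

limit = bottleneck_gbps(ddr_4x_data, pcie_8x_usable)
print(f"fabric {ddr_4x_data:.1f} Gb/s, host {pcie_8x_usable:.1f} Gb/s, "
      f"limit {limit:.1f} Gb/s ({100 * limit / 20.0:.0f}% of advertised)")
```

With these assumptions the host side is the limit, at roughly 12 Gb/s, or about 61% of the advertised 20 Gb/s.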

Scott