Simplifying the code and getting better performance is always a good
approach (at least from my perspective). However, your patch still
dispatches the messages over the BTLs in a round-robin fashion, which
doesn't seem to me to be the best approach. How about merging your
patch and mine? We would get both a better data distribution and better
scheduling (on-demand, based on the network load).
Btw, did you compare my patch with yours on your multi-NIC system?
With my patch, on our system with 3 networks (2*1Gb/s and one 100 Mb/s),
I'm close to 99% of the total bandwidth. I'll try to see what I get
with yours.
Now that we're looking at improving the performance of the multi-BTL
stuff, I think I have another idea. How about merging the ACK with the
next pipeline fragment for RDMA (except for the last fragment)?
On Jun 25, 2007, at 8:28 AM, Gleb Natapov wrote:
> Attached patch improves the OB1 scheduling algorithm between multiple
> links. The current algorithm performs very poorly if interconnects
> with different bandwidth values are used. For big message sizes it
> always divides traffic equally between all available interconnects.
> The attached patch changes this: for each message it calculates how
> much data should be sent via each link according to the relative
> weight of the link. This is done for the RDMAed part of the message
> as well as for the part that is sent by send/recv in the case of the
> pipeline protocol. As a side effect, the send_schedule/recv_schedule
> functions are greatly simplified.
> Surprisingly (at least for me), this patch also greatly improves
> benchmark results when multiple links with the same bandwidth are
> in use.
> The attached postscript shows some benchmark results with and without
> the patch. I used two computers connected with 4 DDR HCAs for this
> test. Each HCA is capable of ~1600 MB/s on its own.