The attached patch improves the OB1 scheduling algorithm across multiple
links. The current algorithm performs very poorly when interconnects with
very different bandwidths are used: for large messages it always
divides traffic equally between all available interconnects. The attached
patch changes this. For each message, it calculates how much data should
be sent via each link according to the relative weight of the link. This
is done for the RDMA part of the message as well as for the part that is
sent by send/recv in the case of the pipeline protocol. As a side effect,
the send_schedule/recv_schedule functions are greatly simplified.
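
To illustrate the idea only (this is not the code from the patch), here
is a minimal sketch of weight-proportional splitting. The function name
split_by_weight, its signature, and the use of integer weights (for
example, link bandwidth in MB/s) are assumptions made for this example;
the actual patch works on OB1's own data structures.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, NOT the OB1 implementation.
 * Splits `total` bytes across `n` links in proportion to `weights`
 * (assumed here to be integer relative bandwidths, e.g. MB/s).
 * Each link except the last gets floor(total * w / sum); the last link
 * takes whatever is left, so the shares always add up to `total`.
 * Returns 0 on success, -1 if n == 0 or all weights are zero.
 */
static int split_by_weight(uint64_t total, const uint32_t *weights,
                           size_t n, uint64_t *out)
{
    uint64_t sum = 0;
    uint64_t assigned = 0;
    uint64_t q, r;
    size_t i;

    if (n == 0) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        sum += weights[i];
    }
    if (sum == 0) {
        return -1;
    }

    /* total = q * sum + r, so total * w / sum = q * w + (r * w) / sum.
     * Splitting the product this way avoids overflowing total * w for
     * large messages (assuming a modest number of links). */
    q = total / sum;
    r = total % sum;

    for (i = 0; i + 1 < n; i++) {
        out[i] = q * weights[i] + (r * weights[i]) / sum;
        assigned += out[i];
    }
    /* The last link absorbs the rounding remainder. */
    out[n - 1] = total - assigned;
    return 0;
}
```

With weights {1600, 400} and a 1000000-byte message, this sketch assigns
800000 bytes to the first link and 200000 to the second, instead of
500000 each as an equal split would.
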
Surprisingly (at least for me), this patch also greatly improves some
benchmark results when multiple links with the same bandwidth are in use.
The attached PostScript file shows some benchmark results with and without
the patch. I used two computers connected with 4 DDR HCAs for this
benchmark. Each HCA is capable of ~1600 MB/s on its own.