
Open MPI Development Mailing List Archives


From: George Bosilca (bosilca_at_[hidden])
Date: 2007-06-26 17:42:05


Gleb,

Simplifying the code and getting better performance is always a good
approach (at least from my perspective). However, your patch still
dispatches the messages over the BTLs in a round-robin fashion, which
doesn't look like the best approach to me. How about merging your
patch and mine? We would get a better data distribution and better
scheduling (on-demand, based on the network load).
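Just to make the contrast concrete, here is a rough sketch (the
struct and function names below are invented for illustration, not
taken from the OB1 code) of round-robin selection versus picking the
link with the least outstanding data relative to its weight:

    /* Hypothetical sketch, not the real OB1 data structures. */
    #include <stddef.h>
    #include <stdint.h>

    struct link {                 /* one BTL/endpoint (illustrative)   */
        double   weight;          /* relative bandwidth weight          */
        uint64_t bytes_in_flight; /* data queued but not yet completed  */
    };

    /* Round-robin: ignore the load, just cycle an index. */
    size_t pick_round_robin(size_t nlinks, size_t *next)
    {
        size_t i = *next;
        *next = (*next + 1) % nlinks;
        return i;
    }

    /* Load-based: prefer the link with the least outstanding data
     * relative to its weight, so a fast, idle NIC is chosen first. */
    size_t pick_least_loaded(const struct link *links, size_t nlinks)
    {
        size_t best = 0;
        double best_cost = (double)links[0].bytes_in_flight / links[0].weight;
        for (size_t i = 1; i < nlinks; i++) {
            double cost = (double)links[i].bytes_in_flight / links[i].weight;
            if (cost < best_cost) {
                best_cost = cost;
                best = i;
            }
        }
        return best;
    }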

Btw, did you compare my patch with yours on your multi-NIC system?
With my patch on our system with 3 networks (2*1 Gb/s and one
100 Mb/s) I'm close to 99% of the total bandwidth. I'll try to see
what I get with yours.

Now that we're looking at improving the performance of the multi-BTL
stuff, I think I have another idea. How about merging the ack with
the next pipeline fragment for RDMA (except for the last fragment)?
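Something along these lines, just as an illustration (the header
layout and names are made up, not the actual OB1/BTL headers):

    /* Illustration only: an invented fragment header that carries the
     * ack for the previously received RDMA pipeline fragment, so no
     * separate ack message is needed except after the last fragment. */
    #include <stdint.h>

    #define FRAG_FLAG_ACK_PREV  0x1  /* also acks the previous fragment */
    #define FRAG_FLAG_LAST      0x2  /* last fragment: explicit ack needed */

    struct frag_hdr {
        uint32_t flags;        /* FRAG_FLAG_* bits                     */
        uint32_t frag_index;   /* position of this fragment in the msg */
        uint64_t acked_offset; /* bytes confirmed received so far      */
    };

    /* Fill the header for fragment 'idx' of 'nfrags', piggybacking the
     * ack for fragment idx-1 whenever there is a next fragment. */
    void fill_hdr(struct frag_hdr *h, uint32_t idx, uint32_t nfrags,
                  uint64_t acked_offset)
    {
        h->flags        = (idx > 0) ? FRAG_FLAG_ACK_PREV : 0;
        h->frag_index   = idx;
        h->acked_offset = acked_offset;
        if (idx == nfrags - 1)
            h->flags |= FRAG_FLAG_LAST; /* last one still gets its own ack */
    }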

   Thanks,
     george.

On Jun 25, 2007, at 8:28 AM, Gleb Natapov wrote:

> Hello,
>
> Attached is a patch that improves the OB1 scheduling algorithm
> across multiple links. The current algorithm performs very poorly
> when interconnects with very different bandwidths are used: for big
> message sizes it always divides the traffic equally between all
> available interconnects. The attached patch changes this. For each
> message it calculates how much data should be sent via each link
> according to the relative weight of the link. This is done for the
> RDMAed part of the message as well as for the part that is sent by
> send/recv in the case of the pipeline protocol (a sketch of this
> split appears after the quoted message). As a side effect, the
> send_schedule/recv_schedule functions are greatly simplified.
>
> Surprisingly (at least for me), this patch also greatly improves
> some benchmark results when multiple links with the same bandwidth
> are in use. The attached postscript shows some benchmark results
> with and without the patch. I used two computers connected with
> 4 DDR HCAs for this benchmark; each HCA is capable of ~1600 MB/s on
> its own.
>
> --
>
> Gleb.
>
> <ob1_multi_nic_scheduling.diff> <openib_mulihca_bw.ps>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
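
For reference, a minimal sketch of the per-link split described in
the quoted message: each link gets a share of the message
proportional to its relative weight. The names are hypothetical and
this is not the patch itself.

    /* Illustrative sketch of splitting one message across links in
     * proportion to their relative bandwidth weights. */
    #include <stddef.h>

    /* Write into bytes_per_link[i] the share of 'msg_size' that link i
     * should carry, given per-link weights (e.g. link bandwidths). */
    void split_by_weight(const double *weights, size_t nlinks,
                         size_t msg_size, size_t *bytes_per_link)
    {
        double total = 0.0;
        for (size_t i = 0; i < nlinks; i++)
            total += weights[i];

        size_t assigned = 0;
        for (size_t i = 0; i < nlinks; i++) {
            bytes_per_link[i] = (size_t)(msg_size * (weights[i] / total));
            assigned += bytes_per_link[i];
        }

        /* Give any rounding remainder to the fastest link. */
        size_t fastest = 0;
        for (size_t i = 1; i < nlinks; i++)
            if (weights[i] > weights[fastest])
                fastest = i;
        bytes_per_link[fastest] += msg_size - assigned;
    }

For the 3-network example above (two 1 Gb/s links and one 100 Mb/s
link), weights of roughly {1000, 1000, 100} would send about 47.6% of
each large message over each gigabit link and about 4.8% over the
100 Mb/s link.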


