Hi Jeff
I'm working on optimizing large message exchange. The optimization
consists in choosing the best protocol for each large message.
In fact:
- for each device, the way to choose the best protocol is different.
- the fastest protocol for a given device depends on that device's
hardware and on the message specifications.
So the device/BTL itself is the best place to dynamically select the
fastest protocol.
At present, for large messages, the protocol selection is based only on
device capabilities.
My optimization consists in asking the device/BTL for a "preferred
protocol" and then making a choice based on both the device
capabilities and the BTL's recommendation.
Technical view:
The optimization is located in mca_pml_ob1_send_request_start_btl(),
after the device/BTL selection.
In the large message section, I call a new function:
mca_pml_ob1_preferred_protocol() => mca_bml_base_preferred_protocol()
which in turn tries to call
btl->btl_preferred_protocol()
So the protocol selection only happens before sending a large message
and is not in the critical path.
It is the BTL's responsibility to define this function to select a
preferred protocol.
If this function is not defined, nothing changes in the code path.
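The optional-hook dispatch described above can be sketched as follows.
This is only an illustration, not the code from the patch: every
sketch_* name, the flag values, the hook signature, the 64 KiB threshold
and the GET > PUT > SEND fallback order are assumptions made for the
example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical protocol bits, standing in for the BTL capability flags. */
enum {
    SKETCH_PROTO_SEND = 1 << 0,
    SKETCH_PROTO_PUT  = 1 << 1,
    SKETCH_PROTO_GET  = 1 << 2
};

struct sketch_btl;

/* Hypothetical hook signature: returns one protocol bit, or 0 for
 * "no preference". */
typedef int (*sketch_preferred_fn)(struct sketch_btl *btl, size_t msg_size);

struct sketch_btl {
    int flags;                      /* protocols the device supports */
    sketch_preferred_fn preferred;  /* optional hook, NULL if undefined */
};

/* Capability-only choice, i.e. the behaviour without the hook.
 * The priority order used here is an assumption. */
static int sketch_default_protocol(const struct sketch_btl *btl)
{
    if (btl->flags & SKETCH_PROTO_GET) return SKETCH_PROTO_GET;
    if (btl->flags & SKETCH_PROTO_PUT) return SKETCH_PROTO_PUT;
    return SKETCH_PROTO_SEND;
}

/* Ask the BTL for a recommendation, and accept it only if the device
 * is actually capable of it. Otherwise nothing changes. */
static int sketch_select_protocol(struct sketch_btl *btl, size_t msg_size)
{
    assert(btl != NULL);
    if (btl->preferred != NULL) {
        int hint = btl->preferred(btl, msg_size);
        if (hint != 0 && (btl->flags & hint) == hint) {
            return hint;
        }
    }
    return sketch_default_protocol(btl);
}

/* Example hook: recommend PUT above an arbitrary 64 KiB threshold. */
static int sketch_prefer_put_when_large(struct sketch_btl *btl,
                                        size_t msg_size)
{
    (void)btl;
    return (msg_size >= 64 * 1024) ? SKETCH_PROTO_PUT : 0;
}
```

A BTL that leaves the hook NULL goes straight to the capability-only
choice, which matches the "nothing changes" case above.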
To implement this optimization, I had to add an interface to the BTL
module structure in "btl.h"; this is the drawback.
----
I have already used this feature to optimize the "shared memory"
device/BTL. I use the "preferred_protocol" feature to enable/disable
KNEM according to intra/inter socket communication. This optimization
increases the bandwidth of the IMB pingping benchmark by ~36%.
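As a rough sketch of what such a locality-based recommendation could
look like (this is not the code from the sm patch): the sketch_* names
and the socket-id arguments are hypothetical, and since this message
does not say which of the two cases enables KNEM, the policy is left as
two explicit switches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: the copy mechanism the sm hook would recommend. */
enum sketch_sm_copy {
    SKETCH_SM_COPY_INOUT,  /* regular copy-in/copy-out through shared memory */
    SKETCH_SM_COPY_KNEM    /* kernel-assisted single copy */
};

/* Hypothetical policy: whether KNEM is enabled for each locality case. */
struct sketch_sm_policy {
    bool knem_intra_socket;
    bool knem_inter_socket;
};

/* Recommend a mechanism from the socket ids of the two peers. */
static enum sketch_sm_copy
sketch_sm_preferred(const struct sketch_sm_policy *policy,
                    int local_socket, int peer_socket)
{
    assert(policy != NULL);
    bool same_socket = (local_socket == peer_socket);
    bool use_knem = same_socket ? policy->knem_intra_socket
                                : policy->knem_inter_socket;
    return use_knem ? SKETCH_SM_COPY_KNEM : SKETCH_SM_COPY_INOUT;
}
```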
----
The next step is to use the "preferred protocol" feature with openib
(with multiple IB cards).
Two patches are attached:
1) BTL_preferred.patch:
introduces the new preferred protocol interface
2) SM_KNEM_intra_socket.patch:
defines the preferred protocol for the sm btl
Note: Since the "ess" framework can't give us the "socket locality
information", I used hitopo, which was proposed in an RFC
some time ago:
http://www.open-mpi.org/community/lists/devel/2010/11/8677.php