
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] BTL preferred_protocol , large message
From: George Bosilca (bosilca_at_[hidden])
Date: 2011-03-09 11:50:28


On Mar 9, 2011, at 03:00 , Sylvain Jeaugey wrote:

> Hi George,
>
> It certainly looks like our motivations are close. However, I don't see in the presentation how you implement it (maybe I misread it), especially how you manage to avoid modifying the BTL interface.
>
> Do you have any code or SVN commit references that would help us better understand it?

One gets multiple BTL modules that do not overlap in terms of peers, each with its own set of parameters and possibly its own set of accepted protocols. Typically there will be one BTL per level of the memory hierarchy.
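As a rough illustration of this layout, the sketch below models one BTL module per memory-hierarchy level, with each peer mapped to exactly one module. The accepted protocols then follow from which module serves the peer, so no new BTL function is needed. All type, constant, and module names are hypothetical simplifications, not Open MPI's real structures, and the per-module flag choices are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical locality levels, one per memory-hierarchy level. */
typedef enum {
    LOC_SAME_SOCKET = 0,
    LOC_SAME_NODE   = 1,
    LOC_COUNT
} locality_t;

/* Hypothetical protocol flags; NOT Open MPI's real MCA_BTL_FLAGS_* values. */
enum { PROTO_SEND = 1u << 0, PROTO_PUT = 1u << 1, PROTO_GET = 1u << 2 };

/* Hypothetical, heavily simplified BTL module. */
typedef struct {
    const char *name;
    unsigned    protocols;   /* protocols this module accepts */
    size_t      eager_limit; /* example of a per-module tuning parameter */
} toy_btl_module_t;

/* One module per locality level. Each peer belongs to exactly one level,
 * so the modules do not overlap in terms of peers. Flag choices are
 * arbitrary and only for illustration. */
static const toy_btl_module_t modules[LOC_COUNT] = {
    [LOC_SAME_SOCKET] = { "sm-intra-socket", PROTO_SEND | PROTO_GET, 4096 },
    [LOC_SAME_NODE]   = { "sm-inter-socket", PROTO_SEND,             4096 },
};

/* The protocol set for a peer is implied by the module that serves it. */
static const toy_btl_module_t *module_for_peer(locality_t loc)
{
    assert((unsigned)loc < (unsigned)LOC_COUNT);
    return &modules[loc];
}
```

The design point is that the "preference" lives in the module parameters rather than in a new callback: the PML keeps asking the usual capability questions, and the answer differs per peer because different modules serve different peers.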

I'll clean up the code and send you a patch.

  george.

>
> Thanks,
> Sylvain
>
> On Tue, 8 Mar 2011, George Bosilca wrote:
>
>>
>> On Mar 8, 2011, at 12:12 , Damien Guinier wrote:
>>
>>> Hi Jeff
>>
>> Sorry, your email went to the Open MPI devel mailing list.
>>
>>> I'm working on optimizing large-message exchange. My optimization consists of "choosing
>>> the best protocol for each large message".
>>> In fact:
>>> - the way to choose the best protocol is different for each device;
>>> - the fastest protocol for a given device depends on the device hardware and on the message
>>> characteristics.
>>>
>>> So the device/BTL itself is the best place to dynamically select the fastest protocol.
>>>
>>> Presently, for large messages, protocol selection is based only on device capabilities.
>>> My optimization consists of asking the device/BTL for a "preferred protocol" and
>>> then making a choice based on both the device capabilities and the BTL's recommendation.
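The selection rule described here can be sketched as a small function: honour the BTL's recommendation when the device actually supports it, otherwise fall back to a capability-only choice. The protocol identifiers and the fallback order (PUT, then GET, then SEND) are assumptions made for illustration, not the order ob1 actually uses.

```c
#include <assert.h>

/* Hypothetical protocol identifiers, usable as bits in a capability mask. */
typedef enum { PROTO_NONE = 0, PROTO_SEND = 1, PROTO_PUT = 2, PROTO_GET = 4 } proto_t;

/* Choose the large-message protocol from the device capabilities (a bitmask
 * of proto_t values) and the BTL's recommendation (PROTO_NONE if it has none). */
static proto_t choose_protocol(unsigned caps, proto_t preferred)
{
    /* Honour the recommendation only if the device supports it. */
    if (preferred != PROTO_NONE && (caps & (unsigned)preferred))
        return preferred;

    /* Otherwise use a capability-only order (assumed here: PUT, GET, SEND). */
    if (caps & PROTO_PUT)
        return PROTO_PUT;
    if (caps & PROTO_GET)
        return PROTO_GET;
    assert(caps & PROTO_SEND); /* every device is assumed to support SEND */
    return PROTO_SEND;
}
```

For example, a device advertising SEND and PUT whose BTL recommends GET still gets PUT, because the recommendation is filtered through the capabilities first.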
>>
>> As a BTL will not randomly change its preferred protocol, one can assume it will depend on the peer. Here is an approach similar to the one you describe in your email, but without modifying the BTL interface.
>>
>> https://fs.hlrs.de/projects/eurompi2010/TALKS/WEDNESDAY_AFTERNOON/george_bosilca_locality_and_topology_aware.pdf
>>
>> george.
>>
>>
>>
>>>
>>> Technical view:
>>> The optimization is located in mca_pml_ob1_send_request_start_btl(), after the device/BTL selection.
>>> In the large-message section, I call a new function:
>>> mca_pml_ob1_preferred_protocol() => mca_bml_base_preferred_protocol()
>>> which in turn tries to call
>>> btl->btl_preferred_protocol()
>>> So selecting a protocol before sending a large message is not on the critical path.
>>> It is the BTL's responsibility to define this function to select a preferred protocol.
>>>
>>> If this function is not defined, nothing changes in the code path.
>>> To implement this optimization, I had to add a new interface to the BTL module structure in "btl.h"; this is the drawback.
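The dispatch described above, including the "nothing changes if the function is not defined" fallback, could look roughly like the sketch below. The struct, the hook signature (peer rank and message size as inputs), and the PROTO_* values are hypothetical; only the member name btl_preferred_protocol comes from the thread, and the wrapper is merely modelled on mca_bml_base_preferred_protocol().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical protocol values; PROTO_DEFAULT means "keep the existing choice". */
typedef enum { PROTO_DEFAULT = 0, PROTO_SEND, PROTO_PUT, PROTO_GET } proto_t;

struct toy_btl_module;

/* Hypothetical hook signature: the BTL may look at the peer and message size. */
typedef proto_t (*toy_preferred_protocol_fn)(struct toy_btl_module *btl,
                                              int peer_rank, size_t msg_size);

/* Hypothetical, heavily simplified BTL module with an optional hook. */
typedef struct toy_btl_module {
    unsigned                  flags;                  /* device capabilities */
    toy_preferred_protocol_fn btl_preferred_protocol; /* may be NULL */
} toy_btl_module_t;

/* BML-level wrapper, loosely modelled on mca_bml_base_preferred_protocol(). */
static proto_t toy_bml_preferred_protocol(toy_btl_module_t *btl,
                                          int peer_rank, size_t msg_size)
{
    assert(btl != NULL);
    /* BTL does not define the hook: report PROTO_DEFAULT so the caller
     * keeps its existing capability-based selection unchanged. */
    if (btl->btl_preferred_protocol == NULL)
        return PROTO_DEFAULT;
    return btl->btl_preferred_protocol(btl, peer_rank, msg_size);
}

/* Example hook for a hypothetical BTL: prefer GET above 64 KiB. */
static proto_t example_hook(toy_btl_module_t *btl, int peer_rank, size_t msg_size)
{
    (void)btl;
    (void)peer_rank;
    return msg_size > 65536 ? PROTO_GET : PROTO_DEFAULT;
}
```

Because the NULL check is the only extra work for BTLs that do not implement the hook, existing BTLs keep their current behaviour; the cost is the extra member in the module structure.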
>>>
>>> ----
>>>
>>> I have already used this feature to optimize the "shared memory" device/BTL. I use the "preferred_protocol" feature to enable or disable
>>> KNEM depending on whether the communication is intra-socket or inter-socket. This optimization increases the IMB PingPing benchmark bandwidth by ~36%.
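A sketch of what the sm BTL's hook might do with socket locality. The static socket_of table stands in for the locality information hitopo would provide, and every name here is hypothetical. The thread does not say on which side KNEM is enabled; this sketch arbitrarily enables it for intra-socket peers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical protocol choices for the sm BTL. */
typedef enum { PROTO_COPY_IN_OUT = 0, PROTO_KNEM } sm_proto_t;

/* Hypothetical stand-in for socket-locality information (the thread uses
 * hitopo for this): socket_of[rank] is the socket a local rank is bound to. */
static const int socket_of[] = { 0, 0, 1, 1 };
enum { NRANKS = sizeof socket_of / sizeof socket_of[0] };

static bool same_socket(int a, int b)
{
    assert(a >= 0 && a < NRANKS && b >= 0 && b < NRANKS);
    return socket_of[a] == socket_of[b];
}

/* Hypothetical sm "preferred protocol" hook that toggles KNEM by socket
 * locality. Assumption: KNEM for intra-socket peers; the real direction
 * is not stated in the thread. */
static sm_proto_t sm_preferred_protocol(int my_rank, int peer_rank)
{
    return same_socket(my_rank, peer_rank) ? PROTO_KNEM : PROTO_COPY_IN_OUT;
}
```

With this table, ranks 0 and 1 share socket 0 and would use KNEM, while ranks 0 and 2 sit on different sockets and would use the copy-in/copy-out path.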
>>>
>>> ----
>>>
>>> The next step is to use the "preferred protocol" feature with openib (with multiple IB cards).
>>>
>>>
>>>
>>> Attached are 2 patches:
>>> 1) BTL_preferred.patch:
>>> introduces the new preferred protocol interface
>>> 2) SM_KNEM_intra_socket.patch:
>>> defines the preferred protocol for the sm btl
>>> Note: since the "ess" framework can't give us the socket locality
>>> information, I used hitopo, which was proposed in an RFC
>>> some time ago:
>>> http://www.open-mpi.org/community/lists/devel/2010/11/8677.php
>>>
>>>
>>>
>>> <BTL_preferred.path><SM_KNEM_intra_socket.patch>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>> "I disapprove of what you say, but I will defend to the death your right to say it"
>> -- Evelyn Beatrice Hall
>>

"To preserve the freedom of the human mind then and freedom of the press, every spirit should be ready to devote itself to martyrdom; for as long as we may think as we will, and speak as we think, the condition of man will proceed in improvement."
  -- Thomas Jefferson, 1799