Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [PATCH] fix mx btl_bandwidth
From: Rolf vandeVaart (rolf.vandevaart_at_[hidden])
Date: 2010-09-08 09:43:40


  On 9/8/2010 8:09 AM, Brice Goglin wrote:
> On 08/09/2010 14:02, Jeff Squyres wrote:
>> On Sep 3, 2010, at 3:38 PM, George Bosilca wrote:
>>
>>
>>> However, going over the existing BTLs I can see that some BTLs do not correctly set this value:
>>>
>>> BTL      Bandwidth        Auto-detect   Status
>>> Elan     2000             NO            Correct
>>> GM       250              NO            Doubtful
>>> MX       2000/10000       YES (Mbs)     Correct (before the patch)
>>> OFUD     800              NO            Doubtful
>>> OpenIB   2000/4000/8000   YES (Mbs)     Correct (multiplied by the active_width)
>>> Portals  1000             NO            Doubtful
>>> SCTP     100              NO            Conservative value (correct)
>>> Self     100              XXX           Correct (doesn't matter anyway)
>>> SM       9000             NO            Correct
>>> TCP      100              NO            Conservative value (correct)
>>> UDAPL    225              NO            Incorrect
>>>
>> Now that that patch has been rolled back out, did we come to a conclusion here?
>>
>> - OFUD: why do we still even have this?
>> - Portals: does it matter if it gets it wrong? No one will ever multi-rail with it.
>> - TCP: we can add auto-detect code for this (But doesn't have to be right away -- i.e., don't make 1.5.0 wait for it).
>> - UDAPL: I don't think anyone will multi-rail udapl with anything.
>>
>> Was the *real* problem that Brice's OpenFabrics bandwidth was auto-detected incorrectly somehow?
>>
> The first problem came from IB not autodetecting at all by default and
> using 800 Mbit/s instead. When forcing autodetect with MCA parameters,
> the bandwidths are not perfect but not too bad. When forcing IB manually
> to the right bandwidth value, I can tweak things as needed.
>
> Brice
Just to provide some closure on the uDAPL side, we agree with Jeff's
comment that we do not see any demand for multi-rail uDAPL with anything.
But, we will change the uDAPL number to something more reasonable.
Still trying to select an appropriate value.
Rolf
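
For readers wondering why a wrong default only matters in the multi-rail
case, here is a minimal sketch of bandwidth-proportional striping. It
assumes a message is split across rails in proportion to each rail's
advertised bandwidth. The function name split_by_bandwidth and the two-rail
pairing are hypothetical, chosen for illustration from numbers quoted in
this thread; this is not Open MPI's actual scheduling code.

```python
def split_by_bandwidth(total_bytes, bandwidths_mbps):
    """Divide total_bytes across rails in proportion to advertised bandwidth.

    Hypothetical helper for illustration only, not an Open MPI API.
    """
    if not bandwidths_mbps or any(b <= 0 for b in bandwidths_mbps):
        raise ValueError("need at least one rail, each with positive bandwidth")
    total_bw = sum(bandwidths_mbps)
    # Integer share per rail, rounded down.
    shares = [total_bytes * bw // total_bw for bw in bandwidths_mbps]
    # Rounding down can leave a few bytes unassigned; give them to the fastest rail.
    fastest = bandwidths_mbps.index(max(bandwidths_mbps))
    shares[fastest] += total_bytes - sum(shares)
    return shares


# One rail advertising 800 (the non-autodetected IB value mentioned above)
# next to one advertising 10000, then the same pair with the first at 8000.
print(split_by_bandwidth(1_000_000, [800, 10000]))   # [74074, 925926]
print(split_by_bandwidth(1_000_000, [8000, 10000]))  # [444444, 555556]
```

Under this assumption, a rail that advertises 800 instead of 8000 receives
about 7% of the message rather than about 44%, which is why a stale default
is harmless for a single-rail BTL but skews the split once two rails are
combined.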