Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenIB problems
From: Andrew Friedley (afriedle_at_[hidden])
Date: 2007-11-28 11:35:50


What value do you suggest then? I know I've seen the problem persist at
values of 14 and 16, and would rather be certain that this isn't going
to kill the job that just sat in the queue for a week.

Andrew

Jeff Squyres wrote:
> Roland thought that the default value of 10 might be a bit too low and
> that tuning it to be higher, particularly in apps that pound on a
> single port, would probably be acceptable.
>
> Tuning up to 20 is probably a bit overkill.
>
>
>
> On Nov 27, 2007, at 3:54 PM, Jeff Squyres wrote:
>
>> BTW, Andrew is correct about the unit for btl_openib_ib_timeout and
>> that the value is simply passed down to the verbs library when
>> making an IB connection. Open MPI does nothing else with that
>> value; it's an IBTA-defined value.
>>
>> The help message was wrong on the 1.2 branch for a while; I think
>> it's been corrected in more recent versions of OMPI (i.e., >1.2 -- I
>> don't recall which version specifically).
>>
>>
>> On Nov 27, 2007, at 3:19 PM, Andrew Friedley wrote:
>>
>>>
>>> Brock Palen wrote:
>>>>>> What would be a place to look? Should this just be the default then
>>>>>> for OMPI? ompi_info shows the default as 10 seconds? Is that right,
>>>>>> 'seconds'?
>>>>> The other IB guys can probably answer better than I can -- I'm not an
>>>>> expert in this part of IB (or really any part I guess :). Not sure why
>>>>> a larger value isn't the default. No, it's not seconds -- check the
>>>>> description of the MCA parameter:
>>>>>
>>>>> 4.096 microseconds * (2^btl_openib_ib_timeout)
>>>> You sure?
>>>> ompi_info --param btl openib
>>>>
>>>> MCA btl: parameter "btl_openib_ib_timeout" (current value: "10")
>>>> InfiniBand transmit timeout, in seconds
>>>> (must be >= 1)
>>> Yeah:
>>>
>>> MCA btl: parameter "btl_openib_ib_timeout" (current value: "10")
>>> InfiniBand transmit timeout, plugged into formula:
>>> 4.096 microseconds * (2^btl_openib_ib_timeout)
>>> (must be >= 0 and <= 31)
>>> Reading earlier in the thread, you said OMPI v1.2.0; I got this from a
>>> trunk checkout that's around 3 weeks old. A quick check shows this
>>> description was changed between 1.2.0 and 1.2.1. However the use of
>>> this parameter hasn't changed -- it's simply passed along to IB verbs
>>> when creating a queue pair (aka a connection).
>>>
>>> Andrew
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>>
>
>
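
For reference, the formula quoted in the thread, 4.096 microseconds * (2^btl_openib_ib_timeout), can be evaluated for the values discussed (10, 14, 16 and 20). The following is only a sketch of that arithmetic. The function name `ib_timeout_seconds` is made up for illustration and is not part of Open MPI or the verbs library. It also does not model any special handling of the value 0, which the verbs documentation appears to treat as "no timeout".

```python
BASE_US = 4.096  # base unit from the help text, in microseconds


def ib_timeout_seconds(exponent: int) -> float:
    """Evaluate 4.096 us * 2^exponent and return the result in seconds.

    Hypothetical helper: it only mirrors the formula from the
    btl_openib_ib_timeout help text and the 0..31 range stated there.
    """
    if not 0 <= exponent <= 31:
        raise ValueError("btl_openib_ib_timeout must be >= 0 and <= 31")
    return BASE_US * (2 ** exponent) / 1e6


if __name__ == "__main__":
    # Values mentioned in the thread: the default (10), the two values
    # Andrew tried (14, 16), and the one called overkill (20).
    for value in (10, 14, 16, 20):
        print(f"btl_openib_ib_timeout={value:2d} -> "
              f"{ib_timeout_seconds(value):.6f} s")
```

By this formula the default of 10 works out to roughly 4.2 milliseconds, 14 to about 67 milliseconds, 16 to about 0.27 seconds, and 20 to about 4.3 seconds. Each step of 1 doubles the timeout.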