Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenIB problems
From: Brock Palen (brockp_at_[hidden])
Date: 2007-11-28 18:27:39


Jeff, thanks for all the replies.

I hate to admit it, but at the moment we can't log onto the switch.

However, the ibcheckerrors command reports nothing out of bounds, and I
think that command also checks the switch ports.

Thanks, we will do some tests.

Brock Palen
Center for Advanced Computing
brockp_at_[hidden]
(734)936-1985

On Nov 27, 2007, at 4:50 PM, Jeff Squyres wrote:

> Sorry for jumping in late; the holiday and other travel prevented me
> from getting to all my mail recently... :-\
>
> Have you checked the counters on the subnet manager to see if any
> other errors are occurring? It might be good to clear all the
> counters, run the job, and see if the counters are increasing faster
> than they should (i.e., any particular counter should advance very
> very slowly -- perhaps 1 per day or so).
>
> I'll ask around the kernel-level guys (i.e., Roland) to see what else
> could cause this kind of error.
>
>
>
> On Nov 27, 2007, at 3:35 PM, Brock Palen wrote:
>
>> OK, I will open a case with Cisco.
>>
>>
>> Brock Palen
>> Center for Advanced Computing
>> brockp_at_[hidden]
>> (734)936-1985
>>
>>
>> On Nov 27, 2007, at 4:19 PM, Andrew Friedley wrote:
>>
>>>
>>>
>>> Brock Palen wrote:
>>>>>> What would be a place to look? Should this just be default then for
>>>>>> OMPI? ompi_info shows the default as 10 seconds? Is that right,
>>>>>> 'seconds'?
>>>>> The other IB guys can probably answer better than I can -- I'm not an
>>>>> expert in this part of IB (or really any part I guess :). Not sure why
>>>>> a larger value isn't the default. No, it's not seconds -- check the
>>>>> description of the MCA parameter:
>>>>>
>>>>> 4.096 microseconds * (2^btl_openib_ib_timeout)
>>>>
>>>> You sure?
>>>> ompi_info --param btl openib
>>>>
>>>> MCA btl: parameter "btl_openib_ib_timeout" (current value: "10")
>>>> InfiniBand transmit timeout, in seconds
>>>> (must be >= 1)
>>>
>>> Yeah:
>>>
>>> MCA btl: parameter "btl_openib_ib_timeout" (current value: "10")
>>> InfiniBand transmit timeout, plugged into formula:
>>> 4.096 microseconds * (2^btl_openib_ib_timeout)
>>> (must be >= 0 and <= 31)
>>>
>>> Reading earlier in the thread, you said OMPI v1.2.0; I got this from a
>>> trunk checkout that's around 3 weeks old. A quick check shows this
>>> description was changed between 1.2.0 and 1.2.1. However, the use of
>>> this parameter hasn't changed -- it's simply passed along to IB verbs
>>> when creating a queue pair (aka a connection).
>>>
>>> Andrew
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
>
>
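Worked example of the timeout formula quoted in the thread,
4.096 microseconds * (2^btl_openib_ib_timeout). This is only a sketch of
the arithmetic, not Open MPI code: the function name ib_timeout_seconds is
made up for illustration, and the range check assumes the newer parameter
description (>= 0 and <= 31) rather than the 1.2.0 text.

```python
def ib_timeout_seconds(exponent: int) -> float:
    """Convert a btl_openib_ib_timeout value into seconds.

    Hypothetical helper: applies 4.096 microseconds * (2^exponent),
    the formula from the MCA parameter description quoted above.
    """
    # Range taken from the newer parameter description (assumption:
    # the 1.2.0 description said "must be >= 1" instead).
    if not 0 <= exponent <= 31:
        raise ValueError("btl_openib_ib_timeout must be >= 0 and <= 31")
    return 4.096e-6 * (2 ** exponent)


if __name__ == "__main__":
    # 10 is the default shown by ompi_info in the thread; the other
    # values are arbitrary examples to show how quickly the timeout grows.
    for value in (10, 14, 20):
        millis = ib_timeout_seconds(value) * 1e3
        print(f"btl_openib_ib_timeout={value}: {millis:.3f} ms")
```

With the default of 10 this works out to roughly 4.19 milliseconds, not 10
seconds, and each increment of the parameter doubles the timeout.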