Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenIB problems
From: Andrew Friedley (afriedle_at_[hidden])
Date: 2007-11-27 15:57:39


Brock Palen wrote:
> On Nov 27, 2007, at 10:49 AM, Andrew Friedley wrote:
>> Brock Palen wrote:
>>> On Nov 21, 2007, at 3:39 PM, Andrew Friedley wrote:
>>>
>>>> If this is what I think it is, try using this MCA parameter:
>>>>
>>>> -mca btl_openib_ib_timeout 20
>>> The user used this option and it allowed the run to complete.
>>> You say it's an issue with the fabric, but ibshowerrors does not
>>> show any problems.
>>>
>>> It's Topspin (Cisco) gear: NICs, switch, cables.
>>> Should I follow up with Cisco more?
>> Sure, why not, if you think it'd be useful. FWIW, I see this on
>> Voltaire/Mellanox hardware with Open MPI; others here at LLNL tell me
>> they've seen it with MVAPICH as well.
>
> Where would be a good place to look? Should this just be the default
> for OMPI, then? ompi_info shows the default as 10. Is that really
> in seconds?

The other IB guys can probably answer better than I can -- I'm not an
expert in this part of IB (or really any part, I guess :). I'm not sure
why a larger value isn't the default. No, it's not seconds -- the value
is an exponent. Check the description of the MCA parameter:

4.096 microseconds * (2^btl_openib_ib_timeout)
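
To put numbers on that formula, here is a minimal sketch that evaluates it for the default of 10 and for the suggested value of 20. The function name ib_timeout_seconds is made up for illustration and is not part of Open MPI. The only input is the formula from the parameter description quoted above.

```python
# Hypothetical helper (not an Open MPI API): evaluates the formula
# 4.096 microseconds * (2^btl_openib_ib_timeout) and returns seconds.
def ib_timeout_seconds(exponent: int) -> float:
    base_seconds = 4.096e-6  # 4.096 microseconds
    return base_seconds * (2 ** exponent)


# Compare the default discussed in this thread (10) with the suggested value (20).
for exponent in (10, 20):
    seconds = ib_timeout_seconds(exponent)
    print(f"btl_openib_ib_timeout={exponent}: {seconds:.6f} s")
```

So by this formula a setting of 10 works out to roughly 4.2 milliseconds, while 20 works out to roughly 4.3 seconds.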

Andrew