Brock Palen wrote:
> On Nov 27, 2007, at 10:49 AM, Andrew Friedley wrote:
>> Brock Palen wrote:
>>> On Nov 21, 2007, at 3:39 PM, Andrew Friedley wrote:
>>>> If this is what I think it is, try using this MCA parameter:
>>>> -mca btl_openib_ib_timeout 20
>>> The user tried this option and it allowed the run to complete.
>>> You say it's an issue with the fabric, but ibshowerrors does not show any
>>> errors. It's Topspin (Cisco) gear: NICs, switch, cables.
>>> Should I follow up with Cisco more?
>> Sure, why not, if you think it'd be useful. FWIW, I see this on
>> Voltaire/Mellanox hardware with Open MPI; others here at LLNL tell me
>> they've seen it with MVAPICH as well.
> What would be a place to look? Should this just be the default for
> OMPI, then? ompi_info shows the default as 10 seconds; is that right,
> 'seconds'?
The other IB guys can probably answer better than I can -- I'm not an
expert in this part of IB (or really any part, I guess :). I'm not sure why
a larger value isn't the default. No, it's not seconds -- check the
description of the MCA parameter:
4.096 microseconds * (2^btl_openib_ib_timeout)
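To make the units concrete, here is a small Python sketch that plugs values into the formula above. The function name `ib_timeout_seconds` is hypothetical, just for illustration. By that formula the default of 10 works out to roughly 4.2 milliseconds, while 20 gives about 4.3 seconds, i.e. 2^10 = 1024 times longer.

```python
# Sketch: convert a btl_openib_ib_timeout value to wall-clock time, using the
# formula from the MCA parameter description:
#   4.096 microseconds * 2^btl_openib_ib_timeout
# ib_timeout_seconds is a hypothetical helper name, not part of Open MPI.

BASE_US = 4.096  # microseconds per unit, taken from the parameter description


def ib_timeout_seconds(exponent: int) -> float:
    """Return the timeout in seconds for a given btl_openib_ib_timeout value."""
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    return BASE_US * (2 ** exponent) / 1e6  # microseconds -> seconds


if __name__ == "__main__":
    for value in (10, 20):  # the ompi_info default vs. the value suggested above
        print(f"btl_openib_ib_timeout={value}: {ib_timeout_seconds(value):.6f} s")
```

So the "10" reported by ompi_info is an exponent, not a number of seconds.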