Are you getting those messages from ompi_info? Or from an MPI app (and if so, what are you doing to get them)?
On Sep 11, 2011, at 5:25 PM, Kevin.Buckley_at_[hidden] wrote:
> I have recently seen some OpenIB timeout errors and see the
> following reported:
> * btl_openib_ib_retry_count - The number of times the sender will
> attempt to retry (defaulted to 7, the maximum value).
> * btl_openib_ib_timeout - The local ACK timeout parameter (defaulted
> to 10). The actual timeout value used is calculated as:
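For reference, the formula that the openib help text and the verbs documentation give for this is 4.096 microseconds * 2^btl_openib_ib_timeout, applied per retry attempt. The quick sketch below shows what 10 and 20 work out to under that assumption. The helper name `ib_ack_timeout_seconds` is made up for illustration. It only accepts 1..31 because the field is 5 bits wide and 0 is a special case at the verbs level.

```python
def ib_ack_timeout_seconds(ib_timeout: int) -> float:
    """Local ACK timeout (seconds) for a given btl_openib_ib_timeout value.

    Hypothetical helper. Assumes the InfiniBand encoding
    4.096 us * 2**ib_timeout. Only 1..31 is accepted here: the field is
    5 bits wide and 0 is treated specially by the verbs layer.
    """
    if not 1 <= ib_timeout <= 31:
        raise ValueError("ib_timeout must be in 1..31")
    return 4.096e-6 * (2 ** ib_timeout)


for value in (10, 20):
    print(f"btl_openib_ib_timeout={value}: "
          f"{ib_ack_timeout_seconds(value):.6f} s per attempt")
```

The two values are far apart: about 4.2 ms per attempt for 10, versus about 4.3 s for 20. That gap is why it matters whether the message reports the value in use or only the default.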
> I'd like to confirm that, when those messages say "defaulted to",
> they are telling me what's happening on the node in question and
> not just what the default is.
> The reason for asking is that I believe I am setting
> btl_openib_ib_timeout to 20 globally, as suggested in parts of the
> OpenMPI docs, but those messages, if they do report what's happening,
> might be telling me otherwise.
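A "global" setting can be shadowed by any higher-precedence source. As I understand the documented order, it runs: `mpirun --mca` on the command line, then `OMPI_MCA_<name>` environment variables, then `$HOME/.openmpi/mca-params.conf`, then `$prefix/etc/openmpi-mca-params.conf`. Earlier sources win. The sketch below models that order for a single parameter. `effective_mca_value` and `parse_mca_params` are hypothetical helpers, not part of Open MPI. The authoritative check is still `ompi_info --param btl openib` run on the node in question.

```python
from typing import Mapping, Optional, Sequence


def parse_mca_params(text: str) -> dict[str, str]:
    """Parse 'name = value' lines from an mca-params.conf-style file.

    Blank lines, '#' comments and lines without '=' are ignored.
    """
    params: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip()
    return params


def effective_mca_value(
    name: str,
    cmdline: Mapping[str, str],
    env: Mapping[str, str],
    param_files: Sequence[tuple[str, str]],
) -> Optional[tuple[str, str]]:
    """Return (value, source) for an MCA parameter, or None if unset.

    Assumed precedence: command line, then OMPI_MCA_<name> in the
    environment, then param files in the order given as (label, contents)
    pairs (user file first, system-wide file second).
    """
    if name in cmdline:
        return cmdline[name], "command line"
    env_key = "OMPI_MCA_" + name
    if env_key in env:
        return env[env_key], "environment"
    for label, contents in param_files:
        params = parse_mca_params(contents)
        if name in params:
            return params[name], label
    return None  # not set anywhere: Open MPI's built-in default applies


# Example: the system-wide file says 20, but a stray environment
# variable in the job's environment would win.
system_file = "# site-wide defaults\nbtl_openib_ib_timeout = 20\n"
files = [("user mca-params.conf", ""), ("openmpi-mca-params.conf", system_file)]
print(effective_mca_value("btl_openib_ib_timeout", {}, {}, files))
print(effective_mca_value(
    "btl_openib_ib_timeout", {},
    {"OMPI_MCA_btl_openib_ib_timeout": "10"}, files))
```

Modelling the sources as plain strings and dicts keeps the sketch independent of where the files live on a given RHEL5 install.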
> In case it is relevant, the OpenMPI in question is the bog standard
> RHEL5 1.4.4.
> Kevin M. Buckley Room: CO327
> School of Engineering and Phone: +64 4 463 5971
> Computer Science
> Victoria University of Wellington
> New Zealand
> users mailing list