I have recently seen some OpenIB timeout errors, and the error messages include the following:
* btl_openib_ib_retry_count - The number of times the sender will
attempt to retry (defaulted to 7, the maximum value).
* btl_openib_ib_timeout - The local ACK timeout parameter (defaulted
to 10). The actual timeout value used is calculated as:
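For reference, the Open MPI help text gives that calculation as 4.096 microseconds * (2^btl_openib_ib_timeout). A minimal sketch of the arithmetic, assuming that formula (the function name is only illustrative and is not part of Open MPI):

```python
# Assumption: timeout = 4.096 microseconds * 2**btl_openib_ib_timeout,
# as stated in the Open MPI help message for this error.
BASE_US = 4.096  # microseconds


def local_ack_timeout_us(ib_timeout):
    """Return the local ACK timeout in microseconds (hypothetical helper)."""
    return BASE_US * (2 ** ib_timeout)


if __name__ == "__main__":
    # The default of 10 versus the value of 20 that I believe I am setting.
    print(f"ib_timeout=10: {local_ack_timeout_us(10) / 1e3:.2f} ms")
    print(f"ib_timeout=20: {local_ack_timeout_us(20) / 1e6:.2f} s")
```

So a value of 10 corresponds to roughly 4.19 milliseconds per attempt, and a value of 20 to roughly 4.29 seconds.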
I'd like to confirm that, when those messages say "defaulted to",
they are telling me what's happening on the node in question and
not just what the default is.
The reason for asking is that I believe I am setting the value of
btl_openib_ib_timeout to 20, globally, as suggested in parts of the
Open MPI docs, but those messages, if they do report what is
happening, might be telling me otherwise.
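As a rough way of reasoning about where a "global" setting could come from, here is a minimal sketch, assuming the documented Open MPI conventions that an environment variable named OMPI_MCA_<name> takes precedence over mca-params.conf files, and that those files hold "name = value" lines with "#" comments. The helper names are hypothetical, the file contents are passed in as strings rather than read from disk, and command-line --mca settings (which take precedence over both) are not modelled:

```python
def parse_mca_params(text):
    """Parse 'name = value' lines from mca-params.conf style text."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" in line:
            key, value = line.split("=", 1)
            params[key.strip()] = value.strip()
    return params


def lookup_mca_param(name, environ, conf_texts):
    """Return (value, source) for an MCA parameter (hypothetical helper).

    environ    -- mapping of environment variables
    conf_texts -- list of (label, file_text), highest-priority file first
    """
    env_key = "OMPI_MCA_" + name
    if env_key in environ:
        return environ[env_key], "environment"
    for label, text in conf_texts:
        params = parse_mca_params(text)
        if name in params:
            return params[name], label
    return None, "not set"


if __name__ == "__main__":
    system_conf = "# site-wide settings\nbtl_openib_ib_timeout = 20\n"
    value, source = lookup_mca_param(
        "btl_openib_ib_timeout", {}, [("system file", system_conf)]
    )
    print(value, source)
```

This only illustrates the lookup order; it does not say what a given node actually used at run time.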
In case it is relevant, the OpenMPI in question is the bog standard
Kevin M. Buckley Room: CO327
School of Engineering and Phone: +64 4 463 5971
Victoria University of Wellington