Open MPI User's Mailing List Archives


Subject: [OMPI users] OpenIB error messages: reporting the default or telling you what's happening?
From: Kevin.Buckley_at_[hidden]
Date: 2011-09-11 19:25:59

I have recently seen some OpenIB timeout errors, and the error output
includes the following:

 * btl_openib_ib_retry_count - The number of times the sender will
   attempt to retry (defaulted to 7, the maximum value).
 * btl_openib_ib_timeout - The local ACK timeout parameter (defaulted
   to 10). The actual timeout value used is calculated as:

I'd like to confirm that, when those messages say "defaulted to",
they are telling me what's happening on the node in question and
not just what the default is.

The reason for asking is that I believe I am setting
btl_openib_ib_timeout to 20, globally, as suggested in parts of the
Open MPI docs, but those messages, if they do report what's happening,
might be telling me otherwise.
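
For reference, here is a small sketch of what the two settings would mean in wall-clock terms. The message quoted above is cut off before the formula, so the calculation used here (4.096 microseconds * 2^btl_openib_ib_timeout) is an assumption based on the wording of the full Open MPI help text, not something taken from my output; the function name is purely illustrative.

```python
# Assumed formula (not shown in the truncated message above):
#   timeout = 4.096 microseconds * 2 ** btl_openib_ib_timeout
BASE_MICROSECONDS = 4.096


def ack_timeout_seconds(ib_timeout: int) -> float:
    """Return the local ACK timeout in seconds for a given parameter value."""
    return BASE_MICROSECONDS * (2 ** ib_timeout) / 1e6


# Compare the reported default (10) with the value I believe I am setting (20).
for value in (10, 20):
    print(f"btl_openib_ib_timeout={value}: {ack_timeout_seconds(value):.6f} s")
```

Under that assumption, a value of 10 works out to roughly 4.2 milliseconds per attempt and a value of 20 to roughly 4.3 seconds, so which of the two is actually in effect makes a large difference.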

In case it is relevant, the OpenMPI in question is the bog standard
RHEL5 1.4.4.

Kevin M. Buckley                                  Room:  CO327
School of Engineering and                         Phone: +64 4 463 5971
 Computer Science
Victoria University of Wellington
New Zealand