
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] OpenIB error messages: reporting the default or telling you what's happening?
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-09-11 21:52:49


Hi Kevin

Are you getting those messages from ompi_info? Or from an MPI app (and if so, what are you doing to get them)?

On Sep 11, 2011, at 5:25 PM, Kevin.Buckley_at_[hidden] wrote:

> I have recently seen some OpenIB timeout errors, and the following
> was reported:
>
> * btl_openib_ib_retry_count - The number of times the sender will
> attempt to retry (defaulted to 7, the maximum value).
> * btl_openib_ib_timeout - The local ACK timeout parameter (defaulted
> to 10). The actual timeout value used is calculated as:
>
> I'd like to confirm that, when those messages say "defaulted to",
> they are telling me what's happening on the node in question and
> not just what the default is.
>
> The reason I ask is that I believe I am setting btl_openib_ib_timeout
> to 20 globally, as suggested in parts of the Open MPI docs, but those
> messages, if they do report what's happening, might be telling me
> otherwise.
>
> In case it is relevant, the Open MPI in question is the bog-standard
> RHEL5 Open MPI 1.4.4.
>
> --
> Kevin M. Buckley
> School of Engineering and Computer Science
> Victoria University of Wellington
> New Zealand
> Room: CO327    Phone: +64 4 463 5971
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users