On Sep 3, 2010, at 12:16 AM, Ralph Castain wrote:
> Backing off the polling rate requires more application-specific logic like that offered below, so it is a little difficult for us to implement at the MPI library level. Not saying we eventually won't - just not sure anyone quite knows how to do so in a generalized form.
FWIW, we've *talked* about this kind of stuff among the developers -- it's at least somewhat similar to the "back off to blocking communications instead of polling" issue. That work in particular has been discussed for a long time but never implemented.
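For what it's worth, here is a rough sketch of what application-level polling backoff could look like. It is not anything Open MPI provides: poll_with_backoff, poll_fn, and countdown_done are hypothetical names made up for illustration, and the doubling-with-a-cap policy is just one assumption about what a reasonable backoff might be. In a real MPI application the predicate would presumably wrap a non-blocking completion check such as MPI_Test; a plain counter stands in for it here so the sketch compiles without an MPI installation.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdbool.h>
#include <time.h>

/* Hypothetical callback type: returns true once the awaited event has
 * completed. In an MPI program this might call MPI_Test on a request
 * and return the resulting flag (an assumption, not shown here). */
typedef bool (*poll_fn)(void *ctx);

/* Hypothetical helper, not part of Open MPI or the MPI standard.
 * Polls `done` until it returns true. After each unsuccessful poll the
 * caller sleeps; the sleep starts at min_ns and doubles each time,
 * capped at max_ns. Returns the number of polls performed. */
static long poll_with_backoff(poll_fn done, void *ctx,
                              long min_ns, long max_ns)
{
    long delay_ns = (min_ns < 1) ? 1 : min_ns;
    long polls = 0;

    for (;;) {
        polls++;
        if (done(ctx))
            return polls;

        /* Give up the CPU instead of spinning. */
        struct timespec ts;
        ts.tv_sec = delay_ns / 1000000000L;
        ts.tv_nsec = delay_ns % 1000000000L;
        nanosleep(&ts, NULL);

        /* Exponential backoff with an upper bound. */
        if (delay_ns < max_ns) {
            delay_ns *= 2;
            if (delay_ns > max_ns)
                delay_ns = max_ns;
        }
    }
}

/* Stand-in predicate for illustration only: reports "not done" until
 * the counter pointed to by ctx has been decremented to zero. */
static bool countdown_done(void *ctx)
{
    int *remaining = ctx;
    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}
```

The trade-off is the usual one: a larger max_ns frees up more CPU while waiting but adds up to that much latency once the message actually arrives, which is part of why the right values are application-specific.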
Are your jobs hanging because of deadlock (i.e., application error), or infrastructure error? If they're hanging because of deadlock, there are some PMPI-based tools that might be able to help.
--
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/