Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] spin-wait backoff
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-09-03 00:16:25


In the upcoming 1.5 series, we will introduce a new "sensor" framework to help resolve such issues. Among other things, it will (if requested) automatically track the size of a sentinel file, CPU usage, and memory footprint, and will terminate the job if any of them violates a user-specified limit (e.g., the file doesn't grow fast enough, or memory grows too large).
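
To make the sentinel-file idea concrete, here is a minimal sketch of that kind of check written as an external watchdog. It is not the sensor framework's code or interface: the function names, parameters, and the byte-growth threshold are all hypothetical, chosen only for illustration.

```python
import os
import time


def check_sentinel_growth(path, previous_size, min_growth_bytes):
    """Return (current_size, grew_enough) for the sentinel file at `path`.

    Hypothetical helper: `grew_enough` is True when the file gained at
    least `min_growth_bytes` since `previous_size` was recorded.
    """
    current_size = os.path.getsize(path)
    grew_enough = (current_size - previous_size) >= min_growth_bytes
    return current_size, grew_enough


def watch_sentinel(path, interval_seconds, min_growth_bytes, max_checks,
                   sleep=time.sleep):
    """Check the file `max_checks` times, `interval_seconds` apart.

    Returns False as soon as one interval shows too little growth (the
    caller would then kill the job), True if every check passed.
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    size = os.path.getsize(path)
    for _ in range(max_checks):
        sleep(interval_seconds)
        size, grew_enough = check_sentinel_growth(path, size, min_growth_bytes)
        if not grew_enough:
            return False
    return True
```

A job script could run something like `watch_sentinel("output.log", 600, 1, 144)` alongside the MPI job and terminate the job when it returns False; the file name and limits here are placeholders.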

Backing off the polling rate requires more application-specific logic, like that offered below, so it is a little difficult for us to implement at the MPI library level. I'm not saying we never will; it's just not clear that anyone knows how to do so in a generalized form.
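
For readers who want to try this at the application level, the following is a rough sketch of a spin-then-backoff wait loop. It is not Open MPI code and not necessarily what the linked post does. `poll` stands for any non-blocking completion check the application supplies (in an MPI program that would typically wrap a call such as `MPI_Test` or `MPI_Iprobe`), and every parameter name and default value below is an assumption picked for illustration.

```python
import time


def wait_with_backoff(poll, spin_seconds=1.0, first_sleep=1e-4,
                      max_sleep=0.1, factor=2.0, timeout=None,
                      clock=time.monotonic, sleep=time.sleep):
    """Wait until poll() returns True, using less CPU the longer it takes.

    Phase 1: busy-poll (no sleeping) for `spin_seconds`, so short waits
             keep their low latency.
    Phase 2: sleep between polls, multiplying the sleep by `factor` each
             time up to `max_sleep`, so a hung job shows up as idle.
    Returns True when poll() succeeds, False if `timeout` seconds pass.
    `clock` and `sleep` are injectable to make the loop easy to test.
    """
    start = clock()
    delay = first_sleep
    while True:
        if poll():
            return True
        elapsed = clock() - start
        if timeout is not None and elapsed >= timeout:
            return False
        if elapsed < spin_seconds:
            continue  # still in the spin phase: poll again immediately
        sleep(delay)
        delay = min(delay * factor, max_sleep)
```

The right values are application-specific: a tightly coupled code may want a long spin phase and a small `max_sleep`, while a loosely coupled one can back off quickly. That dependence is the reason a single library-level default is hard to choose.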

On Sep 2, 2010, at 7:46 PM, Douglas Guptill wrote:

> Hi David:
>
> On Fri, Sep 03, 2010 at 10:50:02AM +1000, David Singleton wrote:
>>
>> I'm sure this has been discussed before but having watched hundreds of
>> thousands of cpuhrs being wasted by difficult-to-detect hung jobs, I'd
>> be keen to know why there isn't some sort of "spin-wait backoff" option.
>> For example, a way to specify spin-wait for x seconds/cycles/iterations
>> then backoff to lighter and lighter cpu usage. At least that way, hung
>> jobs would become self-evident.
>>
>> Maybe there is already some way of doing this?
>
> For my solution to this, see
>
> http://www.open-mpi.org/community/lists/users/2010/07/13731.php
>
> HTH,
> Douglas.
> --
> Douglas Guptill voice: 902-461-9749
> Research Assistant, LSC 4640 email: douglas.guptill_at_[hidden]
> Oceanography Department fax: 902-494-3877
> Dalhousie University
> Halifax, NS, B3H 4J1, Canada