Thanks for your response, Reuti. Actually, I had seen you mention the SGE mailing list in response to a similar question, but I can't for the life of me find that list :(
As for using the background queue, just to clarify: is the idea to submit my parallel job to a regular queue with 100 processors at nice 0, but allow other 'background queue' jobs on the same processors at nice 19? Presumably I'd still need MPI-2's dynamic process management to free up processors when they are not needed (at the moment they sit at 100% CPU idling in MPI_Recv, for example). Did I understand you correctly?
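For what it's worth, one workaround I've been experimenting with for the 100%-CPU idling is to replace the blocking MPI_Recv with an MPI_Iprobe polling loop that sleeps between polls, so idle ranks actually yield the core to the nice-19 background jobs. A rough sketch (the function name and the 10 ms interval are just my own illustration, not anything from Open MPI):

```c
#include <mpi.h>
#include <time.h>

/* Receive like MPI_Recv, but sleep while no matching message is
 * pending, so an idle rank doesn't spin at 100% CPU. */
static void recv_politely(void *buf, int count, MPI_Datatype type,
                          int source, int tag, MPI_Comm comm)
{
    int flag = 0;
    MPI_Status status;
    struct timespec pause = { 0, 10 * 1000 * 1000 };  /* 10 ms */

    while (1) {
        /* Non-blocking check for a matching incoming message. */
        MPI_Iprobe(source, tag, comm, &flag, &status);
        if (flag)
            break;                 /* a message has arrived */
        nanosleep(&pause, NULL);   /* yield the CPU instead of spinning */
    }
    MPI_Recv(buf, count, type, source, tag, comm, MPI_STATUS_IGNORE);
}
```

The obvious trade-off is added receive latency (up to one sleep interval). I believe Open MPI also has an mpi_yield_when_idle MCA parameter that makes idle processes call sched_yield instead of busy-polling, which might be the simpler option here, though as I understand it that only really helps when the node is oversubscribed.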
--- On Tue, 1/25/11, Reuti <reuti_at_[hidden]> wrote:
> From: Reuti <reuti_at_[hidden]>
> Subject: Re: [OMPI users] openmpi's mpi_comm_spawn integrated with sge?
> To: "Open MPI Users" <users_at_[hidden]>
> Date: Tuesday, January 25, 2011, 9:27 AM
> On 25.01.2011 at 12:32, Terry wrote:
> > On 01/25/2011 02:17 AM, Will Glover wrote:
> >> Hi all,
> >> I tried a Google/mailing-list search for this but came up with nothing, so here goes:
> >> Is there any level of automation between Open MPI's dynamic process management and the SGE queue?
> >> In particular, can I make a call to MPI_Comm_spawn and have SGE dynamically increase the number of slots?
> >> This seems a little far-fetched, but it would be really useful if it is possible. My application is 'restricted' to coarse-grain task parallelism and involves a workload that varies significantly during runtime (between 1 and ~100 parallel tasks). Dynamic process management would maintain an optimal number of processors and reduce idle CPU time.
> >> Many thanks,
> > This is an interesting idea but no integration has
> been done that would allow an MPI job to request more slots.
> Similar ideas came up on the former SGE mailing list a couple
> of times: having varying resource requests over the lifetime
> of a job (cores, memory, licenses, ...). In the end this would
> mean some kind of real-time queuing system, since the necessary
> resources would have to be guaranteed to be free at the right
> time.
> Besides this, some syntax would also be necessary: either for
> requesting a "resource profile over time" when such a job is
> submitted, or for allowing a job, while it is running, to issue
> commands to request/release resources on demand.
> If you have such a "resource profile over time" for a bunch
> of jobs, this could be extended to solve a cutting-stock
> problem where the unit to be cut is time, e.g. arrange
> these 10 jobs so that together they finish in the least amount
> of time - and you could predict exactly when each job
> will end. This is getting really complex.
> What can be done in your situation: have some kind of
> "background queue" with a nice value of 19, and submit the
> parallel job to a queue with the default nice value of 0.
> Although you request 100 cores and reserve them (i.e. the
> background queue shouldn't be suspended in such a case, of
> course), the background queue will still run at full speed
> when nothing else is running on the nodes. When some of the
> parallel tasks are started on the nodes, they will get most
> of the computing time (this means: oversubscription by
> intention). The background queue can then be used for less
> important jobs. Such a setup is useful when, as in your case,
> the parallel application isn't running in parallel all the
> time.
> -- Reuti
> > --
> > Terry D. Dontje | Principal Software Engineer
> > Developer Tools Engineering | +1.781.442.2631
> > Oracle - Performance Technologies
> > 95 Network Drive, Burlington, MA 01803
> > Email terry.dontje_at_[hidden]
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users