
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] openmpi's mpi_comm_spawn integrated with sge?
From: Reuti (reuti_at_[hidden])
Date: 2011-01-25 15:16:55


On 25.01.2011 at 20:10, Will Glover wrote:

> Thanks for your response, Reuti. Actually I had seen you mention the SGE mailing list in response to a similar question but I can't for the life of me find that list :(

The list was removed when Oracle shut down the open source site and turned GridEngine into a purely commercial product. But as you may know, Univa has stepped in, and we should see some developments shortly...

For now you can check Markmail: http://gridengine.markmail.org/ or the unindexed archive at http://arc.liv.ac.uk/pipermail/gridengine-users/ . The relevant threads are a bit hidden in http://arc.liv.ac.uk/pipermail/gridengine-users/2009-December.txt (search for "cutting") and http://arc.liv.ac.uk/pipermail/gridengine-users/2010-July.txt (search for "varying"). Another solution, using a load threshold, is also explained.

(If the dynamic MPI-2 tasks are shorter than the background jobs: use the solution with different nice values.
If the background jobs are much shorter than the MPI-2 tasks: get rid of these background jobs with a load threshold and let them drain.)

> As for using the background queue, just to clarify - is the idea to submit my parallel job on a regular queue with 100 processors at nice 0

yep

> , but allow other 'background queue' jobs on the same processors at nice 19?

yep
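To illustrate what running a job "at nice 19" means at the OS level, here is a minimal Python sketch. It is not SGE's own mechanism, because in SGE the nice value comes from the queue configuration. The names `BACKGROUND_NICE` and `run_background` are hypothetical, and the sketch assumes a Unix system where `os.setpriority` is available. It starts a command with its nice value set to 19 while the calling process keeps its own (e.g. 0).

```python
import os
import subprocess
import sys
from typing import Sequence

# Nice value of the hypothetical "background queue" (lowest CPU priority).
BACKGROUND_NICE = 19


def _lower_priority() -> None:
    # Runs in the child after fork() and before exec(). Setting an absolute
    # value (rather than os.nice(19), which adds to the current value) makes
    # the result independent of the parent's nice value. Raising one's own
    # nice value does not require privileges.
    os.setpriority(os.PRIO_PROCESS, 0, BACKGROUND_NICE)


def run_background(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run `cmd` to completion at nice 19; the caller keeps its own nice value."""
    return subprocess.run(list(cmd), preexec_fn=_lower_priority,
                          capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # The child reports its own nice value; the parent's is unchanged.
    probe = [sys.executable, "-c",
             "import os; print(os.getpriority(os.PRIO_PROCESS, 0))"]
    print("parent nice:", os.getpriority(os.PRIO_PROCESS, 0))
    print("child nice: ", run_background(probe).stdout.strip())
```

Note that `preexec_fn` is not safe to use from a multi-threaded parent; this sketch uses it only to keep the example short.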

> Presumably, I'd still need MPI-2's dynamic process management to free up processors when they are not needed (at the moment, they use 100% CPU idling in MPI_Recv, for example).

If they really idle at 100% CPU, then you are correct: you have to release them with an MPI-2 call.

> Did I understand you correctly?

yep.

This way you will minimize the otherwise wasted computing time and avoid idling cores.
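Why the parallel tasks still get most of the CPU can be estimated with a simple model. It assumes Linux with the CFS scheduler, whose per-nice weight table maps nice 0 to 1024 and nice 19 to 15. In that model, a runnable task's share of one core is its weight divided by the sum of the weights of all runnable tasks on that core. The function name `cpu_share` is hypothetical, and the model ignores I/O, wakeups and multi-core balancing.

```python
from typing import Iterable

# Load weights from the Linux CFS table (kernel: sched_prio_to_weight);
# only the two nice levels discussed here are reproduced.
CFS_WEIGHT = {0: 1024, 19: 15}


def cpu_share(nice: int, competitors: Iterable[int] = ()) -> float:
    """Idealized CPU share of one task at `nice` on a single core,
    competing with runnable tasks whose nice values are `competitors`."""
    own = CFS_WEIGHT[nice]
    total = own + sum(CFS_WEIGHT[n] for n in competitors)
    return own / total


if __name__ == "__main__":
    # Background job alone on a core: it runs at full speed.
    print(f"nice 19 alone:     {cpu_share(19):.1%}")
    # A parallel task (nice 0) starts on the same core.
    print(f"nice 0 vs nice 19: {cpu_share(0, [19]):.1%}")
    print(f"nice 19 vs nice 0: {cpu_share(19, [0]):.1%}")
```

This prints 100.0%, 98.6% and 1.4%. A background job uses an otherwise idle core completely, but as soon as a nice-0 parallel task becomes runnable there, the task gets roughly 1024/1039 of that core.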

-- Reuti
 

> --
> Will
>
> --- On Tue, 1/25/11, Reuti <reuti_at_[hidden]> wrote:
>
>> From: Reuti <reuti_at_[hidden]>
>> Subject: Re: [OMPI users] openmpi's mpi_comm_spawn integrated with sge?
>> To: "Open MPI Users" <users_at_[hidden]>
>> Date: Tuesday, January 25, 2011, 9:27 AM
>> On 25.01.2011 at 12:32, Terry
>> Dontje wrote:
>>
>>> On 01/25/2011 02:17 AM, Will Glover wrote:
>>>> Hi all,
>>>> I tried a google/mailing list search for this but
>> came up with nothing, so here goes:
>>>>
>>>> Is there any level of automation between Open
>> MPI's dynamic process management and the SGE queue
>> manager?
>>>> In particular, can I make a call to mpi_comm_spawn
>> and have SGE dynamically increase the number of slots?
>>
>>>> This seems a little far-fetched, but it would be
>> really useful if this is possible. My application is
>> 'restricted' to coarse-grain task parallelism and involves a
>> work load that varies significantly during runtime (between
>> 1 and ~100 parallel tasks). Dynamic process management
>> would maintain an optimal number of processors and reduce
>> idling.
>>>>
>>>> Many thanks,
>>>>
>>> This is an interesting idea but no integration has
>> been done that would allow an MPI job to request more slots.
>>
>>
>> Similar ideas came up on the former SGE mailing list a couple
>> of times: having varying resource requests over the
>> lifetime of a job (cores, memory, licenses, ...). In the end
>> this would require some kind of real-time queuing
>> system, as the necessary resources must be guaranteed to be
>> free in time.
>>
>> Besides this, some syntax would be necessary, either for
>> requesting a "resource profile over time" when such a job
>> is submitted, or for letting a running job issue
>> commands to request/release resources
>> on demand.
>>
>> If you have such a "resource profile over time" for a bunch
>> of jobs, it could then be extended to solve a cutting-stock
>> problem where the unit to be cut would be time, e.g. arrange
>> these 10 jobs so that together they finish in the least
>> amount of time, and you could predict exactly when each job
>> will end. This is getting really complex.
>>
>> ==
>>
>> What can be done in your situation: set up some kind of
>> "background queue" with a nice value of 19, but submit the
>> parallel job to a queue with the default nice value of 0.
>> Although you request 100 cores and reserve them (i.e. the
>> background queue should of course not be suspended in such
>> a case), the background queue will still run at full speed
>> when nothing else is running on the nodes. When some of the
>> parallel tasks are started on the nodes, they will get most
>> of the computing time (i.e. intentional oversubscription).
>> The background queue can be used for less important jobs.
>> Such a setup is useful when your parallel application isn't
>> running in parallel all the time, as in
>> your case.
>>
>> -- Reuti
>>
>>
>>> --
>>> Terry D. Dontje | Principal Software Engineer
>>> Developer Tools Engineering | +1.781.442.2631
>>> Oracle - Performance Technologies
>>> 95 Network Drive, Burlington, MA 01803
>>> Email terry.dontje_at_[hidden]
>>>
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>
>