Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] over-subscription of cores
From: Reuti (reuti_at_[hidden])
Date: 2011-12-26 12:44:18


Hi,

On 26.12.2011 at 17:55, Santosh Ansumali wrote:

> Dear Dr. Correa,
> Sorry for my ignorance on cluster maintenance. So far our
> cluster is just set-up by a vendor and we do not know more details.
> So far I am understanding the concept but we are not able to follow
> what precisely we need to try out for allowing oversubscription.
> My current submission file is written as follows:
> #!/bin/bash
> #$ -N first
> #$ -S /bin/bash
> #$ -cwd
> #$ -e $JOB_ID.$JOB_NAME.ERROR
> #$ -o $JOB_ID.$JOB_NAME.OUTPUT
> #$ -P faculty_prj
> #$ -p 0
> #$ -pe orte 8
> /opt/mpi/openmpi/1.3.3/gnu/bin/mpirun -np $NSLOTS ./test_vel.out

If it's a general question about the queuing system, you can also turn to the SGE list: http://gridengine.org/blog/2011/01/27/gridengine-users-mailing-list/ If you have no support contract with the vendor, you will need someone at your site who takes charge of the cluster and becomes familiar with SGE administration.

$ man queue_conf # Have a look at what can be defined in a queue.

$ qconf -sql # Shows what queues are defined.

$ qconf -mq all.q # replace all.q with the one you found above and edit the slot count.

$ man sge_pe # Check the options for the PE.

$ qconf -spl # Shows what PEs are defined.

$ qconf -mp orte # Check the allocation rule, what's there?

Then, in the job script, change the 8 in the line "#$ -pe orte 8" to the slot count you set above.
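As a sketch of that last step, assuming the slot count was raised to 16 (an illustrative value, i.e. two processes per core on an 8-core node) and using a made-up file name oversub_job.sh, the adjusted job script could be generated like this. Everything except the slot count is copied from your submission file.

```shell
#!/bin/bash
# Sketch only. SLOTS=16 is an assumed example value; it must not exceed
# what the queue and the PE actually allow on your cluster.
SLOTS=16
JOBFILE=oversub_job.sh   # hypothetical file name

# The quoted 'EOF' keeps $JOB_ID, $JOB_NAME and $NSLOTS literal, so that
# SGE (not this shell) expands them; sed fills in only the slot count.
sed "s/@SLOTS@/$SLOTS/" > "$JOBFILE" <<'EOF'
#!/bin/bash
#$ -N first
#$ -S /bin/bash
#$ -cwd
#$ -e $JOB_ID.$JOB_NAME.ERROR
#$ -o $JOB_ID.$JOB_NAME.OUTPUT
#$ -P faculty_prj
#$ -p 0
#$ -pe orte @SLOTS@
/opt/mpi/openmpi/1.3.3/gnu/bin/mpirun -np $NSLOTS ./test_vel.out
EOF

echo "wrote $JOBFILE requesting $SLOTS slots; submit it with: qsub $JOBFILE"
```

The mpirun line stays unchanged, since SGE sets NSLOTS to the number of slots it granted. Whether all processes end up on one node depends on the allocation rule of the PE.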

-- Reuti

> What change should we make to allow for oversubscription?
> Best,
> Santosh
>
>
>
> On Mon, Dec 26, 2011 at 9:02 PM, Reuti <reuti_at_[hidden]> wrote:
>> On 23.12.2011 at 21:16, Gustavo Correa wrote:
>>
>>> I don't know about the grid engine/ SGE.
>>> However, in Torque, the batch/resource manager I use,
>>> to allow oversubscription, you need to modify the batch server nodes file
>>> and pretend the nodes have more cores than the physical ones.
>>> [Something like 'node01 np=8' would change to 'node01 np=16' for instance.]
>>> Maybe there is something similar in SGE.
>>
>> Yep, it's in the queue definition, where you can define the slots per queue instance on each machine.
>>
>> Depending on your setup: if you have more than one queue per machine, the admin might already have set up an RQS (Resource Quota Set) or an absolute limit on the slots across all queues residing on a host in the exechost definition. In this case, that limit needs to be adjusted too.
>>
>> -- Reuti
>>
>>
>>> We had bad results [program hanging or aborting]
>>> when trying to run large programs which include PDE solvers
>>> [climate models] and allowing oversubscription, even when a substantial amount
>>> of RAM was idle.
>>> That was a while ago, and I have not pursued the issue any further.
>>> Maybe context switching among the [surplus of] processes is the problem.
>>> Of course for 'hello, world' type of programs oversubscription works well.
>>> Where is the threshold when oversubscription makes a program break down,
>>> I'd guess only trial and error may tell.
>>>
>>> I hope this helps,
>>> Gus Correa
>>>
>>> On Dec 23, 2011, at 2:42 PM, Santosh Ansumali wrote:
>>>
>>>> Dear All,
>>>> We are running a PDE solver which is memory bound. Due to
>>>> cache-related issues, a smaller number of grid points per core leads to
>>>> better performance for this code. Thus, though the available memory per
>>>> core is more than 2 GB, we get good performance by using
>>>> less than 1 GB per core.
>>>>
>>>> I want to know whether oversubscribing the cores can potentially
>>>> improve performance of such a code. My thinking is that if I
>>>> oversubscribe the cores, each thread will be using less than 1 GB so
>>>> cache related problems will be less severe. Is this logic correct, or
>>>> will performance deteriorate further due to cache conflicts?
>>>> In case over-subscription can help, how should I modify the
>>>> submission file (using Sun Grid Engine) to enable over-subscription of
>>>> cores?
>>>> my current submission file is written as follows
>>>> #!/bin/bash
>>>> #$ -N first
>>>> #$ -S /bin/bash
>>>> #$ -cwd
>>>> #$ -e $JOB_ID.$JOB_NAME.ERROR
>>>> #$ -o $JOB_ID.$JOB_NAME.OUTPUT
>>>> #$ -P faculty_prj
>>>> #$ -p 0
>>>> #$ -pe orte 8
>>>> /opt/mpi/openmpi/1.3.3/gnu/bin/mpirun -np $NSLOTS ./test_vel.out
>>>>
>>>> Is it possible to allow over-subscription by modifying submission file
>>>> itself? Or do I need to change hostfiles somehow?
>>>> Thanks for your help!
>>>> Best Regards
>>>> Santosh Ansumali,
>>>> Faculty Fellow,
>>>> Engineering Mechanics Unit
>>>> Jawaharlal Nehru Centre for Advanced Scientific Research (JNCASR)
>>>> Jakkur, Bangalore-560 064, India
>>>> Tel: + 91 80 22082938
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> --
> Santosh Ansumali,
> Faculty Fellow,
> Engineering Mechanics Unit
> Jawaharlal Nehru Centre for Advanced Scientific Research (JNCASR)
> Jakkur, Bangalore-560 064, India
> Tel: + 91 80 22082938