On 11/16/2010 04:26 AM, Chris Jewell wrote:
I believe the above is correct.
On 11/15/2010 02:11 PM, Reuti wrote:
Just to give my understanding of the problem:
Sorry, I am still trying to grok your email and work out what problem you
are trying to solve. So is the issue to let two jobs that have
processes on the same node bind their processes to different
resources? Like core 1 for the first job and cores 2 and 3 for the 2nd job?
You can't get 2 slots on a machine, as it's limited by the core count to one here, so such a slot allocation shouldn't occur at all.
So to clarify, the current -binding <binding_strategy>:<binding_amount> allocates binding_amount cores to each sge_shepherd process associated with a job_id. There appears to be only one sge_shepherd process per job_id per execution node, so all child processes run on these allocated cores. This is irrespective of the number of slots allocated to the node.
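To make that concrete, here is a toy Python model of the behaviour as I understand it. It is only a sketch, not SGE code: the function name, the node names and the slot counts are all made up for illustration.

```python
# Toy model of the behaviour described above; this is not SGE code.
def cores_bound_today(slots_per_node, binding_amount):
    """Return {node: cores bound} when binding ignores the slot count."""
    # One sge_shepherd per job per node, each bound to binding_amount cores.
    return {node: binding_amount for node in slots_per_node}


# Hypothetical allocation: 1 slot on node01, 3 slots on node02.
allocation = {"node01": 1, "node02": 3}
print(cores_bound_today(allocation, binding_amount=2))
```

With a binding_amount of 2, both nodes end up with 2 bound cores, even though node01 runs one process and node02 runs three.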
That might be correct, I've put in a question to someone who should
I agree with Reuti that the binding_amount parameter should be a maximum number of bound cores per node, with the actual number determined by the number of slots allocated per node. FWIW, an alternative approach might be to have another binding_type ('slot', say) that automatically allocated one core per slot.
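As a rough sketch of the two ideas in Python (again a toy model, not SGE code; both function names and the example allocation are hypothetical):

```python
# Toy model of the two proposals; this is not SGE code.
def cores_bound_capped(slots_per_node, binding_amount):
    """binding_amount as a per-node maximum, limited by slots on the node."""
    return {
        node: min(slots, binding_amount)
        for node, slots in slots_per_node.items()
    }


def cores_bound_per_slot(slots_per_node):
    """A hypothetical 'slot' binding_type: one core for every slot."""
    return dict(slots_per_node)


# Hypothetical allocation: 1 slot on node01, 3 slots on node02.
allocation = {"node01": 1, "node02": 3}
print(cores_bound_capped(allocation, binding_amount=2))
print(cores_bound_per_slot(allocation))
```

Under the capped rule node01 binds 1 core and node02 binds 2; under the 'slot' rule they bind 1 and 3 respectively.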
Yes, that would get ugly.
Of course, a complex situation might arise if a user submits a combined MPI/multithreaded job, but then I guess we're into the realm of setting allocation_rule.
Is the patch you want one that adds a "slot" binding_type?
Is it going to be worth looking at creating a patch for this? I don't know much of the internals of SGE -- would it be hard work to do? I've not that much time to dedicate towards it, but I could put some effort in if necessary...
Terry D. Dontje | Principal Software Engineer
Engineering | +1.781.442.2631
Oracle - Performance
95 Network Drive,
Burlington, MA 01803