On 12.10.2010, at 15:49, Dave Love wrote:
> Chris Jewell <chris.jewell_at_[hidden]> writes:
>> I've scrapped this system now in favour of the new SGE core binding feature.
> How does that work, exactly? I thought the OMPI SGE integration didn't
> support core binding, but good if it does.
With binding_instance set to "set" (the default), the shepherd should already bind the processes to cores. With the other types of binding_instance, the selected cores must be forwarded to the application via an environment variable or in the hostfile.
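As a minimal sketch of the forwarding case: with the "env" binding_instance, SGE is supposed to export the selected cores in $SGE_BINDING (space-separated) instead of binding the job itself, and the jobscript has to apply them on its own, e.g. with taskset. The fallback value and the application name below are only placeholders for running the sketch outside SGE.

```shell
# "env" binding_instance sketch: SGE exports the chosen cores in
# $SGE_BINDING (space-separated core list) and leaves the actual
# binding to the job. Fallback value is for testing outside SGE.
cores=${SGE_BINDING:-"0 1"}
# Convert "0 1" into the comma-separated list taskset expects.
mask=$(printf '%s' "$cores" | tr ' ' ',')
echo "binding to cores: $mask"
# exec taskset -c "$mask" ./my_app   # hand the binding to the application
```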
As this is only a hint to SGE and not a hard request, the user must plan the allocation a little beforehand; in particular, it won't work if you oversubscribe a machine. When I look at /proc/*/status, the applied binding is noted there, and it's also recorded in the "config" file of each job's .../active_jobs/... directory. E.g., top shows:
9926 ms04 39 19 3756 292 228 R 25 0.0 0:19.31 ever
9927 ms04 39 19 3756 292 228 R 25 0.0 0:19.31 ever
9925 ms04 39 19 3756 288 228 R 25 0.0 0:19.30 ever
9928 ms04 39 19 3756 292 228 R 25 0.0 0:19.30 ever
for 4 forks of an endless loop in one and the same jobscript, submitted with `qsub -binding linear:1 demo.sh`. The funny thing is that with this kernel version I still get a load of 4, even though all 4 forks are bound to one core. Should it really be four?
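The jobscript could look roughly like this; this is a hypothetical reconstruction of demo.sh (the sleep/kill at the end is only so the sketch terminates on its own, the real job would just let the loops run):

```shell
#!/bin/sh
# Hypothetical demo.sh: fork four busy loops in one jobscript.
# Submitted with `qsub -binding linear:1 demo.sh`, all four end
# up pinned to the same core and show ~25% CPU each in top.
pids=""
for i in 1 2 3 4; do
  sh -c 'while :; do :; done' &   # stand-in for the "ever" endless loop
  pids="$pids $!"
done
echo "forked:$pids"
sleep 1                           # let the workers spin briefly
kill $pids                        # clean up (a real job would not do this)
wait 2>/dev/null
```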
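To double-check the binding from inside a running job (rather than eyeballing top), the kernel's own view can be read directly; Cpus_allowed_list in /proc/<pid>/status is standard on Linux:

```shell
# Print the core list the kernel actually enforces for the current
# shell and anything it forks. Under `qsub -binding linear:1 ...`
# this should show a single core.
grep Cpus_allowed_list /proc/$$/status
# taskset -pc $$   # shows the same affinity via sched_getaffinity
```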