On 14.03.2012 at 18:30, Joshua Baker-LePain wrote:
> On Wed, 14 Mar 2012 at 9:33am, Reuti wrote
>>> I can run as many threads as I like on a single system with no problems, even if those threads are running at different nice levels.
>> How do they get different nice levels - do you renice them? I would assume they all start at the same nice level as the parent. There are no threads in the test program you posted.
> Ah, thanks for pointing this out. Yes, when a job runs on a single host (even if SGE has assigned it to multiple queues), there's no qrsh involved. There's just a simple mpirun and all the threads run at the same priority. I did try renicing half the threads, and the job didn't fail.
>>> The problem seems to arise when I'm both a) running across multiple machines and b) running threads at differing nice levels (which often happens as a result of our queueing setup).
>> This sounds like you are getting slots from different queues assigned to one and the same job. My experience: don't do it unless you need it.
> You are correct -- the problem is specific to a parallel job getting slots from different queues.
I'm not sure whether this is good for the runtime of the jobs. The un-niced processes may sometimes have to wait for the slower processes before they can exchange their results.
But I don't think the nice value itself is the cause, as the timing behavior should be similar to running on a mix of faster and slower machines.
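To illustrate the waiting effect, here is a minimal sketch of a lock-step model in which every rank must finish its share of work before the results are exchanged. The relative speeds (1.0 for un-niced, 0.5 for niced ranks) and the function names are assumptions made up for the illustration, not measurements from any cluster.

```python
def step_time(rank_speeds, work=1.0):
    """Time for one synchronized step: all ranks wait for the slowest one."""
    return max(work / speed for speed in rank_speeds)


def total_time(rank_speeds, steps, work=1.0):
    """Total runtime when every step ends with an exchange between all ranks."""
    return steps * step_time(rank_speeds, work)


# Hypothetical relative speeds: 1.0 = un-niced rank, 0.5 = niced rank on a busy host.
uniform = total_time([1.0, 1.0, 1.0, 1.0], steps=100)
mixed = total_time([1.0, 1.0, 0.5, 0.5], steps=100)

print(uniform, mixed)
```

In this model the whole job runs at the pace of the slowest rank, regardless of whether that rank is slow because of its nice value or because of slower hardware.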
> Our cluster is used by a combination of folks who've financially supported it and those who haven't. Our high priority queue, lab.q, runs un-niced and is available only to those who have donated money and/or machines to us. Our low priority queue, long.q, runs at nice 19 and is available to all. The goal is to ensure instant access by a lab to its "share" of the cluster while letting both those users and non-supporting users use as many cores as they can in long.q. We explicitly allow overloading to further support our goal of keeping the usage both full and fair.
Although it's not an explanation (as I have none for this behavior), you could even increase the slots in the priority queue by the number of slots you allow in the low priority queue and force parallel jobs to stay inside long.q. You can still limit the overall number of slots at the exechost level to match the targeted oversubscription.
Does the low priority queue have the same limits on memory or anything else as the high priority queue?
> The setup is a bit convoluted, but it has kept the users (and, more importantly, the PIs) happy. Until the recent upgrade to CentOS 6 and concomitant switch from MPICH2 to Open MPI, we've had no issues with parallel jobs and this queue setup. And the test jobs I've tried with our old MPICH2 install (and the MPICH tight integration) running under CentOS 6 don't fail either.
>> Do you face the same if you stay in one and the same queue across the machines?
> Jobs don't crash if they either:
> a) all run in the same queue, or
> b) run in multiple queues all on one machine
> Joshua Baker-LePain
> QB3 Shared Cluster Sysadmin
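The pattern reported above (failures only when a job spans both several queues and several machines) can be written down as a small check over the slots granted to a job. This is only a sketch of the observation, not an explanation of it; the function name and the host names are hypothetical, and the (host, queue) pairs are assumed to have been extracted beforehand, for example from the job's PE hostfile.

```python
def matches_failing_pattern(slots):
    """Return True if the granted slots span more than one host AND more than one queue.

    slots: iterable of (host, queue) pairs granted to a single parallel job.
    """
    hosts = {host for host, _ in slots}
    queues = {queue for _, queue in slots}
    return len(hosts) > 1 and len(queues) > 1


# a) one queue across several machines: reported to run fine
print(matches_failing_pattern([("node01", "long.q"), ("node02", "long.q")]))

# b) several queues on one machine: reported to run fine
print(matches_failing_pattern([("node01", "lab.q"), ("node01", "long.q")]))

# several queues across several machines: the case reported to crash
print(matches_failing_pattern([("node01", "lab.q"), ("node02", "long.q")]))
```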