Thanks for the response. In the last 5 minutes I have managed to get
some output from the prologue.parallel scripts. It turns out that the
Torque Administrator's Manual was wrong, and I was fool enough to
believe it! Now that I've got a working model I can start to sort out
the mess with psm_shm files.
For your information: we were supplied with a cleanup script when we
bought the cluster, but the original script assumed that all processes
and shm files belonging to a specific user ought to be deleted. This is
a problem if a user submits jobs which only half fill a node, so that
the second job starts on the same node as the first one. The first job
to finish then causes the continuing job to stop dead. We therefore had
to disable all cleanup to allow jobs to run. Now we are finding that the
shm files slowly pile up and I need to do something; at least now I
have a way forward.
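
One possible way forward, as a rough sketch only: rather than deleting
everything a user owns, list only those shm files that no live process
still has mapped. The directory (/dev/shm), the "psm_shm*" file name
pattern and both function names below are assumptions for illustration,
not anything taken from the supplied script or from Torque. It also
assumes Linux, enough privilege to read other users' /proc/<pid>/maps,
and that the directory is passed as the same path the kernel reports
(no symlinks).

```python
import glob
import os


def mapped_paths(proc_root="/proc"):
    """Return the set of file paths currently mapped by any live process."""
    paths = set()
    for maps_file in glob.glob(os.path.join(proc_root, "[0-9]*", "maps")):
        try:
            with open(maps_file) as fh:
                for line in fh:
                    # Format: address perms offset dev inode [pathname]
                    fields = line.split(None, 5)
                    if len(fields) == 6:
                        paths.add(fields[5].strip())
        except OSError:
            continue  # process exited, or its maps file is not readable
    return paths


def stale_shm_files(shm_dir, uid, in_use, pattern="psm_shm*"):
    """List files in shm_dir matching pattern, owned by uid, not in in_use."""
    stale = []
    for path in sorted(glob.glob(os.path.join(shm_dir, pattern))):
        try:
            owner = os.stat(path).st_uid
        except OSError:
            continue  # file vanished between the glob and the stat
        if owner == uid and path not in in_use:
            stale.append(path)
    return stale


if __name__ == "__main__":
    # Dry run: print candidates for the current user, delete nothing.
    for candidate in stale_shm_files("/dev/shm", os.getuid(), mapped_paths()):
        print(candidate)
```

The sketch only prints candidates. A file belonging to the job that is
still running on the node should show up in mapped_paths() and so be
left alone, but that would need checking on a real node before any
deletion is wired into an epilogue script.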
HPC System Manager, Weapons Technologies
Tel: 01959 514777, Mobile: 07939 219057