QLogic IBA 7220
Which is interesting in itself: the IB hasn't worked properly since the
cluster was delivered.
Martin Rushton
HPC System Manager, Weapons Technologies
Tel: 01959 514777, Mobile: 07939 219057
QinetiQ - Delivering customer-focused solutions
Please consider the environment before printing this email.
From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
Behalf Of Jeff Squyres
Sent: 14 April 2011 16:41
To: Open MPI Users
Subject: Re: [OMPI users] shm unlinking
They could be from OMPI -- are you using QLogic IB NICs? That's the
only thing named "PSM" in Open MPI.
On Apr 14, 2011, at 9:46 AM, Rushton Martin wrote:
> A typical file is called
> I had assumed that these were from OMPI, but clearly I could be wrong.
> They vary in size but are typically 42 MiB (only 0.2% of our small
> diskless nodes' memory); put a dozen in there, though, and they start
> to be noticed. lsof shows that all the processes in a particular job
> have the same one open; the other files are associated chronologically
> with failed jobs.
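A check like the lsof one described above can be scripted. A minimal sketch, assuming a Linux node where the /proc/<pid>/fd symlinks can be read (root is needed to inspect other users' processes); `pids_holding` is a hypothetical helper name:

```python
import os

def pids_holding(path):
    """Return the PIDs of processes that have `path` open, by reading
    the /proc/<pid>/fd symlinks (Linux-only; unreadable fd directories,
    e.g. those of other users when not running as root, are skipped)."""
    target = os.path.realpath(path)
    holders = []
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or not ours to inspect
        for fd in fds:
            try:
                if os.path.realpath(os.path.join(fd_dir, fd)) == target:
                    holders.append(int(pid))
                    break
            except OSError:
                continue
    return holders
```

An empty result for a file under /dev/shm would mean no surviving process holds it, matching the "associated with failed jobs" pattern above.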
> Martin Rushton
> HPC System Manager, Weapons Technologies
> Tel: 01959 514777, Mobile: 07939 219057
> email: jmrushton_at_[hidden]
> QinetiQ - Delivering customer-focused solutions
> -----Original Message-----
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]]
> On Behalf Of Jeff Squyres
> Sent: 14 April 2011 14:33
> To: Open MPI Users
> Subject: Re: [OMPI users] shm unlinking
> On Apr 14, 2011, at 9:22 AM, Rushton Martin wrote:
>> For your information: we were supplied with a script when we bought
>> the cluster, but the original script made the assumption that all
>> processes and shm files belonging to a specific user ought to be
>> deleted. This is a problem if users submit jobs which only half fill
>> a node and the second job starts on the same node as the first one.
>> The first job to finish causes the continuing job to stop dead. We
>> therefore had to disable any cleanup to allow jobs to run. Now we are
>> finding a slow fill-up of shm files and I need to do something; at
>> least now I have a way forward.
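A cleanup along the lines described above can avoid killing a job that shares the node by removing only files that no running process still holds open. A minimal sketch (the /proc scan is Linux-specific; `stale_shm_files` is a hypothetical name, and it only reports candidates, leaving the final filter by filename and owner to the caller before anything is deleted):

```python
import os

def stale_shm_files(shm_dir="/dev/shm"):
    """Yield files under shm_dir that no running process holds open,
    judged by scanning every /proc/<pid>/fd (Linux-only; run as root
    to see other users' file descriptors).  Reports candidates only:
    filter by filename/owner before actually deleting anything."""
    open_paths = set()
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or fds not readable by us
        for fd in fds:
            try:
                open_paths.add(os.path.realpath(os.path.join(fd_dir, fd)))
            except OSError:
                pass
    for name in os.listdir(shm_dir):
        path = os.path.join(shm_dir, name)
        if os.path.isfile(path) and os.path.realpath(path) not in open_paths:
            yield path
```

Run from an epilogue script this keeps a second job's segments intact, since those are still open and so never reported.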
> Note that Open MPI v1.4.x is likely using mmap files by default --
> these should be under /tmp/ somewhere. If they get left around, they
> can cause shared memory to be filled up, but they should also be
> unrelated to /dev/shm kinds of things. If you're seeing /dev/shm fill
> up, that might be due to something else.
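For the /tmp side, the mmap backing files live in per-user session directories that can be listed on a node. A sketch; the `openmpi-sessions-*` naming is an assumption based on 1.4-era defaults, so check what your nodes actually show:

```python
import glob
import os

def leftover_session_dirs(tmp="/tmp"):
    """List Open MPI session directories left under `tmp`.  The
    "openmpi-sessions-*" glob is an assumption (1.4-era builds use
    names like openmpi-sessions-<user>@<host>_<n>); verify the
    pattern on your own nodes before relying on it."""
    return sorted(d for d in glob.glob(os.path.join(tmp, "openmpi-sessions-*"))
                  if os.path.isdir(d))
```

Open MPI also ships an orte-clean utility that can remove leftover session directories, which may be simpler than hand-rolling this if your build includes it.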
> Also, I'm a little confused by your reference to psm_shm... are you
> talking about the QLogic PSM device? If that does some tomfoolery
> with /dev/shm somewhere, I'm unaware of it (i.e., I don't know
> much/anything about what that device does internally).
> Jeff Squyres
> For corporate legal information go to:
> users mailing list
> This email and any attachments to it may be confidential and are
> intended solely for the use of the individual to whom it is addressed.
> If you are not the intended recipient of this email, you must neither
> take any action based upon its contents, nor copy or show it to
> anyone. Please contact the sender if you believe you have received
> this email in error. QinetiQ may monitor email traffic data and also
> the content of email for the purposes of security. QinetiQ Limited
> (Registered in England & Wales: Company Number: 3796233) Registered
> office: Cody Technology Park, Ively Road, Farnborough, Hampshire, GU14
> 0LX http://www.qinetiq.com.