Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] shm unlinking
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-04-14 11:40:55

They could be from OMPI -- are you using QLogic IB NICs? That's the only thing named "PSM" in Open MPI.

On Apr 14, 2011, at 9:46 AM, Rushton Martin wrote:

> A typical file is called
> /dev/shm/psm_shm.41e04667-f3ba-e503-8464-db6c209b3430
> I had assumed that these were from OMPI, but clearly I could be wrong.
> They vary in size, but are typically 42MiB. That is only 0.2% of our
> small diskless nodes' memory, but put a dozen in there and they start
> to be noticed. lsof shows that all the processes in a particular job
> have the same one open; the other files correspond chronologically to
> failed jobs.
> Martin Rushton
> HPC System Manager, Weapons Technologies
> Tel: 01959 514777, Mobile: 07939 219057
> email: jmrushton_at_[hidden]
> QinetiQ - Delivering customer-focused solutions
> -----Original Message-----
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
> Behalf Of Jeff Squyres
> Sent: 14 April 2011 14:33
> To: Open MPI Users
> Subject: Re: [OMPI users] shm unlinking
> On Apr 14, 2011, at 9:22 AM, Rushton Martin wrote:
>> For your information: we were supplied with a script when we bought
>> the cluster, but the original script made the assumption that all
>> processes and shm files belonging to a specific user ought to be
>> deleted. This is a problem if users submit jobs which only half fill
>> a node and the second job starts on the same node as the first one.
>> The first job to finish causes the continuing job to stop dead. We
>> therefore had to disable any cleanup to allow jobs to run. Now we are
>> finding a slow fill up with the shm files and I need to do something;
>> at least now I have a way forward.
> Note that Open MPI v1.4.x is likely using mmap files by default -- these
> should be under /tmp/ somewhere. If they get left around, they can
> cause shared memory to fill up, but they should be unrelated to
> anything in /dev/shm. If you're seeing /dev/shm fill up, that
> might be due to something else.
> Also, I'm a little confused by your reference to psm_shm... are you
> talking about the QLogic PSM device? If that does some tomfoolery with
> /dev/shm somewhere, I'm unaware of it (i.e., I don't know much/anything
> about what that device does internally).
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> _______________________________________________
> users mailing list
> users_at_[hidden]
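
The observation quoted above (lsof shows every process of a live job holding the same psm_shm file open, while the remaining files date from failed jobs) suggests a cleanup rule that is safer than "delete everything owned by the user": treat a file as stale only if no process has it open or mapped. What follows is a minimal sketch of that idea in Python, not anything shipped with Open MPI or PSM. The function names, the "psm_shm.*" pattern and the choice to read /proc directly instead of calling lsof are all assumptions made for illustration. It only reports candidates and deletes nothing.

```python
import fnmatch
import os


def paths_in_use(proc_root="/proc"):
    """Collect paths that any visible process holds open or has mapped."""
    in_use = set()
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        pid_dir = os.path.join(proc_root, pid)

        # Open file descriptors are symlinks under /proc/<pid>/fd.
        fd_dir = os.path.join(pid_dir, "fd")
        try:
            for fd in os.listdir(fd_dir):
                try:
                    in_use.add(os.readlink(os.path.join(fd_dir, fd)))
                except OSError:
                    pass  # descriptor went away while we were looking
        except OSError:
            pass  # process exited, or we lack permission to inspect it

        # Mapped files show up as the sixth field of /proc/<pid>/maps.
        try:
            with open(os.path.join(pid_dir, "maps")) as maps:
                for line in maps:
                    fields = line.split(None, 5)
                    if len(fields) == 6:
                        in_use.add(fields[5].strip())
        except OSError:
            pass
    return in_use


def stale_shm_files(shm_dir="/dev/shm", pattern="psm_shm.*", in_use=None):
    """Return matching files in shm_dir that no visible process is using."""
    if in_use is None:
        in_use = paths_in_use()
    stale = []
    for name in sorted(os.listdir(shm_dir)):
        path = os.path.join(shm_dir, name)
        if fnmatch.fnmatch(name, pattern) and path not in in_use:
            stale.append(path)
    return stale
```

Calling stale_shm_files() with its defaults lists the candidates on the local node. Two caveats apply: it has to run as root, because processes it cannot inspect would make their files look unused, and there is a race if a job creates its file between the scan and any later removal, so running it from a scheduler epilogue or restricting it to files older than some age is probably wise.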

Jeff Squyres