Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] shm unlinking
From: Rushton Martin (JMRUSHTON_at_[hidden])
Date: 2011-04-14 09:46:43

A typical file is called

I had assumed that these were from OMPI, but clearly I could be wrong.
They vary in size but are typically 42 MiB, only 0.2% of our small
diskless nodes' memory; put a dozen in there, though, and they start to
be noticed. lsof shows that all the processes in a particular job hold
the same one open; the other files are associated chronologically with
failed jobs.
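Stale segments like these can be hunted down without deleting everything a user owns. Below is a minimal shell sketch; the default directory, the one-day age threshold, and the function name are all assumptions, and it only prints candidates rather than removing them:

```shell
# List files under a directory (default /dev/shm) that no process still
# holds open and that have not been modified for over a day.
# ASSUMPTIONS: fuser(1) from psmisc is available, and a 1440-minute
# threshold suits your job mix. Swap "echo" for "rm -f" once trusted.
list_stale_shm() {
    dir=${1:-/dev/shm}
    for f in "$dir"/*; do
        if [ ! -e "$f" ]; then continue; fi      # glob matched nothing
        # fuser -s exits 0 when some process still has the file open
        if fuser -s "$f" 2>/dev/null; then continue; fi
        if [ -n "$(find "$f" -mmin +1440 2>/dev/null)" ]; then
            echo "$f"
        fi
    done
    return 0
}
```

Because the check is per file and based on open handles, a second job sharing the node keeps its own segments untouched, which avoids the kill-everything-by-user problem described below.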


Martin Rushton
HPC System Manager, Weapons Technologies
Tel: 01959 514777, Mobile: 07939 219057
email: jmrushton_at_[hidden]
QinetiQ - Delivering customer-focused solutions

Please consider the environment before printing this email.
-----Original Message-----
From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
Behalf Of Jeff Squyres
Sent: 14 April 2011 14:33
To: Open MPI Users
Subject: Re: [OMPI users] shm unlinking

On Apr 14, 2011, at 9:22 AM, Rushton Martin wrote:

> For your information: we were supplied with a script when we bought
> the cluster, but the original script made the assumption that all
> processes and shm files belonging to a specific user ought to be
> deleted. This is a problem if users submit jobs which only half fill
> a node and the second job starts on the same node as the first one.
> The first job to finish causes the continuing job to stop dead. We
> therefore had to disable any cleanup to allow jobs to run. Now we are
> finding a slow fill up with the shm files and I need to do something;
> at least now I have a way forward.

Note that Open MPI v1.4.x is likely using mmap files by default; these
should be under /tmp somewhere. If they get left around they can fill up
that filesystem, but they should be unrelated to /dev/shm kinds of
things. If you're seeing /dev/shm fill up, that might be due to
something else.
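One way to check for those leftovers is to look for per-job session directories under /tmp. A hedged sketch; the "openmpi-sessions-" prefix matches the default session-directory naming of that era, but verify it against a live job on your own install before cleaning anything:

```shell
# List leftover Open MPI session directories under a tmp directory.
# ASSUMPTION: the default "openmpi-sessions-" directory prefix used by
# 1.4-era releases; confirm the pattern against a running job first.
find_ompi_sessions() {
    tmpdir=${1:-/tmp}
    find "$tmpdir" -maxdepth 1 -name 'openmpi-sessions-*' 2>/dev/null
}
```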

Also, I'm a little confused by your reference to psm_shm... are you
talking about the QLogic PSM device? If that does some tomfoolery with
/dev/shm somewhere, I'm unaware of it (i.e., I don't know much/anything
about what that device does internally).

Jeff Squyres