Open MPI User's Mailing List Archives

This web mail archive is frozen.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] shm unlinking
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-04-14 11:40:55


They could be from OMPI -- are you using QLogic IB NICs? That's the only thing named "PSM" in Open MPI.

On Apr 14, 2011, at 9:46 AM, Rushton Martin wrote:

> A typical file is called
> /dev/shm/psm_shm.41e04667-f3ba-e503-8464-db6c209b3430
>
> I had assumed that these were from OMPI, but clearly I could be wrong.
> They vary in size but are typically 42 MiB. That is only 0.2% of our
> small diskless nodes' memory, but put a dozen in there and they start
> to be noticed. lsof shows that all the processes in a particular job
> have the same one open; the other files are associated chronologically
> with failed jobs.
>
> HTH
>
> Martin Rushton
> HPC System Manager, Weapons Technologies
> Tel: 01959 514777, Mobile: 07939 219057
> email: jmrushton_at_[hidden]
> www.QinetiQ.com
> QinetiQ - Delivering customer-focused solutions
>
> -----Original Message-----
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
> Behalf Of Jeff Squyres
> Sent: 14 April 2011 14:33
> To: Open MPI Users
> Subject: Re: [OMPI users] shm unlinking
>
> On Apr 14, 2011, at 9:22 AM, Rushton Martin wrote:
>
>> For your information: we were supplied with a script when we bought
>> the cluster, but the original script made the assumption that all
>> processes and shm files belonging to a specific user ought to be
>> deleted. This is a problem if users submit jobs which only half fill
>> a node and the second job starts on the same node as the first one.
>> The first job to finish causes the continuing job to stop dead. We
>> therefore had to disable any cleanup to allow jobs to run. Now we are
>> finding a slow fill up with the shm files and I need to do something;
>> at least now I have a way forward.
>
> Note that Open MPI v1.4.x is likely using mmap files by default -- these
> should be under /tmp/ somewhere. If they get left around, they can
> cause shared memory to fill up, but they should be unrelated to
> anything in /dev/shm. If you're seeing /dev/shm fill up, that
> might be due to something else.
>
> Also, I'm a little confused by your reference to psm_shm... are you
> talking about the QLogic PSM device? If that does some tomfoolery with
> /dev/shm somewhere, I'm unaware of it (i.e., I don't know much/anything
> about what that device does internally).
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
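
The lsof observation quoted above (every process in a running job holds the same /dev/shm file open, while leftovers from failed jobs are held by nobody) suggests a cleanup test that does not depend on which user owns the file. What follows is only a minimal sketch of that idea in Python, assuming a Linux node with /proc mounted. The function names and the "psm_shm.*" pattern (taken from the example filename in the thread) are illustrative assumptions and are not part of Open MPI or PSM. It checks both open file descriptors and memory mappings, since a process can keep a file mapped after closing its descriptor.

```python
import os
from pathlib import Path


def paths_in_use(proc_root="/proc"):
    """Collect paths that some live process has open or memory-mapped.

    Reads /proc/<pid>/fd symlinks and /proc/<pid>/maps. Processes that
    cannot be inspected (permissions, or exit during the scan) are
    skipped, so run as root to see every job on the node.
    """
    in_use = set()
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a process directory
        try:
            for fd in (entry / "fd").iterdir():
                try:
                    in_use.add(os.readlink(fd))
                except OSError:
                    pass  # descriptor closed while we were looking
        except OSError:
            pass
        try:
            with open(entry / "maps") as maps:
                for line in maps:
                    # Columns: address perms offset dev inode [pathname]
                    fields = line.split(None, 5)
                    if len(fields) == 6:
                        in_use.add(fields[5].rstrip("\n"))
        except OSError:
            pass
    return in_use


def stale_shm_files(shm_dir="/dev/shm", pattern="psm_shm.*", proc_root="/proc"):
    """Return matching files in shm_dir that no live process is using."""
    in_use = paths_in_use(proc_root)
    return sorted(
        str(path)
        for path in Path(shm_dir).glob(pattern)
        if str(path) not in in_use
    )
```

Calling stale_shm_files() with no arguments only reports candidates; nothing is deleted. There is a race between the scan and any later unlink (a new job could create or open a file in between), so it would be prudent to review the list, and perhaps also skip very recent files, before passing any path to os.unlink.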