Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] shm unlinking
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-04-14 09:32:34


On Apr 14, 2011, at 9:22 AM, Rushton Martin wrote:

> For your information: we were supplied with a script when we bought the
> cluster, but the original script made the assumption that all processes
> and shm files belonging to a specific user ought to be deleted. This is
> a problem if users submit jobs which only half fill a node and the
> second job starts on the same node as the first one. The first job to
> finish causes the continuing job to stop dead. We therefore had to
> disable any cleanup to allow jobs to run. Now we are seeing a slow
> fill-up of shm files and I need to do something; at least now I
> have a way forward.

Note that Open MPI v1.4.x is likely using mmap files by default -- these should be under /tmp somewhere. If they get left around, they can fill up /tmp, but they should be unrelated to anything in /dev/shm. If you're seeing /dev/shm fill up, that might be due to something else.
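
To make the distinction concrete, here's a quick sketch (my own, untested; Linux-only, Python 3.8+, and not anything Open MPI itself ships) contrasting a file-backed mmap in /tmp with a POSIX shared memory segment, which is what actually shows up as an entry in /dev/shm:

    import mmap, os, tempfile
    from multiprocessing import shared_memory

    # File-backed mmap: the backing file lives wherever you create it
    # (here /tmp).  If a job dies without cleaning up, it's this file
    # that lingers and eats /tmp, not /dev/shm.
    fd, path = tempfile.mkstemp(prefix='mmap-demo-', dir='/tmp')
    os.ftruncate(fd, 4096)
    buf = mmap.mmap(fd, 4096)
    print('mmap backing file:', path)

    # POSIX shared memory (shm_open under the hood): on Linux this is
    # what actually appears as an entry in /dev/shm.
    seg = shared_memory.SharedMemory(create=True, size=4096)
    print('shm segment:', '/dev/shm/' + seg.name)

    # Explicit cleanup -- skipping these lines is exactly how stale
    # files / segments get left behind.
    buf.close(); os.close(fd); os.unlink(path)
    seg.close(); seg.unlink()

So the two kinds of leftovers live in different places, and a cleanup policy has to look at each separately.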

Also, I'm a little confused by your reference to psm_shm... are you talking about the QLogic PSM device? If that does some tomfoolery with /dev/shm somewhere, I'm unaware of it (i.e., I don't know much/anything about what that device does internally).
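
On the cleanup question from your quoted mail: rather than deleting everything owned by a user, a script can restrict itself to files that no live process still has mapped. A rough sketch of the idea (again my own, untested; assumes Linux /proc, and the /dev/shm/* pattern is just a placeholder for whatever your stale files actually look like):

    import glob, os

    def in_use(path):
        # Any process that still maps this file (via mmap or shm_open)
        # will list its path in /proc/<pid>/maps.
        for maps in glob.glob('/proc/[0-9]*/maps'):
            try:
                with open(maps) as f:
                    if path in f.read():
                        return True
            except OSError:
                continue  # process exited while we were scanning
        return False

    for path in glob.glob('/dev/shm/*'):   # placeholder pattern
        if not in_use(path):
            try:
                os.unlink(path)
                print('removed stale file:', path)
            except OSError:
                pass

Since a running job's processes still map their own segments, this should leave your half-filled-node scenario alone.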

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/