You are right, Ralph. There is no surprise behavior. I had forgotten that I had been testing --mca orte_tmpdir_base /dev/shm to see if it worked (and obviously it doesn't). Before that, without any MCA options, OpenMPI had tried /tmp, and gave me the warning about /tmp being NFS mounted, and so I had been exploring options.
I accept your point - I need "a good local directory - anything you have permission to write in will work fine". How would one do this on a stateless node? And can I beat the vendor over the head for not knowing how to set up the node image so that OpenMPI could function properly?
From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Ralph Castain
Sent: Thursday, November 03, 2011 11:33 AM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Shared-memory problems
I'm afraid this isn't correct. You definitely don't want the session directory in /dev/shm as this will almost always cause problems.
We look through a progression of settings to find where to put the session directory, in this order:
1. the MCA param orte_tmpdir_base
2. the envar OMPI_PREFIX_ENV
3. the envar TMPDIR
4. the envar TEMP
5. the envar TMP
Check all those to see if one is set to /dev/shm. If so, you have a problem to resolve. For performance reasons, you probably don't want the session directory sitting on a network mounted location. What you need is a good local directory - anything you have permission to write in will work fine. Just set one of the above to point to it.
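The check described above can be sketched in a few lines of Python. This is only an illustration, not Open MPI's own logic: it assumes the MCA param is supplied in its environment form (OMPI_MCA_orte_tmpdir_base), so it will not see a value passed with --mca on the command line or set in a param file, and it assumes /tmp is the fallback when nothing is set, as reported earlier in this thread. The function names are hypothetical.

```python
import os

# Lookup order from the list above; the first variable that is set wins.
# OMPI_MCA_orte_tmpdir_base is the environment form of the MCA param
# (assumption: the param was not given via --mca or a param file).
CANDIDATES = [
    "OMPI_MCA_orte_tmpdir_base",
    "OMPI_PREFIX_ENV",
    "TMPDIR",
    "TEMP",
    "TMP",
]


def session_dir_base(env=None):
    """Return (variable_name, path) for the first candidate that is set.

    Falls back to (None, "/tmp") when none of them is set.
    """
    env = os.environ if env is None else env
    for name in CANDIDATES:
        value = env.get(name)
        if value:
            return name, value
    return None, "/tmp"


def points_into_dev_shm(path):
    """True if path is /dev/shm itself or something beneath it (POSIX paths)."""
    norm = os.path.normpath(path)
    return norm == "/dev/shm" or norm.startswith("/dev/shm/")


if __name__ == "__main__":
    name, path = session_dir_base()
    source = name if name else "default"
    print(f"session directory base: {path} (from {source})")
    if points_into_dev_shm(path):
        print("warning: this points into /dev/shm")
```

Running it in the same environment that launches mpirun shows which variable would take effect. It does not detect whether the chosen directory is network mounted; that still has to be checked separately (for example with df or mount).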
On Nov 3, 2011, at 10:04 AM, Durga Choudhury wrote:
> Since /tmp is mounted across a network and /dev/shm is (always) local,
> /dev/shm seems to be the right place for shared memory transactions.
> If you create temporary files using mktemp, are they created in
> /dev/shm or /tmp?
> On Thu, Nov 3, 2011 at 11:50 AM, Bogdan Costescu <bcostescu_at_[hidden]> wrote:
>> On Thu, Nov 3, 2011 at 15:54, Blosch, Edwin L <edwin.l.blosch_at_[hidden]> wrote:
>>> - /dev/shm is 12 GB and has 755 permissions
>>> % ls -l output:
>>> drwxr-xr-x 2 root root 40 Oct 28 09:14 shm
>> This is your problem: it should be something like drwxrwxrwt. It might
>> depend on the distribution, e.g. the following show this to be a bug:
>> and surely you can find some more on the subject with your favorite
>> search engine. Another source could be a paranoid sysadmin who has
>> changed the default (most likely correct) setting the distribution
>> came with - not only OpenMPI but any application using shmem would be
>> users mailing list