Cross-thread response here, as this is related to the shared-memory thread:
Yes, it sucks, and that's what led me to post my original question: If /dev/shm isn't the right place to put the session file, and /tmp is NFS-mounted, then what IS the "right" way to set up a diskless cluster? The idea of tmpfs doesn't sound very appealing to me after reading the discussion in FAQ #8 about shared-memory usage. We definitely have a job-queueing system, jobs are very often killed using qdel, and writing a post-script handler is way beyond the level of involvement or expertise we can expect from our sys admins.
Surely there's some reasonable guidance that can be offered to work around an issue that is so disabling.
A related question would be: How is it that HP-MPI works just fine on this cluster as it is configured now? Are they doing something different for shared memory communications?
From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Jeff Squyres
Sent: Thursday, November 03, 2011 11:35 AM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] How to set up state-less node /tmp for OpenMPI usage
On Nov 1, 2011, at 7:31 PM, Blosch, Edwin L wrote:
> I'm getting this message below which is observing correctly that /tmp is NFS-mounted. But there is no other directory which has user or group write permissions. So I think I'm kind of stuck, and it sounds like a serious issue.
That does kinda suck. :-\
> Before I ask the administrators to change their image, i.e. mount this partition under /work instead of /tmp, I'd like to ask if anyone is using OpenMPI on a state-less cluster, and are there any gotchas with regards to performance of OpenMPI, i.e. like handling of /tmp, that one would need to know?
I don't have much empirical information here -- I know that some people have done this (make /tmp be NFS-mounted). I think there are at least some issues with this, though -- many applications believe that a sufficient condition for uniqueness in /tmp is to simply append your PID to a filename. But this may no longer be true if /tmp is shared across multiple OS instances.
I don't have a specific case where this is problematic, but it's not a large stretch to imagine that this could happen in practice with random applications that make temp files in /tmp.
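To make the PID-uniqueness concern concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not anything Open MPI itself does: the function names, the "sess" prefix, and the node names are all hypothetical. It shows that a PID-only name is identical for two hosts that happen to have the same PID, and two ways an application could avoid the clash on a shared /tmp.

```python
import os
import socket
import tempfile


def pid_only_name(prefix, pid):
    # The scheme described above: unique on one host, but two hosts
    # sharing /tmp can both have a process with this PID.
    return f"{prefix}.{pid}"


def host_qualified_name(prefix, pid, hostname=None):
    # Hypothetical workaround: include the hostname so that names
    # differ across OS instances that share the same directory.
    host = hostname or socket.gethostname()
    return f"{prefix}.{host}.{pid}"


def make_private_tmpdir(base=None):
    # mkdtemp creates a fresh directory with a random suffix and
    # retries with another name if the one it picked already exists.
    return tempfile.mkdtemp(prefix="sess.", dir=base)


if __name__ == "__main__":
    pid = os.getpid()
    print(pid_only_name("sess", pid))
    print(host_qualified_name("sess", pid))
    d = make_private_tmpdir()
    print(d)
    os.rmdir(d)  # clean up the example directory
```

Neither workaround helps with applications you can't change, which is the real worry with "random applications" writing into a shared /tmp.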