Are the jobs that leave files behind terminating normally or aborting?
Are there any warning or error messages from mpirun?
Just trying to determine if this is an abnormal termination issue or a
bug in OMPI itself.
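One way to answer the "normal exit vs. abort" question is to list what is actually sitting in /dev/shm, with owner and age, and match those against the job end times and exit status recorded by the batch system. Below is a minimal sketch of that idea, assuming Python 3 on the compute node. `find_leftovers` and `Leftover` are hypothetical helpers written for illustration, not part of Open MPI or Rocks. The sketch makes no assumptions about which files Open MPI itself creates.

```python
import os
import time
from typing import List, NamedTuple


class Leftover(NamedTuple):
    """Hypothetical record describing one file found in a shared-memory directory."""
    path: str
    uid: int
    age_seconds: float
    size_bytes: int


def find_leftovers(directory: str = "/dev/shm", min_age_seconds: float = 0.0) -> List[Leftover]:
    """List regular files in `directory` at least `min_age_seconds` old, oldest first."""
    now = time.time()
    found = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # Only plain files are reported; subdirectories and symlinks are skipped.
            if not entry.is_file(follow_symlinks=False):
                continue
            try:
                st = entry.stat(follow_symlinks=False)
            except FileNotFoundError:
                # A running job may delete its file between scandir() and stat().
                continue
            age = now - st.st_mtime
            if age >= min_age_seconds:
                found.append(Leftover(entry.path, st.st_uid, age, st.st_size))
    # Oldest first, so files from long-finished jobs appear at the top.
    found.sort(key=lambda item: item.age_seconds, reverse=True)
    return found


if __name__ == "__main__":
    if os.path.isdir("/dev/shm"):
        for item in find_leftovers("/dev/shm", min_age_seconds=3600):
            print(f"{item.path}\tuid={item.uid}\tage={item.age_seconds:.0f}s\tsize={item.size_bytes}")
```

If the oldest files line up with jobs that the scheduler reports as killed or timed out, that points toward abnormal termination rather than a cleanup bug.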
On Nov 19, 2008, at 8:05 AM, Ray Muno wrote:
> Thought I would revisit this one.
> We are still having issues with this. It is not clear to me what is
> leaving the user files behind in /dev/shm.
> This is not something users are doing directly; they are just
> compiling their code with mpif90 (from OpenMPI), using
> various compilers. Compilers in use are PGI, Intel, SunStudio and
> It looks like every job run leaves something behind in /dev/shm and
> it slowly fills up. We are just clearing these out at this point.
> Jeff Squyres wrote:
>> That is odd. Is your user's app crashing or being forcibly
>> killed? The ORTE daemon that is silently launched in v1.2 jobs
>> should ensure that files under
>> /tmp/openmpi-sessions-<userid>@<hostname> are removed.
>> On Nov 10, 2008, at 2:14 PM, Ray Muno wrote:
>>> Brock Palen wrote:
>>>> On most systems /dev/shm is limited to half the physical RAM.
>>>> Was a user filling up /dev/shm so there was no space left?
>>> The problem is there is a large collection of stale files left in
>>> there by the users that have run on that node (Rocks based cluster).
>>> I am trying to determine why they are left behind.
> Ray Muno
> University of Minnesota
> Aerospace Engineering and Mechanics
> users mailing list
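Brock's point about /dev/shm being capped at roughly half of physical RAM can be checked directly on a node. On Linux, a tmpfs mount defaults to half of RAM unless a `size=` mount option overrides it. The sketch below, assuming Python 3 on Linux, compares the size of the filesystem at a given path with physical memory and reports how full it is. `shm_capacity_report` is a hypothetical helper written for this illustration. It uses only `shutil.disk_usage` and `os.sysconf`.

```python
import os
import shutil


def shm_capacity_report(path: str = "/dev/shm") -> dict:
    """Compare the filesystem at `path` with physical RAM (hypothetical helper)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    # Physical RAM in bytes = page size * number of physical pages.
    phys_ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return {
        "total_bytes": usage.total,
        "used_bytes": usage.used,
        "free_bytes": usage.free,
        # Roughly 0.5 is expected for a default-sized tmpfs mount.
        "fraction_of_ram": usage.total / phys_ram,
        "percent_used": 100.0 * usage.used / usage.total if usage.total else 0.0,
    }


if __name__ == "__main__":
    if os.path.isdir("/dev/shm"):
        report = shm_capacity_report()
        print(f"/dev/shm is {report['fraction_of_ram']:.2f} of RAM, "
              f"{report['percent_used']:.1f}% used")
```

Running this periodically alongside the leftover-file listing would show how quickly stale files consume the available space.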