
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage
From: David Turner (dpturner_at_[hidden])
Date: 2011-11-04 13:51:51


I should have been more careful. When we first started using OpenMPI,
version 1.4.1, there was a bug that caused session directories to be
left behind. This was fixed in subsequent releases (and via a patch
for 1.4.1).

Our batch epilogue still removes everything in /tmp that belongs to the
owner of the batch job. It is invoked after the user's application has
terminated, so the session directories are already gone by that time.
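
For anyone who needs to set up something similar, the logic is roughly
the following (a minimal sketch, not our actual script; it assumes
Torque passes the job owner as the second epilogue argument, per the
Torque prologue/epilogue documentation):

  #!/bin/sh
  # Torque epilogue sketch: $1 is the job id, $2 the job owner.
  JOBUSER="$2"
  # Remove anything left in /tmp that belongs to the job owner.
  # (find's -delete works depth-first, so directories empty out first.)
  find /tmp -user "$JOBUSER" -delete 2>/dev/null
  # Kill any orphan processes still running as that user.
  pkill -9 -u "$JOBUSER"

A production version would want to whitelist system users and log what
it removes, but that is the gist.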

Sorry for the confusion!

On 11/4/11 3:43 AM, TERRY DONTJE wrote:
> David, are you saying your jobs consistently leave behind session files
> after the job exits? They really shouldn't, even in the case when a job
> aborts; I thought mpirun took great pains to clean up after itself. Can
> you tell us what version of OMPI you are running? I could see a
> kill -9 of mpirun and the processes below it causing turds to be left
> behind.
>
> --td
>
> On 11/4/2011 2:37 AM, David Turner wrote:
>> % df /tmp
>> Filesystem  1K-blocks    Used  Available  Use%  Mounted on
>> -            12330084  822848   11507236    7%  /
>> % df /
>> Filesystem  1K-blocks    Used  Available  Use%  Mounted on
>> -            12330084  822848   11507236    7%  /
>>
>> That works out to 11GB. But...
>>
>> The compute nodes have 24GB. Freshly booted, about 3.2GB is
>> consumed by the kernel, various services, and the root file system.
>> At this time, usage of /tmp is essentially nil.
>>
>> We set user memory limits to 20GB.
>>
>> I would imagine that the size of the session directories depends on a
>> number of factors; perhaps the developers can comment on that. I have
>> only seen total sizes in the tens of MB on our 8-core, 24GB nodes.
>>
>> As long as they're removed after each job, they don't really compete
>> with the application for available memory.
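
If you want to see what they cost on your own system, the session
directories in the 1.4 series live directly under the temp dir, with
names starting with "openmpi-sessions-" (pattern from memory, so treat
it as approximate):

  % du -sh /tmp/openmpi-sessions-*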
>>
>> On 11/3/11 8:40 PM, Ed Blosch wrote:
>>> Thanks very much, exactly what I wanted to hear. How big is /tmp?
>>>
>>> -----Original Message-----
>>> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
>>> Behalf Of David Turner
>>> Sent: Thursday, November 03, 2011 6:36 PM
>>> To: users_at_[hidden]
>>> Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node
>>> /tmp
>>> for OpenMPI usage
>>>
>>> I'm not a systems guy, but I'll pitch in anyway. On our cluster,
>>> all the compute nodes are completely diskless. The root file system,
>>> including /tmp, resides in memory (ramdisk). OpenMPI puts these
>>> session directories therein. All our jobs run through a batch
>>> system (torque). At the conclusion of each batch job, an epilogue
>>> process runs that removes all files belonging to the owner of the
>>> current batch job from /tmp (and also looks for and kills orphan
>>> processes belonging to the user). This epilogue had to be written
>>> by our systems staff.
>>>
>>> I believe this is a fairly common configuration for diskless
>>> clusters.
>>>
>>> On 11/3/11 4:09 PM, Blosch, Edwin L wrote:
>>>> Thanks for the help. A couple of follow-up questions; maybe this
>>>> starts to go outside OpenMPI:
>>>>
>>>> What's wrong with using /dev/shm? I think you said earlier in this
>>>> thread that this was not a safe place.
>>>>
>>>> If the NFS-mount point is moved from /tmp to /work, would a /tmp
>>>> magically appear in the filesystem for a stateless node? How big
>>>> would it be, given that there is no local disk, right? That may be
>>>> something I have to ask the vendor, which I've tried, but they
>>>> don't quite seem to get the question.
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
>>> Behalf Of Ralph Castain
>>>> Sent: Thursday, November 03, 2011 5:22 PM
>>>> To: Open MPI Users
>>>> Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less
>>>> node /tmp
>>> for OpenMPI usage
>>>>
>>>>
>>>> On Nov 3, 2011, at 2:55 PM, Blosch, Edwin L wrote:
>>>>
>>>>> I might be missing something here. Is there a side-effect or
>>>>> performance loss if you don't use the sm btl? Why would it exist
>>>>> if there is a wholly equivalent alternative? What happens to
>>>>> traffic that is intended for another process on the same node?
>>>>
>>>> There is a definite performance impact, and we wouldn't recommend
>>>> doing what Eugene suggested if you care about performance.
>>>>
>>>> The correct solution here is to get your sys admin to make /tmp
>>>> local. Making /tmp NFS-mounted across multiple nodes is a major
>>>> "faux pas" in the Linux world - it should never be done, for the
>>>> reasons stated by Jeff.
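
If a truly local /tmp can't be arranged right away, one stopgap
(assuming your build exposes the usual ORTE parameter; check with
"ompi_info --param all all | grep tmpdir") is to point the session
directories at whatever node-local filesystem does exist, e.g. a
ramdisk mount:

  mpirun --mca orte_tmpdir_base /local/scratch -np 16 ./my_app

Here /local/scratch and my_app are placeholders; the same thing can be
set via the environment as OMPI_MCA_orte_tmpdir_base.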
>>>>
>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: users-bounces_at_[hidden]
>>>>> [mailto:users-bounces_at_[hidden]] On
>>> Behalf Of Eugene Loh
>>>>> Sent: Thursday, November 03, 2011 1:23 PM
>>>>> To: users_at_[hidden]
>>>>> Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node
>>> /tmp for OpenMPI usage
>>>>>
>>>>> Right. Actually "--mca btl ^sm". (Was missing "btl".)
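
Spelled out in full, that would look something like this (process
count and executable are placeholders):

  mpirun --mca btl ^sm -np 8 ./a.out

The ^ excludes just the sm component; on-node traffic then falls back
to another available BTL.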
>>>>>
>>>>> On 11/3/2011 11:19 AM, Blosch, Edwin L wrote:
>>>>>> I don't tell OpenMPI what BTLs to use. The default uses sm and
>>>>>> puts a session file on /tmp, which is NFS-mounted and thus not a
>>>>>> good choice.
>>>>>>
>>>>>> Are you suggesting something like --mca ^sm?
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: users-bounces_at_[hidden]
>>>>>> [mailto:users-bounces_at_[hidden]] On
>>> Behalf Of Eugene Loh
>>>>>> Sent: Thursday, November 03, 2011 12:54 PM
>>>>>> To: users_at_[hidden]
>>>>>> Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node
>>> /tmp for OpenMPI usage
>>>>>>
>>>>>> I've not been following closely. Why must one use shared-memory
>>>>>> communications? How about using other BTLs in a "loopback" fashion?
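
Concretely, that would mean listing the BTLs explicitly, so that
on-node traffic goes over TCP loopback instead of shared memory
(process count and executable again placeholders; note Ralph's warning
above about the performance cost):

  mpirun --mca btl self,tcp -np 8 ./a.out

The self component is still required for a process to talk to itself;
tcp then carries everything else, on-node or off.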
> --
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.dontje_at_[hidden]

-- 
Best regards,
David Turner
User Services Group        email: dpturner_at_[hidden]
NERSC Division             phone: (510) 486-4027
Lawrence Berkeley Lab        fax: (510) 486-4316