Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage
From: TERRY DONTJE (terry.dontje_at_[hidden])
Date: 2011-11-04 06:43:41


David, are you saying your jobs consistently leave session files behind
after the job exits? They really shouldn't, even when a job aborts; I
thought mpirun took great pains to clean up after itself. Can you tell
us what version of OMPI you are running? I could see a kill -9 of
mpirun and the processes below it causing stray session files to be
left behind.
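
(If you want to check, leftover session directories normally show up
under /tmp with names beginning with "openmpi-sessions-"; for example,
something like

   ls -d /tmp/openmpi-sessions-*

on a node right after a job exits - the exact naming varies by OMPI
version.)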

--td

On 11/4/2011 2:37 AM, David Turner wrote:
> % df /tmp
> Filesystem 1K-blocks Used Available Use% Mounted on
> - 12330084 822848 11507236 7% /
> % df /
> Filesystem 1K-blocks Used Available Use% Mounted on
> - 12330084 822848 11507236 7% /
>
> That works out to 11GB. But...
>
> The compute nodes have 24GB. Freshly booted, about 3.2GB is
> consumed by the kernel, various services, and the root file system.
> At this time, usage of /tmp is essentially nil.
>
> We set user memory limits to 20GB.
>
> I would imagine that the size of the session directories depends on a
> number of factors; perhaps the developers can comment on that. I have
> only seen total sizes in the 10s of MBs on our 8-node, 24GB nodes.
>
> As long as they're removed after each job, they don't really compete
> with the application for available memory.
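>
> (For what it's worth, if /tmp ever became a concern, the session
> directories can be pointed at any node-local path. A minimal sketch,
> assuming a node-local /scratch exists and using a hypothetical
> executable ./a.out:
>
>   mpirun --mca orte_tmpdir_base /scratch -np 16 ./a.out
>
> Exporting TMPDIR before launching has the same effect.)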
>
> On 11/3/11 8:40 PM, Ed Blosch wrote:
>> Thanks very much, exactly what I wanted to hear. How big is /tmp?
>>
>> -----Original Message-----
>> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
>> Behalf Of David Turner
>> Sent: Thursday, November 03, 2011 6:36 PM
>> To: users_at_[hidden]
>> Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node
>> /tmp
>> for OpenMPI usage
>>
>> I'm not a systems guy, but I'll pitch in anyway. On our cluster,
>> all the compute nodes are completely diskless. The root file system,
>> including /tmp, resides in memory (ramdisk). OpenMPI puts these
>> session directories therein. All our jobs run through a batch
>> system (torque). At the conclusion of each batch job, an epilogue
>> process runs that removes all files belonging to the owner of the
>> current batch job from /tmp (and also looks for and kills orphan
>> processes belonging to the user). This epilogue had to be written
>> by our systems staff.
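>>
>> A minimal sketch of such an epilogue, assuming Torque's convention of
>> passing the job owner's user name as the second argument, might look
>> like:
>>
>>   #!/bin/sh
>>   user="$2"    # job owner, per Torque's epilogue argument list
>>   # remove anything the job owner left in /tmp
>>   find /tmp -user "$user" -exec rm -rf {} + 2>/dev/null
>>   # kill any orphan processes still owned by that user
>>   pkill -9 -u "$user"
>>
>> (A production version needs more care, e.g. never running this for
>> system accounts.)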
>>
>> I believe this is a fairly common configuration for diskless
>> clusters.
>>
>> On 11/3/11 4:09 PM, Blosch, Edwin L wrote:
>>> Thanks for the help. A couple of follow-up questions; maybe this
>>> starts to go outside OpenMPI:
>>>
>>> What's wrong with using /dev/shm? I think you said earlier in this
>>> thread that this was not a safe place.
>>>
>>> If the NFS mount point is moved from /tmp to /work, would a /tmp
>>> magically appear in the filesystem for a stateless node? How big
>>> would it be, given that there is no local disk? That may be
>>> something I have to ask the vendor, which I've tried, but they
>>> don't quite seem to get the question.
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
>>> Behalf Of Ralph Castain
>>> Sent: Thursday, November 03, 2011 5:22 PM
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less
>>> node /tmp
>>> for OpenMPI usage
>>>
>>>
>>> On Nov 3, 2011, at 2:55 PM, Blosch, Edwin L wrote:
>>>
>>>> I might be missing something here. Is there a side effect or
>>>> performance loss if you don't use the sm btl? Why would it exist
>>>> if there is a wholly equivalent alternative? What happens to
>>>> traffic that is intended for another process on the same node?
>>>
>>> There is a definite performance impact, and we wouldn't recommend
>>> doing what Eugene suggested if you care about performance.
>>>
>>> The correct solution here is to get your sys admin to make /tmp
>>> local. Making /tmp NFS-mounted across multiple nodes is a major
>>> "faux pas" in the Linux world - it should never be done, for the
>>> reasons stated by Jeff.
>>>
>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: users-bounces_at_[hidden]
>>>> [mailto:users-bounces_at_[hidden]] On
>>>> Behalf Of Eugene Loh
>>>> Sent: Thursday, November 03, 2011 1:23 PM
>>>> To: users_at_[hidden]
>>>> Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node
>>>> /tmp for OpenMPI usage
>>>>
>>>> Right. Actually "--mca btl ^sm". (Was missing "btl".)
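>>>>
>>>> For example, with a hypothetical executable ./a.out:
>>>>
>>>>   mpirun --mca btl ^sm -np 16 ./a.out
>>>>
>>>> which tells Open MPI to use every available BTL except shared
>>>> memory, so same-node traffic falls back to another transport
>>>> (e.g. tcp).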
>>>>
>>>> On 11/3/2011 11:19 AM, Blosch, Edwin L wrote:
>>>>> I don't tell OpenMPI what BTLs to use. The default uses sm and
>>>>> puts a session file on /tmp, which is NFS-mounted and thus not a
>>>>> good choice.
>>>>>
>>>>> Are you suggesting something like --mca ^sm?
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: users-bounces_at_[hidden]
>>>>> [mailto:users-bounces_at_[hidden]] On
>>>>> Behalf Of Eugene Loh
>>>>> Sent: Thursday, November 03, 2011 12:54 PM
>>>>> To: users_at_[hidden]
>>>>> Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node
>>>>> /tmp for OpenMPI usage
>>>>>
>>>>> I've not been following closely. Why must one use shared-memory
>>>>> communications? How about using other BTLs in a "loopback" fashion?
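>>>>> (E.g., something like "--mca btl self,tcp" would route same-node
>>>>> traffic over TCP loopback instead of through the sm BTL.)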
>>
>>
>
>

-- 
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email: terry.dontje_at_[hidden]


