Subject: Re: [OMPI devel] SM backing file size
From: Ralph Castain (rhc_at_[hidden])
Date: 2008-11-14 09:14:59


On Nov 14, 2008, at 7:00 AM, Tim Mattox wrote:

> Ralph,
> Are these systems running Linux? If so, the long-term solution is to
> finish ticket #1320:
> https://svn.open-mpi.org/trac/ompi/ticket/1320
> which would eliminate the sm backing files entirely, without needing
> to reduce the size of the shared memory that is used. For systems
> where /tmp is a ramdisk, the current scheme is very wasteful (less
> so if you are using tmpfs).

I agree - I think this needs to be bumped up in priority. I'm willing
to help, if that would be useful.

>
>
> What kind of ramdisk are you using? If you are not using tmpfs,
> you should consider switching to tmpfs, since it allows you to have
> an arbitrarily large /tmp yet only uses as much RAM as the files in
> /tmp actually occupy. See this for a good howto/intro:
> http://www.ibm.com/developerworks/library/l-fs3.html

I honestly don't know, and have no control over how it is set up... nor
any influence in that regard! :-)
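
(For reference, in case it ever does become configurable on our end:
tmpfs is normally just a one-line /etc/fstab entry, something like
"tmpfs  /tmp  tmpfs  defaults,size=512m  0  0" -- the size= value there
is purely illustrative; the article Tim links covers the details.)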

>
>
> On Fri, Nov 14, 2008 at 8:42 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>> Hi Eugene
>>
>> I too am interested - I think we need to do something about the sm
>> backing file situation, as larger-core machines are slated to become
>> more prevalent shortly.
>>
>> I appreciate your info on the sizes and controls. One other question:
>> what happens when there isn't enough memory to support all this? Are
>> we smart enough to detect this situation? Does the sm subsystem
>> quietly shut down? Warn and shut down? Segfault?
>>
>> I have two examples so far:
>>
>> 1. Using a ramdisk, /tmp was set to 10MB. OMPI was run on a single
>> node, 2ppn, with btl=openib,sm,self. The program started, but
>> segfaulted on the first MPI_Send. No warnings were printed.
>>
>> 2. Again with a ramdisk, /tmp was reportedly set to 16MB (unverified -
>> some uncertainty, it could have been much larger). OMPI was run on
>> multiple nodes, 16ppn, with btl=openib,sm,self. The program ran to
>> completion without errors or warnings. I don't know the communication
>> pattern - it could be that no local comm was performed, though that
>> sounds doubtful.
>>
>> If someone doesn't know, I'll have to dig into the code and figure out
>> the response - just hoping that someone can spare me the pain.
>>
>> Thanks
>> Ralph
>>
>>
>> On Nov 13, 2008, at 3:21 PM, Eugene Loh wrote:
>>
>>> Ralph Castain wrote:
>>>
>>>> As has frequently been commented upon at one time or another, the
>>>> shared memory backing file can be quite huge. There used to be a
>>>> param for controlling this size, but I can't find it in 1.3 - or at
>>>> least, the name or method for controlling file size has morphed into
>>>> something I don't recognize.
>>>>
>>>> Can someone more familiar with that subsystem point me to one or
>>>> more params that will allow us to control the size of that file? It
>>>> is swamping our systems and causing OMPI to segfault.
>>>
>>> Sounds like you've already gotten your answers, but I'll add my $0.02
>>> anyhow.
>>>
>>> The file size is the number of local processes (call it n) times
>>> mpool_sm_per_peer_size (default 32M), but with a minimum of
>>> mpool_sm_min_size (default 128M) and a maximum of mpool_sm_max_size
>>> (default 2G? 256M?). So, you can tweak those parameters to control
>>> file size.
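
For what it's worth, here is how I read that sizing rule -- a rough
sketch based purely on the description above and the defaults quoted,
not checked against the actual mpool_sm code:

    #include <stdio.h>

    /* Sketch of the sm backing-file sizing rule as described above:
     * n * per_peer, clamped to [min, max].  The values used in main()
     * are just the defaults quoted in this thread. */
    static size_t sm_file_size(size_t n, size_t per_peer,
                               size_t min, size_t max)
    {
        size_t size = n * per_peer;
        if (size < min) size = min;
        if (size > max) size = max;
        return size;
    }

    int main(void)
    {
        const size_t MB = 1024 * 1024;
        /* 16 local processes at 32M per peer -> 512M, above the 128M
         * floor and below a 2G cap (a 256M cap would clamp it to 256M). */
        printf("%zu MB\n",
               sm_file_size(16, 32 * MB, 128 * MB, 2048 * MB) / MB);
        return 0;
    }

If those defaults are right, even the 2-ppn case in my first example
would hit the 128M floor, which is already an order of magnitude more
than that 10MB /tmp.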
>>>
>>> Another issue is possibly how small a backing file you can get away
>>> with. That is, just forcing the file to be smaller may not be enough,
>>> since your job may no longer run. The backing file seems to be used
>>> mainly by:
>>>
>>> *) eager-fragment free lists: We start with enough eager fragments so
>>> that we could have two per connection. So, you could bump the sm eager
>>> size down if you need to shoehorn a job into a very small backing file.
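
(If I'm reading "two per connection" right, that's roughly 2 * n * (n-1)
eager fragments across a node -- about 480 for n=16 -- so the eager size
knob matters more and more as the per-node process count grows.)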
>>>
>>> *) large-fragment free lists: We start with 8*n large fragments. If
>>> this term plagues you, you can bump the sm chunk size down or reduce
>>> the value of 8 (using btl_sm_free_list_num, I think).
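
(For n=16 that's 128 large fragments, each of the sm chunk size, so
either knob -- fewer fragments or smaller chunks -- scales this term
linearly.)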
>>>
>>> *) FIFOs: The code tries to align a number of things on pagesize
>>> boundaries, so you end up with about 3*n*n*pagesize of overhead here.
>>> If this term is causing you problems, you're stuck (unless you modify
>>> OMPI).
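
(With 4KB pages, that 3*n*n*pagesize term comes to about 3MB at n=16 but
roughly 192MB at n=128 -- and, as noted, it's the one piece that can't
be tuned away without modifying the code.)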
>>>
>>> I'm interested in this subject! :^)
>
> --
> Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
> tmattox_at_[hidden] || timattox_at_[hidden]
> I'm a bright... http://www.the-brights.net/