Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] SM backing file size
From: Brooks Davis (brooks_at_[hidden])
Date: 2008-11-15 17:37:26

On Sat, Nov 15, 2008 at 09:32:44AM -0800, Eugene Loh wrote:
> Ralph Castain wrote:
>> I probably wasn't clear - see below
>> On Nov 14, 2008, at 6:31 PM, Eugene Loh wrote:
>>> Ralph Castain wrote:
>>>> I have two examples so far:
>>>> 1. using a ramdisk, /tmp was set to 10MB. OMPI was run on a single
>>>> node, 2ppn, with btl=openib,sm,self. The program started, but
>>>> segfaulted on the first MPI_Send. No warnings were printed.
>>> Interesting. So far as I can tell, the actual memory consumption (total
>>> number of allocations in the mmapped segment) for 2 local processes
>>> should be a little more than half a Mbyte. The bulk of that would be
>>> from fragments (chunks). There are btl_sm_free_list_num=8 per process,
>>> each of btl_sm_max_frag_size=32K. So, that's 8x2x32K=512Kbyte.
>>> Actually, a little bit more. Anyhow, that accounts for most of the
>>> allocations, I think. Maybe if you're sending a lot of data, more gets
>>> allocated at MPI_Send time. I don't know.
>>> While only < 1 Mbyte is needed, however, mpool_sm_min_size=128M will be
>>> mapped.
>> Right - so then it sounds to me like this would fail (which it did) since
>> /tmp was fixed to 10M - and the mpool would be much too large given a
>> minimum size of 128M. Right?
> That makes sense to me.
> My analysis of how little of the mapped segment will actually be used is
> probably irrelevant.
> Here is what I think should happen:
> *) The lowest ranking process on the node opens and ftruncates the file.
> Since there isn't enough space, the ftruncate fails. This is in
> mca_common_sm_mmap_init() in ompi/mca/common/sm/common_sm_mmap.c.

On file systems that support holes (and thus overcommit), this won't
be sufficient: ftruncate() can succeed there without any blocks being
allocated. You need to actually write something to each block of the
file. Writing a single 0 byte at every 512-byte offset should do it in
practice. A loop that writes a byte, seeks forward one block size,
writes another byte, and repeats is a decent option. It avoids the
possibility of seg faults, since running out of space shows up as a
failed write() rather than as a fault when the mapping is touched later.
This will also avoid the pessimal block layout some file systems produce
with an ftruncate followed by random access.

-- Brooks
