Sylvain Jeaugey wrote:
> On Thu, 10 Jun 2010, Paul H. Hargrove wrote:
>> As for why mmap is slower: when the file is on a real filesystem (not
>> tmpfs or another ramdisk), I am 95% certain that this is an artifact
>> of the Linux swapper/pager behavior, which thinks it is being smart by
>> "swapping ahead". Even when there is no memory pressure that
>> requires swapping, Linux starts queuing swap I/O for pages to keep
>> the number of "clean" pages up when possible. This results in pages
>> of the shared memory file being written out to the actual block
>> device. Both the background I/O and the VM metadata updates
>> contribute to the lost time. I say 95% certain because I have a
>> colleague who looked into this phenomenon in another setting and I am
>> recounting what he reported as clearly as I can remember, but I might
>> have misunderstood or inserted my own speculation by accident. A
>> sufficiently motivated investigator (not me) could probably devise an
>> experiment to verify this.
> Interesting. Do you think this behavior of the Linux kernel would
> change if the file were unlink()ed after attach?
As Jeff pointed out, the file IS unlinked by Open MPI, presumably to
ensure it is not left behind in case of abnormal termination.
This was also the case in the scenario my colleague looked into, as I
reported. We were (unpleasantly) surprised to find that this "swap ahead"
behavior was being applied to an unlinked file: a case that would
appear to be a very simple one to optimize away. However, the simple
fact is that Linux appears just to queue I/O to the "backing store" for
a page regardless of little details like it being unlinked.
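For anyone who wants to poke at this, here is a minimal sketch of the pattern under discussion: create a backing file, map it shared, then unlink it while the mapping is still live. It is not Open MPI's actual code. The helper name map_unlinked_file and its size and directory parameters are made up for illustration, and it assumes a Linux/Unix system. The pages stay backed by the (now nameless) inode until the mapping is released, which is why the kernel still has somewhere to write them.

```python
import mmap
import os
import tempfile

def map_unlinked_file(size=1 << 20, directory=None):
    """Create a backing file, map it MAP_SHARED, then unlink it.

    Hypothetical helper for illustration only. `directory` selects where
    the backing file lives (e.g. a disk-backed path versus a tmpfs mount);
    None means the default temporary directory.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.ftruncate(fd, size)  # give the file its full length before mapping
        region = mmap.mmap(fd, size, mmap.MAP_SHARED,
                           mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        os.close(fd)     # the mapping keeps its own reference to the file
        os.unlink(path)  # name is gone; the inode lives until the unmap
    return region, path

if __name__ == "__main__":
    region, path = map_unlinked_file()
    region[:5] = b"hello"        # dirties a page of the unlinked file
    print(os.path.exists(path))  # the name no longer exists
    print(bytes(region[:5]))     # the mapping is still readable
    region.close()
```

Pointing directory at a disk-backed path in one run and at a tmpfs mount in another, then timing writes across the region, would be one rough way to start the experiment mentioned earlier in the thread.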
Paul H. Hargrove PHHargrove_at_[hidden]
Future Technologies Group
HPC Research Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900