Subject: Re: [OMPI devel] SM backing file size
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-11-14 11:15:39

On Nov 14, 2008, at 10:56 AM, Eugene Loh wrote:

>> I too am interested - I think we need to do something about the sm
>> backing file situation as larger core machines are slated to
>> become more prevalent shortly.
> I think there is at least one piece of low-flying fruit: get rid of
> a lot of the page alignments. Especially as one goes to large core
> counts, the O(n^2) number of local "connections" becomes important,
> and each connection starts with three page-aligned allocations, each
> allocation very tiny (and hence uses only a tiny portion of the page
> that is allocated to it). So, most of the allocated memory is
> never used.
> Personally, I question the rationale for the page alignment in the
> first place, but don't mind listening to anyone who wants to explain
> it to me. Presumably, in a NUMA machine, localizing FIFOs to
> separate physical memory improves performance. I get that basic
> premise. I just question the reasoning beyond that.

I think the original rationale was that only pages could be physically
pinned (not cache lines).
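
For a sense of scale, here is a rough back-of-the-envelope sketch of the
waste Eugene is describing. The 4 KB page, 64-byte cacheline, and
three-allocations-per-connection figures are assumptions for illustration,
not numbers taken from the code:

/* Rough, illustrative arithmetic only -- not Open MPI code.
 * With n local processes, assume each ordered pair gets a FIFO and each
 * FIFO starts with three tiny, separately aligned allocations. */
#include <stdio.h>

int main(void)
{
    const size_t page      = 4096;  /* assumed page size */
    const size_t cacheline = 64;    /* assumed cacheline size */
    const size_t allocs_per_conn = 3;

    for (int n = 8; n <= 128; n *= 2) {
        size_t conns = (size_t) n * (n - 1);   /* O(n^2) local connections */
        size_t page_bytes = conns * allocs_per_conn * page;
        size_t cl_bytes   = conns * allocs_per_conn * cacheline;
        printf("%4d procs: %9zu KB page-aligned vs %6zu KB cacheline-aligned\n",
               n, page_bytes / 1024, cl_bytes / 1024);
    }
    return 0;
}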

Slightly modifying Eugene's low-hanging fruit: we could figure out
which processes are local to each other (e.g., on cores on the same
socket, where memory is local to all the cores on that socket). Those
processes' data could then be laid out contiguously (perhaps even within
a single page, depending on how many cores there are) instead of on
individual pages. Specifically: use page alignments only for groups of
processes that have the same memory locality.
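
A minimal sketch of that layout idea, with entirely made-up names and
sizes (this is not the sm BTL's actual data structure): give each
locality group one page-aligned region and pack the group's members at
cacheline-aligned offsets within it.

#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64   /* assumed cacheline size */
#define PAGE      4096 /* assumed page size */

/* Hypothetical per-process FIFO control block (not the real ompi_cb_fifo_t). */
typedef struct {
    volatile uint32_t head;
    volatile uint32_t tail;
} fifo_ctrl_t;

static size_t align_up(size_t x, size_t a)
{
    return (x + a - 1) & ~(a - 1);
}

static size_t slot_size(void)
{
    return align_up(sizeof(fifo_ctrl_t), CACHELINE);
}

/* One page-aligned region per locality group (e.g., per socket); each
 * member gets a cacheline-aligned slot inside that region, so only
 * group boundaries pay the page-sized padding. */
static fifo_ctrl_t *group_slot(void *group_base, int member)
{
    return (fifo_ctrl_t *)((char *) group_base + (size_t) member * slot_size());
}

/* Bytes needed for a group of group_nprocs members, rounded to whole pages. */
static size_t group_region_size(int group_nprocs)
{
    return align_up((size_t) group_nprocs * slot_size(), PAGE);
}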

> The page alignment appears in ompi_fifo_init and ompi_cb_fifo_init.
> It comes additionally from mca_mpool_sm_alloc. Four minor changes
> could change alignment from page to cacheline size.
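
To make the kind of change concrete, here is a toy stand-in that uses
posix_memalign rather than the real mca_mpool_sm_alloc path, so the
names and the call are illustrative only; the point is that the
alignment argument is the only knob that needs to move from page size
to cacheline size.

#include <stdlib.h>

#define PAGE_ALIGN      4096  /* assumed page size */
#define CACHELINE_ALIGN 64    /* assumed cacheline size */

/* Toy stand-in for a FIFO control-block allocation; not Open MPI code. */
static void *fifo_ctrl_alloc(size_t size)
{
    void *ptr = NULL;
    /* Was PAGE_ALIGN; a cacheline is all that false-sharing avoidance needs. */
    if (posix_memalign(&ptr, CACHELINE_ALIGN, size) != 0) {
        return NULL;
    }
    return ptr;
}
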
>> what happens when there isn't enough memory to support all this?
>> Are we smart enough to detect this situation? Does the sm
>> subsystem quietly shut down? Warn and shut down? Segfault?
> I'm not exactly sure. I think it's a combination of three things:
> *) some attempt to signal problems correctly
> *) some degree just to live with less shared memory (possibly
> leading to performance degradation)
> *) poorly tested in any case
>> I have two examples so far:
>> 1. using a ramdisk, /tmp was set to 10MB. OMPI was run on a single
>> node, 2ppn, with btl=openib,sm,self. The program started, but
>> segfaulted on the first MPI_Send. No warnings were printed.
>> 2. again with a ramdisk, /tmp was reportedly set to 16MB
>> (unverified - some uncertainty; it could have been much larger).
>> OMPI was run on multiple nodes, 16ppn, with btl=openib,sm,self.
>> The program ran to completion without errors or warning. I don't
>> know the communication pattern - could be no local comm was
>> performed, though that sounds doubtful.

Jeff Squyres
Cisco Systems