Open MPI Development Mailing List Archives

Subject: [OMPI devel] allocating sm memory with page alignment
From: Eugene Loh (Eugene.Loh_at_[hidden])
Date: 2008-08-29 20:52:10

(I'm new to Open MPI.)

I'm looking at the sm BTL.

In mca_btl_sm_add_procs(), there's a loop over peer processes, with a
call to ompi_fifo_init(). That is, one call to ompi_fifo_init() for
each connection (sender/receiver pair).

In ompi_fifo_init(), there's an allocation of
sizeof(ompi_cb_fifo_wrapper_t), and a call to ompi_cb_fifo_init(), which
in turn has two allocations: one of a bunch of pointers and another of
sizeof(ompi_cb_fifo_ctl_t).

In short, for each connection, there are three allocations:

*) sizeof(ompi_cb_fifo_wrapper_t)... about 64 bytes on LP64
*) a bunch of pointers... about 1 Kbyte on LP64
*) sizeof(ompi_cb_fifo_ctl_t)... about 12 bytes

Let me say this yet another way. For N local processes, there are
N*(N-1) connections, each with these three allocations, most of which
are 64 bytes or smaller.

BUT, in ompi_fifo_init() and ompi_cb_fifo_init(), we ask for page
alignment of each allocation. Then mca_mpool_sm_alloc() enforces that
alignment again, placing each allocation on a page boundary.

Therefore, as the number of local processes increases, these
per-connection allocations become very costly. For 8K pages, for
example, and 100 on-node processes, we're talking 3*100*100*8K = 240
Mbytes. For 512 on-node processes (yes, we have nodes this big), that's
6 Gbyte... most of which is unused. (E.g., allocating more than an 8K
page when we only need 64 or 12 bytes.)
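
To make the arithmetic concrete, here is a rough sketch of the cost calculation. The three sizes are the approximate figures quoted above, not values read from the Open MPI headers, and align_up() and fifo_bytes() are hypothetical helper names used only for this illustration. The 64-byte cacheline in the comparison is also an assumption.

```c
#include <stdint.h>

/* Round size up to the next multiple of align.
 * align must be a power of two (true for page and cacheline sizes). */
static uint64_t align_up(uint64_t size, uint64_t align)
{
    return (size + align - 1) & ~(align - 1);
}

/* Total bytes used by the three per-connection allocations for nprocs
 * local processes when every allocation is padded out to align bytes.
 * Hypothetical helper; the sizes are the approximate LP64 figures from
 * the post, not taken from Open MPI itself. */
static uint64_t fifo_bytes(uint64_t nprocs, uint64_t align)
{
    const uint64_t sizes[3] = { 64, 1024, 12 };
    uint64_t per_connection = 0;

    for (int i = 0; i < 3; i++)
        per_connection += align_up(sizes[i], align);

    /* One set of allocations per sender/receiver pair. */
    return nprocs * (nprocs - 1) * per_connection;
}
```

With 8K pages, fifo_bytes(100, 8192) comes to 243,302,400 bytes and fifo_bytes(512, 8192) to 6,429,868,032 bytes, in line with the rough 240 Mbyte and 6 Gbyte figures above. With an assumed 64-byte cacheline alignment instead, fifo_bytes(100, 64) is 11,404,800 bytes and fifo_bytes(512, 64) is 301,400,064 bytes.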

Okay, long intro. Let me start with a short question: do we really
need page alignment for these allocations? Would cacheline alignment be

(I imagine I'll have follow-up questions once the answers start to roll in.)