
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] allocating sm memory with page alignment
From: Graham, Richard L. (rlgraham_at_[hidden])
Date: 2008-08-30 12:55:27

I have not looked at the code in a long time, so I'm not sure how much has changed... In general, what you are suggesting is reasonable. However, especially on large machines, you also need to worry about memory locality, so you should allocate from memory pools that are appropriately located. I expect that allocating memory on a per-socket basis would do. Having said that, I have no idea how easy this is to implement within the current code base, but I expect you can rely on first-touch placement after the procs are locked down (bound to their cores) to simplify the implementation.
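To make the per-socket, first-touch suggestion concrete, here is a minimal C sketch under stated assumptions. It carves one shared anonymous mapping into page-aligned pools, one per socket, and has each pool first written by a process that is already bound to that socket. The helpers (round_up, socket_pool_offset, map_pools, touch_socket_pool) are hypothetical, not Open MPI functions, and process binding itself is not shown. It also assumes the OS places a page on the NUMA node of the CPU that first faults it in, which is Linux's default local-allocation policy.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Round n up to a multiple of a (a must be a power of two). */
static size_t round_up(size_t n, size_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (n + a - 1) & ~(a - 1);
}

/* Hypothetical layout: one page-aligned pool per socket, back to back. */
static size_t socket_pool_offset(int socket, size_t pool_bytes, size_t page)
{
    return (size_t)socket * round_up(pool_bytes, page);
}

/* Map a shared anonymous region large enough for nsockets pools.
   Processes created with fork() after this call share the mapping.
   Returns NULL on failure. */
static char *map_pools(int nsockets, size_t pool_bytes, size_t page)
{
    size_t total = socket_pool_offset(nsockets, pool_bytes, page);
    void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (char *)p;
}

/* First touch: meant to be called by a process on `socket` AFTER it has
   been bound to a core there (binding not shown). Writing the pool is what
   triggers page allocation, so a first-touch policy places it locally. */
static void touch_socket_pool(char *base, int socket, size_t pool_bytes,
                              size_t page)
{
    memset(base + socket_pool_offset(socket, pool_bytes, page), 0,
           pool_bytes);
}
```

The pools are padded to whole pages so that no page straddles two sockets' pools; otherwise whichever process touched a shared page first would decide its placement for both.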


----- Original Message -----
From: devel-bounces_at_[hidden] <devel-bounces_at_[hidden]>
To: devel_at_[hidden] <devel_at_[hidden]>
Sent: Fri Aug 29 20:52:10 2008
Subject: [OMPI devel] allocating sm memory with page alignment

(I'm new to Open MPI.)

I'm looking at the sm BTL.

In mca_btl_sm_add_procs(), there's a loop over peer processes, with a
call to ompi_fifo_init(). That is, one call to ompi_fifo_init() for
each connection (sender/receiver pair).

In ompi_fifo_init(), there's an allocation of
sizeof(ompi_cb_fifo_wrapper_t) and a call to ompi_cb_fifo_init(), which
in turn makes two allocations: one for a bunch of pointers and another
of sizeof(ompi_cb_fifo_ctl_t).

In short, for each connection, there are three allocations:

*) sizeof(ompi_cb_fifo_wrapper_t)... about 64 bytes on LP64
*) a bunch of pointers... about 1 Kbyte on LP64
*) sizeof(ompi_cb_fifo_ctl_t)... about 12 bytes

Let me say this yet another way. For N local processes, there are
N*(N-1) per-connection allocations, most of which are 64 bytes or smaller.
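A hypothetical model of the setup described above, counting allocations the way the add_procs loop creates FIFOs: one FIFO per ordered sender/receiver pair, three allocations per FIFO. The names are illustrative only; the real code calls ompi_fifo_init() where this sketch merely increments a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Three separate allocations per FIFO, as listed above. */
enum { ALLOCS_PER_FIFO = 3 };

/* Count per-connection allocations for nprocs local processes. */
static size_t count_fifo_allocations(size_t nprocs)
{
    size_t allocs = 0;
    for (size_t me = 0; me < nprocs; me++)           /* each local process */
        for (size_t peer = 0; peer < nprocs; peer++) /* loop over peers    */
            if (peer != me)
                allocs += ALLOCS_PER_FIFO;           /* one FIFO per pair  */

    /* The loop agrees with the closed form 3 * N * (N-1). */
    assert(allocs == (size_t)ALLOCS_PER_FIFO * nprocs * (nprocs - 1));
    return allocs;
}
```

For example, count_fifo_allocations(100) returns 29,700 (3 * 100 * 99).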

BUT, in ompi_fifo_init() and ompi_cb_fifo_init(), we ask for page
alignment of each allocation. On top of that, mca_mpool_sm_alloc()
enforces alignment on page boundaries.
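To see why page alignment inflates these small requests, here is a sketch that rounds each of the three approximate LP64 sizes listed above up to a given alignment. It assumes each aligned allocation starts on a fresh alignment boundary, so even a 12-byte request consumes a whole page. The helper names are hypothetical, and the sizes are this message's approximate figures, not measured values.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Approximate per-connection request sizes on LP64 (assumed). */
static const size_t fifo_request_bytes[3] = {
    64,   /* sizeof(ompi_cb_fifo_wrapper_t) */
    1024, /* array of pointers              */
    12,   /* sizeof(ompi_cb_fifo_ctl_t)     */
};

/* Bytes consumed per connection when each request occupies its own
   aligned block of `align` bytes. */
static size_t per_connection_footprint(size_t align)
{
    size_t total = 0;
    for (size_t i = 0; i < 3; i++)
        total += align_up(fifo_request_bytes[i], align);
    return total;
}
```

With 8K pages this gives 24,576 bytes per connection for about 1,100 bytes actually requested. With 64-byte alignment it gives 1,152 bytes.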

As the number of local processes increases, these per-connection
allocations therefore become very costly. With 8K pages and 100 on-node
processes, for example, we're talking about 3*100*100*8K = 240 Mbytes.
For 512 on-node processes (yes, we have nodes this big), that's 6
Gbytes... most of which is unused (e.g., allocating a full 8K page when
we only need 64 or 12 bytes).
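The estimate above can be checked with a few lines of C. This only reproduces the arithmetic under the message's own assumption of three page-sized allocations per connection, and the function name is made up. The 240 Mbyte figure treats 8K as 8,000 bytes. With 8,192-byte pages the same product is 245,760,000 bytes (about 234 MiB), and for 512 processes it is exactly 6 GiB. Using the exact N*(N-1) connection count instead of N*N changes the totals only slightly.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes consumed when each of the 3 per-connection allocations takes a
   whole page. approx != 0 uses N*N as in the estimate above; otherwise
   the exact N*(N-1) connection count is used. */
static uint64_t page_aligned_bytes(uint64_t nprocs, uint64_t page, int approx)
{
    assert(nprocs > 0);  /* avoid unsigned underflow in nprocs - 1 */
    uint64_t conns = approx ? nprocs * nprocs : nprocs * (nprocs - 1);
    return 3 * conns * page;
}
```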

Okay, long intro. Let me start with a short question: do we really
need page alignment for these allocations? Would cacheline alignment be
sufficient?

(I imagine I'll have follow-up questions once the answers start to roll in.)
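For comparison, here is one way cacheline alignment could work. It is sketched as an assumption, not as how the sm mpool currently behaves: a simple bump allocator hands out cacheline-aligned pieces of a single pre-mapped, page-aligned segment, so the small per-connection structures share pages instead of each claiming its own. The 64-byte line size and the names struct bump and bump_alloc are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { CACHELINE = 64 };  /* assumed cache-line size */

/* Hypothetical bump allocator over one pre-mapped segment. */
struct bump {
    char  *base;  /* start of segment; must be cacheline-aligned */
    size_t used;  /* bytes handed out so far, including padding  */
    size_t size;  /* total segment size                          */
};

/* Return n bytes starting on the next cache-line boundary, or NULL if
   the segment is exhausted. Nothing is ever freed. */
static void *bump_alloc(struct bump *b, size_t n)
{
    assert(((uintptr_t)b->base % CACHELINE) == 0);
    size_t off = (b->used + CACHELINE - 1) & ~(size_t)(CACHELINE - 1);
    if (off > b->size || n > b->size - off)
        return NULL;
    b->used = off + n;
    return b->base + off;
}
```

With the request sizes quoted above, each connection then consumes 1,152 bytes (64 + 1024 + 64) rather than three 8K pages. For 512 processes that is 512*511*1152 = 301,400,064 bytes (about 287 Mbytes) instead of roughly 6 Gbytes. Because every structure still starts on its own cache line, no two structures share a line.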