Open MPI Development Mailing List Archives

From: Li-Ta Lo (ollie_at_[hidden])
Date: 2007-08-28 14:45:23


On Tue, 2007-08-28 at 10:12 -0600, Brian Barrett wrote:
> On Aug 28, 2007, at 9:05 AM, Li-Ta Lo wrote:
>
> > On Mon, 2007-08-27 at 15:10 -0400, Rolf vandeVaart wrote:
> >> We are running into a problem when running on one of our larger SMPs
> >> using the latest Open MPI v1.2 branch. We are trying to run a job
> >> with np=128 within a single node. We are seeing the following error:
> >>
> >> "SM failed to send message due to shortage of shared memory."
> >>
> >> We then increased the allowable maximum size of the shared segment to
> >> 2 Gigabytes - 1, which is the maximum allowed for a 32-bit application.
> >> We used the mca parameter to increase it as shown here.
> >>
> >> -mca mpool_sm_max_size 2147483647
> >>
> >> This allowed the program to run to completion. Therefore, we would
> >> like to increase the default maximum from 512 Mbytes to 2 Gigabytes - 1.
> >> Does anyone have an objection to this change? Soon we are going to
> >> have larger CPU counts and would like to increase the odds that
> >> things work "out of the box" on these large SMPs.
> >>
> >
> >
> > There is a serious problem with the 1.2 branch: it does not allocate
> > any SM area for each process at the beginning. SM areas are allocated
> > on demand, and if some of the processes are more aggressive than the
> > others, it will cause starvation. This problem is fixed in the trunk
> > by assigning at least one SM area to each process. I think this is what
> > you saw (starvation), and an increase of max size may not be necessary.
>
> Although I'm pretty sure this is fixed in the v1.2 branch already.
>

It should never happen with the new code. The only way we can get the
message is when MCA_BTL_SM_FIFO_WRITE returns rc != OMPI_SUCCESS, but
the new MCA_BTL_SM_FIFO_WRITE always returns rc = OMPI_SUCCESS, because
it keeps retrying the write (calling opal_progress() in between) until
it succeeds:

#define MCA_BTL_SM_FIFO_WRITE(endpoint_peer, my_smp_rank, peer_smp_rank, hdr, rc) \
do { \
    ompi_fifo_t* fifo; \
    fifo=&(mca_btl_sm_component.fifo[peer_smp_rank][my_smp_rank]); \
 \
    /* thread lock */ \
    if(opal_using_threads()) \
        opal_atomic_lock(fifo->head_lock); \
    /* post fragment */ \
    while(ompi_fifo_write_to_head(hdr, fifo, \
        mca_btl_sm_component.sm_mpool) != OMPI_SUCCESS) \
        opal_progress(); \
    MCA_BTL_SM_SIGNAL_PEER(endpoint_peer); \
    rc=OMPI_SUCCESS; \
    if(opal_using_threads()) \
        opal_atomic_unlock(fifo->head_lock); \
} while(0)

Rolf, are you really using the latest 1.2 branch?

Ollie