
Open MPI Development Mailing List Archives


From: Rolf.Vandevaart_at_[hidden]
Date: 2007-08-30 10:26:55


Li-Ta Lo wrote:

>On Tue, 2007-08-28 at 10:12 -0600, Brian Barrett wrote:
>
>
>>On Aug 28, 2007, at 9:05 AM, Li-Ta Lo wrote:
>>
>>
>>
>>>On Mon, 2007-08-27 at 15:10 -0400, Rolf vandeVaart wrote:
>>>
>>>
>>>>We are running into a problem when running on one of our larger SMPs
>>>>using the latest Open MPI v1.2 branch. We are trying to run a job
>>>>with np=128 within a single node. We are seeing the following error:
>>>>
>>>>"SM failed to send message due to shortage of shared memory."
>>>>
>>>>We then increased the allowable maximum size of the shared memory
>>>>segment to 2 gigabytes minus 1, which is the maximum allowed for a
>>>>32-bit application. We used the following MCA parameter to increase it:
>>>>
>>>>-mca mpool_sm_max_size 2147483647
>>>>
>>>>This allowed the program to run to completion. Therefore, we would
>>>>like to increase the default maximum from 512 megabytes to 2 gigabytes
>>>>minus 1. Does anyone have an objection to this change? Soon we are
>>>>going to have larger CPU counts and would like to increase the odds
>>>>that things work "out of the box" on these large SMPs.
>>>>
>>>>
>>>>
>>>There is a serious problem with the 1.2 branch: it does not allocate
>>>any SM area for each process at the beginning. SM areas are allocated
>>>on demand, and if some of the processes are more aggressive than the
>>>others, that causes starvation. This problem is fixed in the trunk
>>>by assigning at least one SM area to each process. I think this is what
>>>you saw (starvation), and an increase of the max size may not be necessary.
>>>
>>>
>>Although I'm pretty sure this is fixed in the v1.2 branch already.
>>
>>
>>
>
>It should never happen with the new code. The only way we can get that
>message is when MCA_BTL_SM_FIFO_WRITE returns rc != OMPI_SUCCESS, but
>the new MCA_BTL_SM_FIFO_WRITE always returns rc = OMPI_SUCCESS:
>
>#define MCA_BTL_SM_FIFO_WRITE(endpoint_peer,                            \
>                              my_smp_rank, peer_smp_rank, hdr, rc)      \
>do {                                                                    \
>    ompi_fifo_t* fifo;                                                  \
>    fifo = &(mca_btl_sm_component.fifo[peer_smp_rank][my_smp_rank]);    \
>                                                                        \
>    /* thread lock */                                                   \
>    if(opal_using_threads())                                            \
>        opal_atomic_lock(fifo->head_lock);                              \
>    /* post fragment */                                                 \
>    while(ompi_fifo_write_to_head(hdr, fifo,                            \
>          mca_btl_sm_component.sm_mpool) != OMPI_SUCCESS)               \
>        opal_progress();                                                \
>    MCA_BTL_SM_SIGNAL_PEER(endpoint_peer);                              \
>    rc = OMPI_SUCCESS;                                                  \
>    if(opal_using_threads())                                            \
>        opal_atomic_unlock(fifo->head_lock);                            \
>} while(0)
>
>Rolf, are you using the very latest 1.2 branch?
>
>Ollie
>
>
>
Thanks for all the input. It turns out I was originally *not* using
the latest 1.2 branch. So, we redid the tests with the latest 1.2.
And, I am happy to report that we no longer get the "SM failed to
send message due to shortage of shared memory" error. However,
now the program hangs. So, it looks like we traded one problem for
another.
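
For reference, the kind of run involved here boils down to an invocation
like the one below (the executable name is just a placeholder). The -mca
flag is the workaround that raised the shared memory limit in the earlier
tests; without it, the v1.2 default maximum of 512 megabytes applies.

  mpirun -np 128 -mca mpool_sm_max_size 2147483647 ./my_app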

In the short term, I will just change the maximum memory in our
distribution using the openmpi-mca-params.conf file. In the long term,
we will try to track down more precisely what is going on, but it is not
clear to me that I can get my hands on the code. Perhaps I should also
try the trunk.
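
For the record, the openmpi-mca-params.conf change amounts to a single
line like the following (the file typically sits in the installation's
etc directory, and the value is the 2G-1 limit from our tests):

  # raise the maximum size of the sm mpool backing file, in bytes
  mpool_sm_max_size = 2147483647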

Rolf
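
P.S. The reservation idea Ollie describes can be pictured with a toy
sketch like the one below. This is illustrative C only, not the actual
sm BTL code: if every sender draws fragments from one shared pool on
demand, a fast peer can drain it and starve the others, but reserving
at least one fragment per peer up front guarantees everyone makes
progress.

  /* Toy model of per-peer reservation; not Open MPI code. */
  #include <stdio.h>

  #define NPROCS  4
  #define POOL_SZ 6                /* extra fragments beyond the reserved ones */

  static int reserved[NPROCS];     /* one guaranteed fragment per peer */
  static int shared_pool = POOL_SZ;

  /* Try to get a fragment for peer 'rank'; returns 1 on success. */
  static int get_frag(int rank)
  {
      if (reserved[rank]) {        /* guaranteed slot: cannot be starved */
          reserved[rank] = 0;
          return 1;
      }
      if (shared_pool > 0) {       /* optional extra capacity, on demand */
          shared_pool--;
          return 1;
      }
      return 0;                    /* would have to wait and retry */
  }

  int main(void)
  {
      int r;
      for (r = 0; r < NPROCS; r++)
          reserved[r] = 1;         /* up-front reservation per peer */

      /* Peer 0 is "aggressive" and grabs everything it can... */
      while (get_frag(0))
          ;
      /* ...yet every other peer still gets at least one fragment. */
      for (r = 1; r < NPROCS; r++)
          printf("peer %d got a fragment: %s\n", r, get_frag(r) ? "yes" : "no");
      return 0;
  }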