There are 3 parameters that control how much memory is used by the SM BTL.
MCA mpool: parameter "mpool_sm_max_size" (current value: "536870912")
Maximum size of the sm mpool shared memory file
MCA mpool: parameter "mpool_sm_min_size" (current value: "134217728")
Minimum size of the sm mpool shared memory file
MCA mpool: parameter "mpool_sm_per_peer_size" (current value: "33554432")
Size (in bytes) to allocate per local peer in the sm mpool shared memory file, bounded by min_size and max_size
To paraphrase the above, the default ceiling is 512M, the default floor is
128M, and the scaling factor is 32M * procs_on_node. Therefore, changing the
ceiling would only affect cases where there are more than 16 processes on a
node (16 * 32M = 512M).
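That sizing rule can be sketched as follows. This is my reading of the parameter descriptions above (per-peer size scaled by the local process count, clamped between the floor and the ceiling); the actual sm mpool code may round or pad differently:

```python
# Sketch of the sm mpool shared memory file sizing implied by the three
# parameters above. Assumption: size = per_peer * procs_on_node, clamped
# between min_size and max_size. Not the actual Open MPI implementation.

MAX_SIZE = 536870912   # mpool_sm_max_size      (512M default)
MIN_SIZE = 134217728   # mpool_sm_min_size      (128M default)
PER_PEER = 33554432    # mpool_sm_per_peer_size (32M default)

def sm_file_size(procs_on_node, min_size=MIN_SIZE, max_size=MAX_SIZE,
                 per_peer=PER_PEER):
    """Shared memory file size in bytes for a given local process count."""
    return max(min_size, min(max_size, per_peer * procs_on_node))

for np in (2, 4, 8, 16, 32, 128):
    print(np, sm_file_size(np) // (1024 * 1024), "M")
```

With the defaults, the ceiling is reached at exactly 16 local processes (16 * 32M = 512M), which is why raising it only changes behavior on larger nodes.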
My suggestion was to increase the ceiling from 512M to 2G-1. And yes, we
could adjust it as Rich suggested, by setting the parameter in our
customized openmpi-mca-params.conf file. I just was not sure that was the
optimal solution.
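For reference, the site-wide override Rich suggested would look something like this in openmpi-mca-params.conf (the value shown is the 2G-1 ceiling mentioned above; the exact path depends on the install prefix):

```conf
# $prefix/etc/openmpi-mca-params.conf -- one "name = value" per line
mpool_sm_max_size = 2147483647
```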
Terry D. Dontje wrote:
>Maybe a clarification of the SM BTL implementation is needed. Does the
>SM BTL not set a limit based on np, using the max allowable as a
>ceiling? If not, and all jobs are allowed to use up to the max allowable,
>I can see the reason for not wanting to raise it.
>That being said, it seems to me that the memory usage of the SM BTL is a
>lot larger than it should be. Wasn't there some work done around June
>looking at why the SM BTL was allocating so much memory? Did anything come
>out of that?
>Markus Daene wrote:
>>I think it is not a good idea to increase the default value to 2G. You
>>have to keep in mind that not many people have a machine with 128 or
>>more cores on a single node. The average user will have nodes with 2, 4,
>>maybe 8 cores, and for those it is not necessary to set this parameter
>>to such a high value. Possibly it allocates all of this memory per node,
>>and if you have only 4 or 8G per node it will be imbalanced. For my
>>8-core nodes I have even decreased sm_max_size to 32M and I have had no
>>problems with that. As far as I know (if not otherwise specified at
>>runtime) this parameter is global. So even if you run on your machine
>>with 2 procs it might allocate the 2G for the MPI sm module.
>>I would recommend, as Richard suggests, setting the parameter for your
>>machine and not changing the default value.
>>Rolf vandeVaart wrote:
>>>We are running into a problem when running on one of our larger SMPs
>>>using the latest Open MPI v1.2 branch. We are trying to run a job
>>>with np=128 within a single node. We are seeing the following error:
>>>"SM failed to send message due to shortage of shared memory."
>>>We then increased the allowable maximum size of the shared segment to
>>>2G-1 bytes, which is the maximum allowed for a 32-bit application. We
>>>used the mca parameter to increase it as shown here:
>>>-mca mpool_sm_max_size 2147483647
>>>This allowed the program to run to completion. Therefore, we would
>>>like to increase the default maximum from 512M to 2G-1 bytes.
>>>Does anyone have an objection to this change? Soon we are going to
>>>have larger CPU counts and would like to increase the odds that things
>>>work "out of the box" on these large SMPs.
>>>On a side note, I did a quick comparison of the shared memory needs of
>>>the old Sun ClusterTools to Open MPI and came up with this table.
>>>                               Open MPI
>>> np   Sun ClusterTools 6   current   suggested
>>>  2          20M             128M       128M
>>>  4          20M             128M       128M
>>>  8          22M             256M       256M
>>> 16          27M             512M       512M
>>> 32          48M             512M         1G
>>> 64         133M             512M       2G-1
>>>128         476M             512M       2G-1
>>>devel mailing list