Maybe a clarification of the SM BTL implementation is needed. Does the
SM BTL not set a limit based on np, using the max allowable as a
ceiling? If not, and all jobs are allowed to use up to the maximum
allowable, I can see the reason for not wanting to raise it.
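What I have in mind is a sketch along these lines (not the actual sm
mpool code; per_proc_size, min_size, and max_size are made-up names
here), where the configured maximum only ever acts as a ceiling:

    /* Sketch only: scale the shared segment with the number of local
     * processes and clamp it to the configured maximum. */
    #include <stddef.h>

    static size_t sm_segment_size(int np, size_t per_proc_size,
                                  size_t min_size, size_t max_size)
    {
        size_t size = (size_t) np * per_proc_size;

        if (size < min_size) {
            size = min_size;   /* small jobs still get a usable segment */
        }
        if (size > max_size) {
            size = max_size;   /* the maximum is only an upper bound */
        }
        return size;
    }

With something like that in place, a 2-proc job would never come close
to the ceiling, and only the very large runs would actually map the
full max_size.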
That being said, it seems to me that the memory usage of the SM BTL is
a lot larger than it should be. Wasn't there some work done around June
that looked at why the SM BTL was allocating so much memory? Did
anything come out of that?
Markus Daene wrote:
>I think it is not a good idea to increase the default value to 2G. You
>have to keep in mind that there are not so many people who have a
>machine with 128 or more cores on a single node. Most people will have
>nodes with 2, 4, or maybe 8 cores, and for them it is not necessary to
>set this parameter to such a high value. It may end up allocating all
>of this memory per node, and if you have only 4 or 8G per node that is
>out of proportion. For my 8-core nodes I have even decreased
>mpool_sm_max_size to 32M and I had no problems with that. As far as I
>know (if not otherwise specified at runtime) this parameter is global,
>so even if you run on your machine with 2 procs it might allocate the
>2G for the MPI sm module.
>I would recommend, as Richard suggests, setting the parameter for your
>own runs and not changing the default value.
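>For example, on my 8-core nodes something like
>
>  mpirun -np 8 -mca mpool_sm_max_size 33554432 ./my_app
>
>(33554432 bytes = 32M; my_app is just a placeholder) limits the change
>to that run, or the setting can go into a per-user mca-params.conf
>file, without touching the built-in default.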
>Rolf vandeVaart wrote:
>>We are running into a problem when running on one of our larger SMPs
>>using the latest Open MPI v1.2 branch. We are trying to run a job
>>with np=128 within a single node. We are seeing the following error:
>>"SM failed to send message due to shortage of shared memory."
>>We then increased the allowable maximum size of the shared segment to
>>2 gigabytes minus 1 byte, which is the maximum allowed for a 32-bit
>>application. We used the MCA parameter to increase it as shown here:
>>-mca mpool_sm_max_size 2147483647
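>>(That is, an invocation along the lines of
>>  mpirun -np 128 -mca mpool_sm_max_size 2147483647 ./a.out
>>where a.out stands in for the actual application.)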
>>This allowed the program to run to completion. Therefore, we would
>>like to increase the default maximum from 512 Mbytes to 2 Gbytes minus
>>1 byte. Does anyone have an objection to this change? Soon we are
>>going to have larger CPU counts and would like to increase the odds
>>that things work "out of the box" on these large SMPs.
>>On a side note, I did a quick comparison of the shared memory needs of
>>the old Sun ClusterTools to Open MPI and came up with this table.
>> np   Sun ClusterTools 6   Open MPI current   Open MPI suggested
>>  2          20M                 128M                 128M
>>  4          20M                 128M                 128M
>>  8          22M                 256M                 256M
>> 16          27M                 512M                 512M
>> 32          48M                 512M                   1G
>> 64         133M                 512M                 2G-1
>>128         476M                 512M                 2G-1