I think it is not a good idea to increase the default value to 2G. Keep
in mind that not many people have a machine with 128 or more cores in a
single node. Most people will have nodes with 2, 4, or maybe 8 cores, so
it is not necessary to set this parameter to such a high value. It may
allocate all of this memory on each node, and if you have only 4 or 8G
per node, a large share of it goes to the shared-memory segment and the
node becomes unbalanced. On my 8-core nodes I have even decreased
mpool_sm_max_size to 32M and have had no problems with that. As far as I
know, this parameter is global unless it is overridden at run time. So
even if you run only 2 processes on your machine, it might allocate the
full 2G for the MPI shared-memory (sm) module.
I would recommend, as Richard suggests, setting the parameter for your
own systems rather than changing the default value.
Rolf vandeVaart wrote:
> We are running into a problem when running on one of our larger SMPs
> using the latest Open MPI v1.2 branch. We are trying to run a job
> with np=128 within a single node. We are seeing the following error:
> "SM failed to send message due to shortage of shared memory."
> We then increased the allowable maximum size of the shared segment to
> 2 GB - 1 byte (2147483647 bytes), which is the maximum allowed for a
> 32-bit application. We used the MCA parameter shown here to increase it:
> -mca mpool_sm_max_size 2147483647
> This allowed the program to run to completion. Therefore, we would
> like to increase the default maximum from 512 MB to 2 GB - 1.
> Does anyone have an objection to this change? Soon we are going to
> have larger CPU counts and would like to increase the odds that things
> work "out of the box" on these large SMPs.
> On a side note, I did a quick comparison of the shared-memory needs of
> the old Sun ClusterTools 6 and Open MPI and came up with this table.
>   np   Sun ClusterTools 6   Open MPI current   Open MPI suggested
>    2                  20M               128M                 128M
>    4                  20M               128M                 128M
>    8                  22M               256M                 256M
>   16                  27M               512M                 512M
>   32                  48M               512M                   1G
>   64                 133M               512M                 2G-1
>  128                 476M               512M                 2G-1
> devel mailing list