
Open MPI Development Mailing List Archives


From: Markus Daene (markus.daene_at_[hidden])
Date: 2007-08-28 03:45:03


Rolf,

I think it is not a good idea to increase the default value to 2G. Keep
in mind that not many people have a machine with 128 or more cores on a
single node. Most people will have nodes with 2, 4, or maybe 8 cores,
and for those it is not necessary to set this parameter to such a high
value. It may end up allocating all of this memory on every node, and
if you have only 4 or 8G per node that is out of proportion. For my
8-core nodes I have even decreased sm_max_size to 32M and had no
problems with that. As far as I know, this parameter is global unless
it is overridden at runtime, so even if you run on your machine with
only 2 procs it might allocate the 2G for the MPI sm module.
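
If a particular job really does need a larger segment, it can also be
raised just for that one run on the mpirun command line, for example
something along the lines of (with ./your_app standing in for the
actual binary):

  mpirun -np 128 -mca mpool_sm_max_size 2147483647 ./your_app

That way the large value never becomes the setting for every job.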
I would recommend, as Richard suggests, setting the parameter for your
machine in
etc/openmpi-mca-params.conf
rather than changing the default value.
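
For example, a single line like the following in that file (using the
2147483647 value from your test purely as an illustration; pick
whatever fits your machine) applies the setting installation-wide
without touching the compiled-in default:

  # <prefix>/etc/openmpi-mca-params.conf
  mpool_sm_max_size = 2147483647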

Markus

Rolf vandeVaart wrote:
> We are running into a problem when running on one of our larger SMPs
> using the latest Open MPI v1.2 branch. We are trying to run a job
> with np=128 within a single node. We are seeing the following error:
>
> "SM failed to send message due to shortage of shared memory."
>
> We then increased the allowable maximum size of the shared segment to
> 2 Gigabytes-1, which is the maximum allowed for a 32-bit application.
> We used the mca parameter to increase it, as shown here.
>
> -mca mpool_sm_max_size 2147483647
>
> This allowed the program to run to completion. Therefore, we would
> like to increase the default maximum from 512 Mbytes to 2 Gigabytes-1.
> Does anyone have an objection to this change? Soon we are going to
> have larger CPU counts and would like to increase the odds that things
> work "out of the box" on these large SMPs.
>
> On a side note, I did a quick comparison of the shared memory needs of
> the old Sun ClusterTools to Open MPI and came up with this table.
>
>                                     Open MPI
>   np   Sun ClusterTools 6    current   suggested
>  -------------------------------------------------
>    2          20M              128M      128M
>    4          20M              128M      128M
>    8          22M              256M      256M
>   16          27M              512M      512M
>   32          48M              512M      1G
>   64         133M              512M      2G-1
>  128         476M              512M      2G-1
>