Open MPI Development Mailing List Archives

 |   Home   |   Support   |   FAQ   |   all Development mailing list
 Subject: Re: [OMPI devel] Reviewing MPI_Dims_create From: Andreas Schäfer (gentryx_at_[hidden]) Date: 2014-02-11 06:16:53 Hi, ah, that's clever indeed! Best -Andreas On 12:02 Tue 11 Feb , Christoph Niethammer wrote: > Hello, > > After rethinking Jeff's comments about caching prime numbers I came to the conclusion that we can omit the prime numbers at all and go directly for the factorization. :D > We then only need at most log_2(INT_MAX) * sizeof(int) = 32 * 4 byte = 128 byte of memory for the factors. > > Computational costs reduce as well as the factorization itself is done by a loop with at most \sqrt(num) / 2 iterations - which is the same as in the original prime number detection loop. > I think this is the cleanest way which reduces also source code size. ;) > > Find attache patch against the trunk. > > Best regards > Christoph > > -- > > Christoph Niethammer > High Performance Computing Center Stuttgart (HLRS) > Nobelstrasse 19 > 70569 Stuttgart > > Tel: ++49(0)711-685-87203 > email: niethammer_at_[hidden] > > > > ----- Ursprüngliche Mail ----- > Von: "Andreas Schäfer" > An: "Open MPI Developers" > Gesendet: Dienstag, 11. Februar 2014 06:24:56 > Betreff: Re: [OMPI devel] Reviewing MPI_Dims_create > > OK, so we were thinking the same thing :-) The optimization you > mention below is the same I've used in my updated patch. > > > On 02:29 Tue 11 Feb , Christoph Niethammer wrote: > > As mentioned in my former mail I did not touch the factorization > > code. > > But to figure out if a number n is *not* a prime number it is sufficient to check up to \sqrt(n). > > Proof: > > let n = p*q with q > \sqrt{n} > > --> p < \sqrt(n) > > So we have already found factor p before reaching \sqrt(n) and by this n is no prime any more and no need for further checks. ;) > > > > > > The mentioned factorization may indeed include one factor which is larger than \sqrt(n). :) > > > > Proof that at least one prime factor can be larger than \sqrt(n) example: > > 6 = 2*3 > > \sqrt(6) = 2.4494897427832... < 3 Q.E.D. > > > > > > Proof that no more than one factor can be larger than \sqrt(n): > > let n = \prod_{i=0}^K p_i with p_i \in N and K > 2 > > and assume w.l.o.g. p_0 > \sqrt(n) and p_1 > \sqrt(n) > > --> 1 > \prod_{i=2}^K p_i > > which is a contradiction as all p_i \in N. Q.E.D. > > > > > > So your idea is still applicable with not much effort and we only need prime factors up to sqrt(n) in the factorizer code for an additional optimization. :) > > > > First search all K' factors p_i < \sqrt(n). If then n \ne \prod_{i=0}^{K'} p_i we should be sure that p_{K'+1} = n / \prod_{i=0}^{K'} p_i is a prime. No complication with counts IMHO. I leave this without patch as it is already 2:30 in the morning. :P > > > > Regards > > Christoph > > > > -- > > > > Christoph Niethammer > > High Performance Computing Center Stuttgart (HLRS) > > Nobelstrasse 19 > > 70569 Stuttgart > > > > Tel: ++49(0)711-685-87203 > > email: niethammer_at_[hidden] > > > > ----- Ursprüngliche Mail ----- > > Von: "Andreas Schäfer" > > An: "Open MPI Developers" > > Gesendet: Montag, 10. Februar 2014 23:24:24 > > Betreff: Re: [OMPI devel] Reviewing MPI_Dims_create > > > > Christoph- > > > > your patch has the same problem as my original patch: indeed there may > > be a prime factor p of n with p > sqrt(n). What's important is that > > there may only be at most one. I've submitted an updated patch (see my > > previous mail) which catches this special case. > > > > Best > > -Andreas > > > > > > On 19:30 Mon 10 Feb , Christoph Niethammer wrote: > > > Hello, > > > > > > I noticed some effort in improving the scalability of > > > MPI_Dims_create(int nnodes, int ndims, int dims[]) > > > Unfortunately there were some issues with the first attempt (r30539 and r30540) which were reverted. > > > > > > So I decided to give it a short review based on r30606 > > > > > > > > > 1.) freeprocs is initialized to be nnodes and the subsequent divisions of freeprocs have all positive integers as divisor. > > > So IMHO it would make more sense to check if nnodes > 0 in the MPI_PARAM_CHECK section at the begin instead of the following (see patch 0001): > > > > > > 99 if (freeprocs < 1) { > > > 100 return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS, > > > 101 FUNC_NAME); > > > 102 } > > > > > > > > > 2.) I rewrote the algorithm stopping at sqrt(n) in getprimes(int num, int *nprimes, int **pprimes) > > > which makes mathematically more sens (as the largest prime factor of any number n cannot exceed \sqrt{n}) - and should produce the right result. ;) > > > (see patch 0002) > > > Here the improvements: > > > > > > module load mpi/openmpi/trunk-gnu.4.7.3 > > > $./mpi-dims-old 1000000 > > > time used for MPI_Dims_create(1000000, 3, {}): 8.104007 > > > module swap mpi/openmpi/trunk-gnu.4.7.3 mpi/openmpi/trunk-gnu.4.7.3-testing > > >$ ./mpi-dims-new 1000000 > > > time used for MPI_Dims_create(1000000, 3, {}): 0.060400 > > > > > > > > > 3.) Memory allocation for the list of prime numbers may be reduced up to a factor of ~6 for 1mio nodes using the result from Dusart 1999 [1]: > > > \pi(x) < x/ln(x)(1+1.2762/ln(x)) for x > 1 > > > Unfortunately this saves us only 1.6 MB per process for 1mio nodes as reported by tcmalloc/pprof on a test program - but it may sum up with fatter nodes. :P > > > > > > $pprof --base=$PWD/primes-old.0001.heap a.out primes-new.0002.heap > > > (pprof) top > > > Total: -1.6 MB > > > 0.3 -18.8% -18.8% 0.3 -18.8% getprimes2 > > > 0.0 -0.0% -18.8% -1.6 100.0% __libc_start_main > > > 0.0 -0.0% -18.8% -1.6 100.0% main > > > -1.9 118.8% 100.0% -1.9 118.8% getprimes > > > > > > Find attached patch for it in 0003. > > > > > > > > > If there are no issues I would like to commit this to trunk for further testing (+cmr for 1.7.5?) end of this week. > > > > > > Best regards > > > Christoph > > > > > > > > > > > > > > > -- > > > > > > Christoph Niethammer > > > High Performance Computing Center Stuttgart (HLRS) > > > Nobelstrasse 19 > > > 70569 Stuttgart > > > > > > Tel: ++49(0)711-685-87203 > > > email: niethammer_at_[hidden] > > > > > From e3292b90cac42fad80ed27a555419002ed61ab66 Mon Sep 17 00:00:00 2001 > > > From: Christoph Niethammer > > > Date: Mon, 10 Feb 2014 16:44:03 +0100 > > > Subject: [PATCH 1/3] Move parameter check into appropriate code section at the > > > begin. > > > > > > --- > > > ompi/mpi/c/dims_create.c | 11 ++++++----- > > > 1 file changed, 6 insertions(+), 5 deletions(-) > > > > > > diff --git a/ompi/mpi/c/dims_create.c b/ompi/mpi/c/dims_create.c > > > index d2c3858..3d0792f 100644 > > > --- a/ompi/mpi/c/dims_create.c > > > +++ b/ompi/mpi/c/dims_create.c > > > @@ -71,6 +71,11 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[]) > > > return OMPI_ERRHANDLER_INVOKE (MPI_COMM_WORLD, > > > MPI_ERR_DIMS, FUNC_NAME); > > > } > > > + > > > + if (1 > nnodes) { > > > + return OMPI_ERRHANDLER_INVOKE (MPI_COMM_WORLD, > > > + MPI_ERR_DIMS, FUNC_NAME); > > > + } > > > } > > > > > > /* Get # of free-to-be-assigned processes and # of free dimensions */ > > > @@ -95,11 +100,7 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[]) > > > FUNC_NAME); > > > } > > > > > > - if (freeprocs < 1) { > > > - return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS, > > > - FUNC_NAME); > > > - } > > > - else if (freeprocs == 1) { > > > + if (freeprocs == 1) { > > > for (i = 0; i < ndims; ++i, ++dims) { > > > if (*dims == 0) { > > > *dims = 1; > > > -- > > > 1.8.3.2 > > > > > > > > From bc862c47ef8d581a8f6735c51983d6c9eeb95dfd Mon Sep 17 00:00:00 2001 > > > From: Christoph Niethammer > > > Date: Mon, 10 Feb 2014 18:50:51 +0100 > > > Subject: [PATCH 2/3] Speeding up detection of prime numbers using the fact > > > that the largest prime factor of any number n cannot exceed \sqrt{n}. > > > > > > --- > > > ompi/mpi/c/dims_create.c | 9 ++++++--- > > > 1 file changed, 6 insertions(+), 3 deletions(-) > > > > > > diff --git a/ompi/mpi/c/dims_create.c b/ompi/mpi/c/dims_create.c > > > index 3d0792f..1c1c381 100644 > > > --- a/ompi/mpi/c/dims_create.c > > > +++ b/ompi/mpi/c/dims_create.c > > > @@ -5,7 +5,7 @@ > > > * Copyright (c) 2004-2005 The University of Tennessee and The University > > > * of Tennessee Research Foundation. All rights > > > * reserved. > > > - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, > > > + * Copyright (c) 2004-2014 High Performance Computing Center Stuttgart, > > > * University of Stuttgart. All rights reserved. > > > * Copyright (c) 2004-2005 The Regents of the University of California. > > > * All rights reserved. > > > @@ -20,6 +20,8 @@ > > > > > > #include "ompi_config.h" > > > > > > +#include > > > + > > > #include "ompi/mpi/c/bindings.h" > > > #include "ompi/runtime/params.h" > > > #include "ompi/communicator/communicator.h" > > > @@ -293,13 +295,14 @@ getprimes(int num, int *pnprime, int **pprimes) { > > > primes[i++] = 2; > > > > > > for (n = 3; n <= num; n += 2) { > > > - for (j = 1; j < i; ++j) { > > > + double nsqrt = sqrt((double) n); > > > + for(j = 0; (double) primes[j] <= nsqrt; j++) { > > > if ((n % primes[j]) == 0) { > > > break; > > > } > > > } > > > > > > - if (j == i) { > > > + if (primes[j] > nsqrt) { > > > if (i >= size) { > > > return MPI_ERR_DIMS; > > > } > > > -- > > > 1.8.3.2 > > > > > > > > From a012206cfbf7b8b22cef4cc5b811384e5e3ac32c Mon Sep 17 00:00:00 2001 > > > From: Christoph Niethammer > > > Date: Mon, 10 Feb 2014 19:02:03 +0100 > > > Subject: [PATCH 3/3] Reduce memory usage by a better approximation for the > > > upper limit of the number of primes. > > > > > > --- > > > ompi/mpi/c/dims_create.c | 12 ++++++++++-- > > > 1 file changed, 10 insertions(+), 2 deletions(-) > > > > > > diff --git a/ompi/mpi/c/dims_create.c b/ompi/mpi/c/dims_create.c > > > index 1c1c381..8dd3144 100644 > > > --- a/ompi/mpi/c/dims_create.c > > > +++ b/ompi/mpi/c/dims_create.c > > > @@ -281,9 +281,17 @@ getprimes(int num, int *pnprime, int **pprimes) { > > > int n; > > > int size; > > > int *primes; > > > + double lognum; > > > > > > - /* Allocate the array of primes */ > > > - size = (num / 2) + 1; > > > + /* Allocate the array of primes > > > + * see Pierre Dusart, Math. Comp. 68 (1999), no. 225, 411???415.*/ > > > + lognum = log(num); > > > + if(num > 1) { > > > + size = ceil(num/lognum * (1+1.2762/lognum)); > > > + } > > > + else { > > > + size = 1; > > > + } > > > primes = (int *) malloc((unsigned) size * sizeof(int)); > > > if (NULL == primes) { > > > return MPI_ERR_NO_MEM; > > > -- > > > 1.8.3.2 > > > > > > > > _______________________________________________ > > > devel mailing list > > > devel_at_[hidden] > > > > > > -- > > ========================================================== > > Andreas Schäfer > > HPC and Grid Computing > > Chair of Computer Science 3 > > Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany > > +49 9131 85-27910 > > PGP/GPG key via keyserver > > ========================================================== > > > > (\___/) > > (+'.'+) > > (")_(") > > This is Bunny. Copy and paste Bunny into your > > signature to help him gain world domination! > > > > _______________________________________________ > > devel mailing list > > devel_at_[hidden] > > _______________________________________________ > > devel mailing list > > devel_at_[hidden] > > -- > ========================================================== > Andreas Schäfer > HPC and Grid Computing > Chair of Computer Science 3 > Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany > +49 9131 85-27910 > PGP/GPG key via keyserver > ========================================================== > > (\___/) > (+'.'+) > (")_(") > This is Bunny. Copy and paste Bunny into your > signature to help him gain world domination! > > _______________________________________________ > devel mailing list > devel_at_[hidden] > From 19af07aac40d9e2b9c827ab043643b1e6e28fe0c Mon Sep 17 00:00:00 2001 > From: Christoph Niethammer > Date: Tue, 11 Feb 2014 11:58:30 +0100 > Subject: [PATCH] Omit usage of pre calculated prime numbers and factorize > directly. > > --- > ompi/mpi/c/dims_create.c | 159 +++++++++++++++-------------------------------- > 1 file changed, 50 insertions(+), 109 deletions(-) > > diff --git a/ompi/mpi/c/dims_create.c b/ompi/mpi/c/dims_create.c > index 3d0792f..02a3413 100644 > --- a/ompi/mpi/c/dims_create.c > +++ b/ompi/mpi/c/dims_create.c > @@ -5,7 +5,7 @@ > * Copyright (c) 2004-2005 The University of Tennessee and The University > * of Tennessee Research Foundation. All rights > * reserved. > - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, > + * Copyright (c) 2004-2014 High Performance Computing Center Stuttgart, > * University of Stuttgart. All rights reserved. > * Copyright (c) 2004-2005 The Regents of the University of California. > * All rights reserved. > @@ -20,6 +20,8 @@ > > #include "ompi_config.h" > > +#include > + > #include "ompi/mpi/c/bindings.h" > #include "ompi/runtime/params.h" > #include "ompi/communicator/communicator.h" > @@ -36,9 +38,8 @@ > static const char FUNC_NAME[] = "MPI_Dims_create"; > > /* static functions */ > -static int assignnodes(int ndim, int nfactor, int *pfacts, int *counts, int **pdims); > -static int getfactors(int num, int nprime, int *primes, int **pcounts); > -static int getprimes(int num, int *pnprime, int **pprimes); > +static int assignnodes(int ndim, int nfactor, int *pfacts,int **pdims); > +static int getfactors(int num, int *nfators, int **factors); > > > /* > @@ -50,8 +51,7 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[]) > int i; > int freeprocs; > int freedims; > - int nprimes; > - int *primes; > + int nfactors; > int *factors; > int *procs; > int *p; > @@ -109,20 +109,14 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[]) > return MPI_SUCCESS; > } > > - /* Compute the relevant prime numbers for factoring */ > - if (MPI_SUCCESS != (err = getprimes(freeprocs, &nprimes, &primes))) { > - return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, err, > - FUNC_NAME); > - } > - > /* Factor the number of free processes */ > - if (MPI_SUCCESS != (err = getfactors(freeprocs, nprimes, primes, &factors))) { > + if (MPI_SUCCESS != (err = getfactors(freeprocs, &nfactors, &factors))) { > return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, err, > FUNC_NAME); > } > > /* Assign free processes to free dimensions */ > - if (MPI_SUCCESS != (err = assignnodes(freedims, nprimes, primes, factors, &procs))) { > + if (MPI_SUCCESS != (err = assignnodes(freedims, nfactors, factors, &procs))) { > return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, err, > FUNC_NAME); > } > @@ -135,7 +129,6 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[]) > } > } > > - free((char *) primes); > free((char *) factors); > free((char *) procs); > > @@ -154,12 +147,11 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[]) > * Accepts: - # of dimensions > * - # of prime factors > * - array of prime factors > - * - array of factor counts > * - ptr to array of dimensions (returned value) > * Returns: - 0 or ERROR > */ > static int > -assignnodes(int ndim, int nfactor, int *pfacts, int *counts, int **pdims) > +assignnodes(int ndim, int nfactor, int *pfacts, int **pdims) > { > int *bins; > int i, j; > @@ -185,17 +177,15 @@ assignnodes(int ndim, int nfactor, int *pfacts, int *counts, int **pdims) > > /* Loop assigning factors from the highest to the lowest */ > for (j = nfactor - 1; j >= 0; --j) { > - f = pfacts[j]; > - for (n = counts[j]; n > 0; --n) { > - /* Assign a factor to the smallest bin */ > - pmin = bins; > - for (i = 1, p = pmin + 1; i < ndim; ++i, ++p) { > - if (*p < *pmin) { > - pmin = p; > - } > + f = pfacts[j]; > + /* Assign a factor to the smallest bin */ > + pmin = bins; > + for (i = 1, p = pmin + 1; i < ndim; ++i, ++p) { > + if (*p < *pmin) { > + pmin = p; > } > - *pmin *= f; > } > + *pmin *= f; > } > > /* Sort dimensions in decreasing order (O(n^2) for now) */ > @@ -217,96 +207,47 @@ assignnodes(int ndim, int nfactor, int *pfacts, int *counts, int **pdims) > * > * Function: - factorize a number > * Accepts: - number > - * - # of primes > - * - array of primes > - * - ptr to array of counts (returned value) > - * Returns: - 0 or ERROR > + * - # prime factors > + * - array of prime factors > + * Returns: - MPI_SUCCESS or ERROR > */ > static int > -getfactors(int num, int nprime, int *primes, int **pcounts) > -{ > - int *counts; > +getfactors(int num, int *nfactors, int **factors) { > + int size; > + int n; > + int d; > int i; > - int *p; > - int *c; > - > - if (0 >= nprime) { > - return MPI_ERR_INTERN; > - } > + double lognum; > > - /* Allocate the factor counts array */ > - counts = (int *) malloc((unsigned) nprime * sizeof(int)); > - if (NULL == counts) { > - return MPI_ERR_NO_MEM; > + if(num < 2) { > + (*nfactors) = 0; > + (*factors) = NULL; > + return MPI_SUCCESS; > } > - > - *pcounts = counts; > - > - /* Loop over the prime numbers */ > - i = nprime - 1; > - p = primes + i; > - c = counts + i; > - > - for (; i >= 0; --i, --p, --c) { > - *c = 0; > - while ((num % *p) == 0) { > - ++(*c); > - num /= *p; > + /* Allocate the array of prime factors which cannot exceed log_2(num) entries */ > + n = num; > + size = ceil(log(num) / log(2)); > + *factors = (int *) malloc((unsigned) size * sizeof(int)); > + > + i = 0; > + /* determine all occurences of factor 2 */ > + while((num % 2) == 0) { > + num /= 2; > + (*factors)[i++] = 2; > + } > + /* determine all occurences of uneven prime numbers up to sqrt(num) */ > + d = 3; > + int sqrtnum = ceil(sqrt(num)); > + for(d = 3; (num > 1) && (d < sqrtnum); d += 2) { > + while((num % d) == 0) { > + num /= d; > + (*factors)[i++] = d; > } > } > - > - if (1 != num) { > - return MPI_ERR_INTERN; > + /* as we looped only up to sqrt(num) one factor > sqrt(num) may be left over */ > + if(num != 1) { > + (*factors)[i++] = num; > } > - > + (*nfactors) = i; > return MPI_SUCCESS; > } > - > -/* > - * getprimes > - * > - * Function: - get primes smaller than number > - * - array of primes dynamically allocated > - * Accepts: - number > - * - ptr # of primes (returned value) > - * - ptr array of primes (returned values) > - * Returns: - 0 or ERROR > - */ > -static int > -getprimes(int num, int *pnprime, int **pprimes) { > - > - int i, j; > - int n; > - int size; > - int *primes; > - > - /* Allocate the array of primes */ > - size = (num / 2) + 1; > - primes = (int *) malloc((unsigned) size * sizeof(int)); > - if (NULL == primes) { > - return MPI_ERR_NO_MEM; > - } > - *pprimes = primes; > - > - /* Find the prime numbers */ > - i = 0; > - primes[i++] = 2; > - > - for (n = 3; n <= num; n += 2) { > - for (j = 1; j < i; ++j) { > - if ((n % primes[j]) == 0) { > - break; > - } > - } > - > - if (j == i) { > - if (i >= size) { > - return MPI_ERR_DIMS; > - } > - primes[i++] = n; > - } > - } > - > - *pnprime = i; > - return MPI_SUCCESS; > -} > -- > 1.8.3.2 > > _______________________________________________ > devel mailing list > devel_at_[hidden] -- ========================================================== Andreas Schäfer HPC and Grid Computing Chair of Computer Science 3 Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany +49 9131 85-27910 PGP/GPG key via keyserver http://www.libgeodecomp.org ========================================================== (\___/) (+'.'+) (")_(") This is Bunny. Copy and paste Bunny into your signature to help him gain world domination!  application/pgp-signature attachment: Digital signature