Open MPI Development Mailing List Archives

Subject: [OMPI devel] Reviewing MPI_Dims_create
From: Christoph Niethammer (niethammer_at_[hidden])
Date: 2014-02-10 13:30:17

Hello,

I noticed some effort to improve the scalability of

    MPI_Dims_create(int nnodes, int ndims, int dims[])

Unfortunately, there were some issues with the first attempt (r30539 and r30540), which was reverted.

So I decided to give it a short review based on r30606:
https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/c/dims_create.c?rev=30606

1.) freeprocs is initialized to nnodes, and all subsequent divisions of freeprocs use positive integers as divisors. So IMHO it would make more sense to check whether nnodes > 0 in the MPI_PARAM_CHECK section at the beginning instead of the following (see patch 0001):

    if (freeprocs < 1) {
        return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
                                      FUNC_NAME);
    }

2.) I rewrote the algorithm to stop at sqrt(n) in

    getprimes(int num, int *nprimes, int **pprimes)

which makes mathematically more sense (as every composite number n has a prime factor no larger than \sqrt{n}) - and should produce the right result. ;) (see patch 0002)

Here are the improvements:

    module load mpi/openmpi/trunk-gnu.4.7.3
    $ ./mpi-dims-old 1000000
    time used for MPI_Dims_create(1000000, 3, {}): 8.104007
    module swap mpi/openmpi/trunk-gnu.4.7.3 mpi/openmpi/trunk-gnu.4.7.3-testing
    $ ./mpi-dims-new 1000000
    time used for MPI_Dims_create(1000000, 3, {}): 0.060400

3.) Memory allocation for the list of prime numbers may be reduced by up to a factor of ~6 for 1 million nodes using the result from Dusart 1999 [1]:

    \pi(x) < (x / ln(x)) * (1 + 1.2762 / ln(x))   for x > 1

Unfortunately, this saves us only 1.6 MB per process for 1 million nodes, as reported by tcmalloc/pprof on a test program - but it may add up with fatter nodes. :P

    $ pprof --base=$PWD/primes-old.0001.heap a.out primes-new.0002.heap
    (pprof) top
    Total: -1.6 MB
         0.3 -18.8% -18.8%      0.3 -18.8% getprimes2
         0.0  -0.0% -18.8%     -1.6 100.0% __libc_start_main
         0.0  -0.0% -18.8%     -1.6 100.0% main
        -1.9 118.8% 100.0%     -1.9 118.8% getprimes

Find attached the patch for it in 0003.

If there are no issues, I would like to commit this to trunk for further testing (+cmr for 1.7.5?) at the end of this week.

Best regards
Christoph

--
Christoph Niethammer
High Performance Computing Center Stuttgart (HLRS)
Nobelstrasse 19
70569 Stuttgart
Tel: ++49(0)711-685-87203
email: niethammer_at_[hidden]
http://www.hlrs.de/people/niethammer

text/x-patch attachment: 0001-Move-parameter-check-into-appropriate-code-section-a.patch
text/x-patch attachment: 0002-Speeding-up-detection-of-prime-numbers-using-the-fac.patch
text/x-patch attachment: 0003-Reduce-memory-usage-by-a-better-approximation-for-th.patch