
Open MPI Development Mailing List Archives


Subject: [OMPI devel] Reviewing MPI_Dims_create
From: Christoph Niethammer (niethammer_at_[hidden])
Date: 2014-02-10 13:30:17


Hello,

I noticed some effort to improve the scalability of
MPI_Dims_create(int nnodes, int ndims, int dims[]).
Unfortunately, the first attempt (r30539 and r30540) had some issues and was reverted.

So I decided to give it a short review based on r30606
https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/c/dims_create.c?rev=30606

1.) freeprocs is initialized to nnodes, and all subsequent divisions of freeprocs use positive integers as divisors.
So IMHO it would make more sense to check that nnodes > 0 in the MPI_PARAM_CHECK section at the beginning, instead of the following check (lines 99-102 of dims_create.c in r30606; see patch 0001):

    if (freeprocs < 1) {
        return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
                                      FUNC_NAME);
    }
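A minimal standalone sketch of the reasoning in point 1. All names here (check_dims_args, count_free_procs, DIMS_OK/DIMS_ERR) are hypothetical and are not Open MPI internals. In this sketch, once nnodes < 1 is rejected up front, freeprocs is only ever divided by a positive fixed dims[i] that divides it exactly, so it stays >= 1 and a later "freeprocs < 1" branch could never fire.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes standing in for MPI_SUCCESS / MPI_ERR_DIMS. */
enum { DIMS_OK = 0, DIMS_ERR = 1 };

/* Up-front argument check in the spirit of point 1: reject nnodes < 1
 * (and a negative ndims) before any arithmetic is done with them. */
static int check_dims_args(int nnodes, int ndims, const int dims[])
{
    if (nnodes < 1 || ndims < 0)
        return DIMS_ERR;
    if (ndims > 0 && dims == NULL)
        return DIMS_ERR;
    return DIMS_OK;
}

/* Number of processes left for the dimensions given as 0, assuming
 * check_dims_args() already succeeded. Each divisor is a positive dims[i]
 * that divides fp exactly, so fp >= 1 holds after every division. */
static int count_free_procs(int nnodes, int ndims, const int dims[],
                            int *freeprocs)
{
    assert(nnodes >= 1);              /* guaranteed by the up-front check */
    int fp = nnodes;
    for (int i = 0; i < ndims; ++i) {
        if (dims[i] < 0)
            return DIMS_ERR;          /* negative entries are erroneous */
        if (dims[i] == 0)
            continue;                 /* free dimension, filled in later */
        if (fp % dims[i] != 0)
            return DIMS_ERR;          /* fixed dims do not divide nnodes */
        fp /= dims[i];
    }
    *freeprocs = fp;
    return DIMS_OK;
}
```

For example, with nnodes = 12 and dims = {3, 0}, count_free_procs leaves 4 processes for the free dimension, while nnodes = 0 is rejected by check_dims_args before any division happens.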

2.) I rewrote the algorithm in getprimes(int num, int *nprimes, int **pprimes) to stop at sqrt(n).
This makes more sense mathematically: a number n has at most one prime factor larger than \sqrt{n}, so once all primes up to \sqrt{n} have been divided out, any remaining cofactor greater than 1 is itself prime. It should still produce the right result. ;)
(see patch 0002)
Here is the improvement:

$ module load mpi/openmpi/trunk-gnu.4.7.3
$ ./mpi-dims-old 1000000
time used for MPI_Dims_create(1000000, 3, {}): 8.104007
$ module swap mpi/openmpi/trunk-gnu.4.7.3 mpi/openmpi/trunk-gnu.4.7.3-testing
$ ./mpi-dims-new 1000000
time used for MPI_Dims_create(1000000, 3, {}): 0.060400
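To illustrate why stopping at the square root is enough, here is a small sketch that is not the code in patch 0002. It swaps in plain trial division by every integer d (rather than a precomputed prime list from getprimes), and prime_factors and MAX_FACTORS are hypothetical names. The loop only runs while d * d is at most the not-yet-factored remainder, and whatever is left above 1 is the single large prime factor.

```c
#include <assert.h>

/* A positive int has at most 30 prime factors counted with multiplicity
 * (2^30 is the largest power of two below INT_MAX), so 32 slots suffice. */
#define MAX_FACTORS 32

/* Hypothetical helper: write the prime factors of n (n >= 1) into
 * factors[] in ascending order, with multiplicity, and return how many
 * there are. Trial division stops once d * d exceeds the remaining n. */
static int prime_factors(int n, int factors[MAX_FACTORS])
{
    assert(n >= 1);
    int count = 0;
    for (int d = 2; (long long)d * d <= n; ++d) {
        /* d can only divide n here if it is prime: its smaller
         * factors have already been divided out of n. */
        while (n % d == 0) {
            factors[count++] = d;
            n /= d;
        }
    }
    /* The remainder has no divisor <= its square root, so if it is
     * greater than 1 it is the one prime factor above the cutoff. */
    if (n > 1)
        factors[count++] = n;
    return count;
}
```

For n = 14 this yields {2, 7}: the factor 7 exceeds \sqrt{14} (about 3.74) but is still found as the leftover cofactor.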

3.) Memory allocation for the list of prime numbers can be reduced by up to a factor of ~6 for 1 million nodes, using the bound from Dusart 1999 [1]:

    \pi(x) < \frac{x}{\ln x}\left(1 + \frac{1.2762}{\ln x}\right) \quad \text{for } x > 1

Unfortunately, this saves only 1.6 MB per process for 1 million nodes, as reported by tcmalloc/pprof on a test program, but it may add up on fatter nodes. :P

$ pprof --base=$PWD/primes-old.0001.heap a.out primes-new.0002.heap
(pprof) top
Total: -1.6 MB
     0.3 -18.8% -18.8% 0.3 -18.8% getprimes2
     0.0 -0.0% -18.8% -1.6 100.0% __libc_start_main
     0.0 -0.0% -18.8% -1.6 100.0% main
    -1.9 118.8% 100.0% -1.9 118.8% getprimes

The patch for this is attached as 0003.
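As a worked check of the bound in point 3, here it is evaluated at x = 10^6. This assumes the prime list is sized directly from the bound at x = 10^6 and that each entry is a 4-byte int; both are assumptions for illustration, not details taken from the patch.

```latex
\ln(10^6) = 6 \ln 10 \approx 13.8155

\pi(10^6) < \frac{10^6}{13.8155}\left(1 + \frac{1.2762}{13.8155}\right)
          \approx 72382.41 \times 1.092374 \approx 79068.7

79069 \times 4\ \text{bytes} = 316276\ \text{bytes} \approx 0.30\ \text{MB}
```

That is in line with the 0.3 MB pprof reports for getprimes2 and roughly a sixth of the 1.9 MB attributed to getprimes, matching the ~6x estimate. For comparison, the true count is \pi(10^6) = 78498, so the bound overshoots by less than 1%.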

If there are no objections, I would like to commit this to trunk for further testing by the end of this week (+cmr for 1.7.5?).

Best regards
Christoph

[1] http://www.ams.org/journals/mcom/1999-68-225/S0025-5718-99-01037-6/home.html

--
Christoph Niethammer
High Performance Computing Center Stuttgart (HLRS)
Nobelstrasse 19
70569 Stuttgart
Tel: ++49(0)711-685-87203
email: niethammer_at_[hidden]
http://www.hlrs.de/people/niethammer