Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

Subject: Re: [OMPI devel] Reviewing MPI_Dims_create
From: Christoph Niethammer (niethammer_at_[hidden])
Date: 2014-02-10 19:22:48


If you mean the current version in the ompi-tests/ibm svn repository I can confirm that it passes the topolgy/dimscreate test without errors. :)

The difference in the patches is as follows: The patch from Andreas only generated a table of prime numbers of up to sqrt(freeprocs) while my patch still produces prime numbers up to freeprocs. And for factoring we really need all factors up to freeprocs. The standard sqrt optimization was just introduced in the wrong place. :)

You are right with #3: It's a better approximation for the upper bound and the proof is something to be read under the Christmas tree. ;)
I just have to rethink if the ceil() is necessary in the code as I am not sure about rounding issues in floating point calculations here... :P

Regarding your questions:
1.) I don't think we have to cache prime numbers as MPI_Dims create will not be used frequently for factorization. If anybody needs faster factorization he would use his own - even more optimized - code. If you find some free time beside Open MPI go out for some harder problems at But please don't get frustrated from the assembler solutions. ;)

2.) Interesting idea: Using the approximation from the cited paper we should only need around 400 MB to store all primes in the int32 range. Potential for applying compression techniques still present. ^^


Christoph Niethammer
High Performance Computing Center Stuttgart (HLRS)
Nobelstrasse 19
70569 Stuttgart
Tel: ++49(0)711-685-87203
email: niethammer_at_[hidden]
----- Ursprüngliche Mail -----
Von: "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]>
An: "Open MPI Developers" <devel_at_[hidden]>
Gesendet: Montag, 10. Februar 2014 20:12:08
Betreff: Re: [OMPI devel] Reviewing MPI_Dims_create
Nice!  Can you verify that it passes the ibm test?  I didn't look closely, and to be honest, I'm not sure why the previous improvement broke the IBM test because it hypothetically did what you mentioned (stopped at sqrt(freenodes)).
I think patch 1 is a no-brainer.  I'm not sure about #2 because I'm not sure how it's different than the previous one, nor did I have time to investigate why the previous one broke the IBM test.  #3 seems like a good idea, too; I did't check the paper, but I assume it's some kind of proof about the upper limit on the number of primes in a given range.
Two questions:
1. Should we cache generated prime numbers?  (if so, it'll have to be done in a thread-safe way)
2. Should we just generate prime numbers and hard-code them into a table that is compiled into the code?  We would only need primes up to the sqrt of 2billion (i.e., signed int), right?  I don't know how many that is -- if it's small enough, perhaps this is the easiest solution.
On Feb 10, 2014, at 1:30 PM, Christoph Niethammer <niethammer_at_[hidden]> wrote:
> Hello,
> I noticed some effort in improving the scalability of
> MPI_Dims_create(int nnodes, int ndims, int dims[])
> Unfortunately there were some issues with the first attempt (r30539 and r30540) which were reverted.
> So I decided to give it a short review based on r30606
> 1.) freeprocs is initialized to be nnodes and the subsequent divisions of freeprocs have all positive integers as divisor.
> So IMHO it would make more sense to check if nnodes > 0 in the MPI_PARAM_CHECK section at the begin instead of the following (see patch 0001):
> 99	    if (freeprocs < 1) {
> 101	                                     FUNC_NAME);
> 102	    }
> 2.) I rewrote the algorithm stopping at sqrt(n) in getprimes(int num, int *nprimes, int **pprimes)
> which makes mathematically more sens (as the largest prime factor of any number n cannot exceed \sqrt{n}) - and should produce the right result. ;)
> (see patch 0002)
> Here the improvements:
> module load mpi/openmpi/trunk-gnu.4.7.3
> $ ./mpi-dims-old 1000000
> time used for MPI_Dims_create(1000000, 3, {}): 8.104007
> module swap mpi/openmpi/trunk-gnu.4.7.3 mpi/openmpi/trunk-gnu.4.7.3-testing
> $ ./mpi-dims-new 1000000
> time used for MPI_Dims_create(1000000, 3, {}): 0.060400
> 3.) Memory allocation for the list of prime numbers may be reduced up to a factor of ~6 for 1mio nodes using the result from Dusart 1999 [1]:
> \pi(x)  < x/ln(x)(1+1.2762/ln(x))  for x > 1
> Unfortunately this saves us only 1.6 MB per process for 1mio nodes as reported by tcmalloc/pprof on a test program - but it may sum up with fatter nodes. :P
> $ pprof --base=$PWD/primes-old.0001.heap a.out primes-new.0002.heap
> (pprof) top
> Total: -1.6 MB
>     0.3 -18.8% -18.8%      0.3 -18.8% getprimes2
>     0.0  -0.0% -18.8%     -1.6 100.0% __libc_start_main
>     0.0  -0.0% -18.8%     -1.6 100.0% main
>    -1.9 118.8% 100.0%     -1.9 118.8% getprimes
> Find attached patch for it in 0003.
> If there are no issues I would like to commit this to trunk for further testing (+cmr for 1.7.5?) end of this week.
> Best regards
> Christoph
> [1]
> --
> Christoph Niethammer
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstrasse 19
> 70569 Stuttgart
> Tel: ++49(0)711-685-87203
> email: niethammer_at_[hidden]
> devel mailing list
> devel_at_[hidden]
Jeff Squyres
For corporate legal information go to:
devel mailing list