Hello,
If you mean the current version in the ompitests/ibm svn repository I can confirm that it passes the topolgy/dimscreate test without errors. :)
The difference in the patches is as follows: The patch from Andreas only generated a table of prime numbers of up to sqrt(freeprocs) while my patch still produces prime numbers up to freeprocs. And for factoring we really need all factors up to freeprocs. The standard sqrt optimization was just introduced in the wrong place. :)
You are right with #3: It's a better approximation for the upper bound and the proof is something to be read under the Christmas tree. ;)
I just have to rethink if the ceil() is necessary in the code as I am not sure about rounding issues in floating point calculations here... :P
Regarding your questions:
1.) I don't think we have to cache prime numbers as MPI_Dims create will not be used frequently for factorization. If anybody needs faster factorization he would use his own  even more optimized  code. If you find some free time beside Open MPI go out for some harder problems at http://projecteuler.net. But please don't get frustrated from the assembler solutions. ;)
2.) Interesting idea: Using the approximation from the cited paper we should only need around 400 MB to store all primes in the int32 range. Potential for applying compression techniques still present. ^^
Regards
Christoph

Christoph Niethammer
High Performance Computing Center Stuttgart (HLRS)
Nobelstrasse 19
70569 Stuttgart
Tel: ++49(0)71168587203
email: niethammer_at_[hidden]
http://www.hlrs.de/people/niethammer
 UrsprÃ¼ngliche Mail 
Von: "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]>
An: "Open MPI Developers" <devel_at_[hidden]>
Gesendet: Montag, 10. Februar 2014 20:12:08
Betreff: Re: [OMPI devel] Reviewing MPI_Dims_create
Nice! Can you verify that it passes the ibm test? I didn't look closely, and to be honest, I'm not sure why the previous improvement broke the IBM test because it hypothetically did what you mentioned (stopped at sqrt(freenodes)).
I think patch 1 is a nobrainer. I'm not sure about #2 because I'm not sure how it's different than the previous one, nor did I have time to investigate why the previous one broke the IBM test. #3 seems like a good idea, too; I did't check the paper, but I assume it's some kind of proof about the upper limit on the number of primes in a given range.
Two questions:
1. Should we cache generated prime numbers? (if so, it'll have to be done in a threadsafe way)
2. Should we just generate prime numbers and hardcode them into a table that is compiled into the code? We would only need primes up to the sqrt of 2billion (i.e., signed int), right? I don't know how many that is  if it's small enough, perhaps this is the easiest solution.
On Feb 10, 2014, at 1:30 PM, Christoph Niethammer <niethammer_at_[hidden]> wrote:
> Hello,
>
> I noticed some effort in improving the scalability of
> MPI_Dims_create(int nnodes, int ndims, int dims[])
> Unfortunately there were some issues with the first attempt (r30539 and r30540) which were reverted.
>
> So I decided to give it a short review based on r30606
> https://svn.openmpi.org/trac/ompi/browser/trunk/ompi/mpi/c/dims_create.c?rev=30606
>
>
> 1.) freeprocs is initialized to be nnodes and the subsequent divisions of freeprocs have all positive integers as divisor.
> So IMHO it would make more sense to check if nnodes > 0 in the MPI_PARAM_CHECK section at the begin instead of the following (see patch 0001):
>
> 99 if (freeprocs < 1) {
> 100 return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
> 101 FUNC_NAME);
> 102 }
>
>
> 2.) I rewrote the algorithm stopping at sqrt(n) in getprimes(int num, int *nprimes, int **pprimes)
> which makes mathematically more sens (as the largest prime factor of any number n cannot exceed \sqrt{n})  and should produce the right result. ;)
> (see patch 0002)
> Here the improvements:
>
> module load mpi/openmpi/trunkgnu.4.7.3
> $ ./mpidimsold 1000000
> time used for MPI_Dims_create(1000000, 3, {}): 8.104007
> module swap mpi/openmpi/trunkgnu.4.7.3 mpi/openmpi/trunkgnu.4.7.3testing
> $ ./mpidimsnew 1000000
> time used for MPI_Dims_create(1000000, 3, {}): 0.060400
>
>
> 3.) Memory allocation for the list of prime numbers may be reduced up to a factor of ~6 for 1mio nodes using the result from Dusart 1999 [1]:
> \pi(x) < x/ln(x)(1+1.2762/ln(x)) for x > 1
> Unfortunately this saves us only 1.6 MB per process for 1mio nodes as reported by tcmalloc/pprof on a test program  but it may sum up with fatter nodes. :P
>
> $ pprof base=$PWD/primesold.0001.heap a.out primesnew.0002.heap
> (pprof) top
> Total: 1.6 MB
> 0.3 18.8% 18.8% 0.3 18.8% getprimes2
> 0.0 0.0% 18.8% 1.6 100.0% __libc_start_main
> 0.0 0.0% 18.8% 1.6 100.0% main
> 1.9 118.8% 100.0% 1.9 118.8% getprimes
>
> Find attached patch for it in 0003.
>
>
> If there are no issues I would like to commit this to trunk for further testing (+cmr for 1.7.5?) end of this week.
>
> Best regards
> Christoph
>
> [1] http://www.ams.org/journals/mcom/199968225/S0025571899010376/home.html
>
>
>
> 
>
> Christoph Niethammer
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstrasse 19
> 70569 Stuttgart
>
> Tel: ++49(0)71168587203
> email: niethammer_at_[hidden]
> http://www.hlrs.de/people/niethammer<0001Moveparametercheckintoappropriatecodesectiona.patch><0002Speedingupdetectionofprimenumbersusingthefac.patch><0003Reducememoryusagebyabetterapproximationforth.patch>_______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.openmpi.org/mailman/listinfo.cgi/devel

Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
_______________________________________________
devel mailing list
devel_at_[hidden]
http://www.openmpi.org/mailman/listinfo.cgi/devel
