Jérôme Raulin on Wed, 10 Jan 2018 21:01:32 +0100 |
[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]
sqrtnint algorithm improvement for huge arguments (plus memory leak issue) |
Dear all, sqrtnint(n, k) seems to have "strange" behavior for huge arguments. Reading GPRC: /home/jerome/.gprc ...Done. GP/PARI CALCULATOR Version 2.10.0 (development 21652-dae10e0) amd64 running linux (x86-64/GMP-6.1.2 kernel) 64-bit version compiled: Jan 9 2018, gcc version 7.2.0 (Ubuntu 7.2.0-1ubuntu1~16.04) threading engine: pthread (readline v6.3 enabled, extended help enabled) Copyright (C) 2000-2017 The PARI Group PARI/GP is free software, covered by the GNU General Public License, and comes WITHOUT ANY WARRANTY WHATSOEVER. Type ? for help, \q to quit. Type ?15 for how to get moral (and possibly technical) support. parisizemax = 1000001536, primelimit = 500000, nbthreads = 8 (20:23) gp > a=10^1000000; \\ 1 million decimal digits ! time = 27 ms. (20:24) gp > sqrtint(a); time = 39 ms. (20:24) gp > sqrtnint(a,12); time = 208 ms. (20:24) gp > sqrtnint(a,123); time = 127 ms. (20:24) gp > sqrtnint(a,1234); time = 172 ms. (20:24) gp > sqrtnint(a,12345); *** sqrtnint: Warning: increasing stack size to 80000000. *** sqrtnint: Warning: increasing stack size to 160000000. *** sqrtnint: Warning: increasing stack size to 320000000. *** sqrtnint: Warning: increasing stack size to 640000000. time = 10,236 ms. (20:24) gp > sqrtnint(a,123456); *** sqrtnint: Warning: increasing stack size to 80000000. *** sqrtnint: Warning: increasing stack size to 160000000. *** sqrtnint: Warning: increasing stack size to 320000000. *** sqrtnint: Warning: increasing stack size to 640000000. *** sqrtnint: Warning: increasing stack size to 1000001536. *** at top-level: sqrtnint(a,123456) *** ^------------------ *** sqrtnint: the PARI stack overflows ! current stack size: 1000001536 (953.676 Mbytes) [hint] you can increase 'parisizemax' using default() *** Break loop: type 'break' to go back to GP prompt break> I first ran sqrt to have an idea of the expected runtime of the more complex sqrtnint algorithm. It seems that starting for some value of the 2nd argument k : * sqrtnint leaks memory somewhat proportionally to k. * the runtime also starts to increase (maybe some correlation with the memory leak). I had a look at sqrtnint algorithm and from my understanding : * it first does some kind or argument reduction (I was not able to understand why), * find an approximation to he kth root with largest power of 2 larger than the expected kth root, * find the result applying standard Newton Raphson iterations. The performance issue is linked to the fact that for large k, the approximation is too far from the result And Newton Raphson algorithm needs too many iterations, i.e. most iterations barely increase precision result until last ones which converge very fast to the result. For sqrtnint(a,12345); more than 7700 iterations are needed while only 50 are needed for sqrtnint(a, 123); (while of course the result is far smaller). In a recent "Project Euler" problem solution thread, "philiplu" user discussed about a similar issue and drastically improve the running time of his solution by first findind by dichotomy the highest 12 bits before falling back to Newton Raphson. Translated in pari it looks like : iroot(x:int, n:int) = { local(xbl:int, mask:int, y:int, ynew:int, test:int, nm1:int); xbl = logint(x, 2); if (n >= xbl, return(1)); mask = 1 << ((xbl - 1) \ n); y = mask; if (y^n == x, return(y)); for (i = 0, 10, mask >>= 1; if (mask == 0, return(y)); ynew = y + mask; test = ynew^n; if (test <= x, if (test == x, return(ynew)); y = ynew; ); ); y += mask; nm1 = n - 1; while (1 == 1, ynew = (nm1 * y + x \ (y^nm1)) \ n; if (ynew >= y, break); y = ynew; ); return(y); } (20:44) gp > iroot(a,12345) time = 386 ms. %11 = 1010311380446684911802268122009368691535646607849018888903290947605488591556 607742 (20:44) gp > iroot(a,123456) time = 535 ms. %12 = 125907569 Of course choosing 12 bits is arbitrary and by playing with other values there is an optimal value for each n, k. Hoping to find more information I looked at gmp source code but the rootrem (in gmp/mpz/generic/rootrem.c) source code is extremely complex and there is no comment with a "magic" formula. Still gmp documentation seems to agree with this discussion : The initial approximation a1 is generated bitwise by successively powering a trial root with or without new 1 bits, aiming to be just above the true root. The iteration converges quadratically when started from a good approximation. When n is large more initial bits are needed to get good convergence. The current implementation is not particularly well optimized. iroot above could be used as a starting point for a faster implementation (or maybe gmp rootrem could be directly called when using gmp kernel (despite the comment, it is very fast)). Best regards Jerome