| hermann on Sat, 10 Jan 2026 16:20:23 +0100 |
[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]
| Re: isprime() runtime jump between 10^18 and 10^20? |
On 2026-01-06 20:59, hermann@stamm-wilbrandt.de wrote:
Wow, because of that 1h runtime I expected runtime for computing a(11) as <10h. I got surprised so much! Started with "nohup numactl -C 0-191 gp -q < a11.gp" and this gist:... Runtime reduced from 5min to 5 seconds, and from 23h to 19 minutes using parsum()! hermann@x3950-X6:~$ nproc 384 hermann@x3950-X6:~$ numactl -C 0-191 gp -q ? a(n)=parsum(k=1, 10^n, isprime(k^2+(k+1)^2)); ? # timer = 1 (on) ? a(9) cpu time = 14min, 51,150 ms, real time = 4,819 ms. 68588950 ? a(10) cpu time = 59h, 57min, 42,370 ms, real time = 18min, 46,398 ms. 614983774 ? In addition I did the parsum() count for 10^10 < k <= 2*10^10, and now the 19 minutes with lots of ispseudoprime() for processing 10^10 ks became 1h (not too bad) for processing the same amount of ks with zero ispseudoprime(): ? parsum(k=10^10+1,2*10^10,isprime(k^2+(k+1)^2)); cpu time = 191h, 45min, 10,976 ms, real time = 1h, 3,472 ms. ?
https://gist.github.com/Hermann-SW/fe69d723ddc2b8b6ff7666b747aad1aaThe only statement I had to add was "default(threadsizemax,100*10^6)", because the default value of 8MB was not big enough:
hermann@x3950-X6:~$ sed "s/\xd/\n/g" nohup.out timer = 1 (on) *** isprime: Warning: increasing stack size to 16000000. *** isprime: Warning: increasing stack size to 32000000. *** isprime: Warning: increasing stack size to 64000000. 5573589175 cpu time = 1110h, 34min, 16,003 ms, real time = 5h, 47min, 8,619 ms. hermann@x3950-X6:~$192*100MB is omly 19.2GB — I could increase threadmaxsize to 5GB for bigger computations as the server has 1TB ram. Seams to be possible, but not sure whether GP will ever need that:
? default(threadsizemax,5*10^9) ? default(threadsizemax) 5000000000 ?
I really have to get my new parallel OpenMP C++ code with better workload distribution running correctly (the old finished a(11) computation in only 14:24h real time). Since that does only traverse a (big) directed acyclic graph [of height 40billion(!)], neither isprime() calls are needed, nor computations of sqrt(-1) in modulus because those appear automatically when traversing the great-grandparent DAG of all composite numbers of the form k^2+(k+1)^2. That should result in <6h real time as well for C++ code as well, hopefully much better so that a(12) could be computed (given a(11) PARI/GP real time, a(12) could likely be computed with <60h, which is not good for power bill (I just started again, and computation with 19200 %CPU does draw 2.7KW the whole time).In my C++ code I was responsible to make sure that OpenMP creates work on all cores. Using the "par*" GP functions is so much easier, thanks to the dev team!
Current draft submission to A012132 explaining the sqrt(-1) thing: From Hermann Stamm-Wilbrandt, Jan 08 2026: (Start) Any prime divisor p of n^2+(n+1)^2 satisfies p == 1 (mod 4). Above tuple characterizations were wrt. divisibility of n^2+(n+1)^2.The two values for any given prime p == 1 (mod 4) of the tuples above can be characterized alternatively as the two solutions of quadratic equation a^2+(a+1)^2 == 0 (mod p). E.g., for the tuple (8,20,29) from above, 8^2+9^2=145 == 0 (mod 29) and 20^2+21^2=481 == 0 (mod 29).
For primes p == 1 (mod 4) the value sqrt(-1) (mod p) can always be computed, and from that the solution to discussed quadratic equation. In PARI: a=(sqrt(Mod(-1,89))-1)/2 results in Mod(61, 89), and a^2+(a+1)^2 in Mod(0, 89). Now take any n == a (mod 89), e.g., n=61+3*89. As expected (n^2+(n+1)^2) % 89 returns 0. (End)
Regards, Hermann.