hermann on Sat, 10 Jan 2026 16:20:23 +0100


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Re: isprime() runtime jump between 10^18 and 10^20?


On 2026-01-06 20:59, hermann@stamm-wilbrandt.de wrote:
...
Runtime reduced from 5min to 5 seconds, and from 23h to 19 minutes
using parsum()!

hermann@x3950-X6:~$ nproc
384
hermann@x3950-X6:~$ numactl -C 0-191 gp -q
? a(n)=parsum(k=1, 10^n, isprime(k^2+(k+1)^2));
? #
   timer = 1 (on)
? a(9)
cpu time = 14min, 51,150 ms, real time = 4,819 ms.
68588950
? a(10)
cpu time = 59h, 57min, 42,370 ms, real time = 18min, 46,398 ms.
614983774
?


In addition I did the parsum() count for 10^10 < k <= 2*10^10, and now
the 19 minutes with lots of ispseudoprime() for processing 10^10 ks
became 1h (not too bad) for processing the same amount of ks with zero
ispseudoprime():

? parsum(k=10^10+1,2*10^10,isprime(k^2+(k+1)^2));
cpu time = 191h, 45min, 10,976 ms, real time = 1h, 3,472 ms.
?

Wow, because of that 1h runtime I expected runtime for computing a(11) as <10h. I got surprised so much! Started with "nohup numactl -C 0-191 gp -q < a11.gp" and this gist:
https://gist.github.com/Hermann-SW/fe69d723ddc2b8b6ff7666b747aad1aa

The only statement I had to add was "default(threadsizemax,100*10^6)", because the default value of 8MB was not big enough:

hermann@x3950-X6:~$ sed "s/\xd/\n/g" nohup.out
   timer = 1 (on)
  *** isprime: Warning: increasing stack size to 16000000.
  *** isprime: Warning: increasing stack size to 32000000.
  *** isprime: Warning: increasing stack size to 64000000.
5573589175
cpu time = 1110h, 34min, 16,003 ms, real time = 5h, 47min, 8,619 ms.
hermann@x3950-X6:~$

192*100MB is omly 19.2GB — I could increase threadmaxsize to 5GB for bigger computations as the server has 1TB ram. Seams to be possible, but not sure whether GP will ever need that:

? default(threadsizemax,5*10^9)
? default(threadsizemax)
5000000000
?


In my C++ code I was responsible to make sure that OpenMP creates work
on all cores. Using the  "par*" GP functions is so much easier, thanks
to the dev team!

I really have to get my new parallel OpenMP C++ code with better workload distribution running correctly (the old finished a(11) computation in only 14:24h real time). Since that does only traverse a (big) directed acyclic graph [of height 40billion(!)], neither isprime() calls are needed, nor computations of sqrt(-1) in modulus because those appear automatically when traversing the great-grandparent DAG of all composite numbers of the form k^2+(k+1)^2. That should result in <6h real time as well for C++ code as well, hopefully much better so that a(12) could be computed (given a(11) PARI/GP real time, a(12) could likely be computed with <60h, which is not good for power bill (I just started again, and computation with 19200 %CPU does draw 2.7KW the whole time).


Current draft submission to A012132 explaining the sqrt(-1) thing:

From Hermann Stamm-Wilbrandt, Jan 08 2026: (Start)

Any prime divisor p of n^2+(n+1)^2 satisfies p == 1 (mod 4).

Above tuple characterizations were wrt. divisibility of n^2+(n+1)^2.

The two values for any given prime p == 1 (mod 4) of the tuples above can be characterized alternatively as the two solutions of quadratic equation a^2+(a+1)^2 == 0 (mod p). E.g., for the tuple (8,20,29) from above, 8^2+9^2=145 == 0 (mod 29) and 20^2+21^2=481 == 0 (mod 29).

For primes p == 1 (mod 4) the value sqrt(-1) (mod p) can always be computed, and from that the solution to discussed quadratic equation. In PARI: a=(sqrt(Mod(-1,89))-1)/2 results in Mod(61, 89), and a^2+(a+1)^2 in Mod(0, 89). Now take any n == a (mod 89), e.g., n=61+3*89. As expected (n^2+(n+1)^2) % 89 returns 0. (End)


Regards,

Hermann.