| Ilya Zakharevich on Thu, 18 Jan 2024 08:41:07 +0100 |
[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]
| Re: Implementation of forprime() and unextprime() |
On Mon, Jan 15, 2024 at 07:21:29PM +0100, Bill Allombert wrote:
> On Thu, Jan 11, 2024 at 07:09:51PM -0800, Ilya Zakharevich wrote:
> > B) unextprime() (the ulong version of nextprime() from
> > basemath/ifactor1.c) has a rather convoluted code to sieve mod
> > 210. Why not:
> >
> > • just store tables with a bitmap of a few (say 8) next odd
> > residues mod N which are mutually prime with N. (Indexed by
> > “an odd” residue mod N.)
> > • For a given odd number, extract the corresponding bitmaps from
> > the tables for a few values of N (such as 210, 11*19, 13*17, 23 and 29).
> > • “bit-AND” these bitmaps.
> > • Use the count of trailing/leading 0 bits (should be cheap) to
> > jump ahead.
> >
> > I would not be surprised if this can speedup the code by 20%
> > (maybe even a bit more for numbers closer to 2^64).
> I have made a branch bill-unextprime with a simpler algorithm,
> but the speed is exactly the same, which means that all the time is
> spent in uisprime.
Sorry that I was not clear enough:
The purpose of the suggestion above is
• not to speed up sieving mod 210, but
• to significantly (5–6 orders of magnitude?!) slow it down.
Currently, unextprime() is called every 4.375 integers (on average).
My (quite optimistic) back-of-the-envelop calculations show that
• One can call unextprime() about 5.64 times more rarely.
• Due to slow-down of the sieving, this would lead to ∼5 times
speedup¹⁾ of this case of forprime().
¹⁾ With sieving called very rarely: about once per 19000 calls
to unextprime().
For this (all numbers approximate based on rough calculations with
experimental data about timing on my processor):
• One should partially-sieve a range of 480K ulong⸣s using primes up
to about 950000.
• The bitmap for sieving should be placed so that it does not
shadow-mod-32K the first 2K of the table of differences of primes.
(Moreover, this speedup of ∼5 times seems to be quite robust: I
got it with 2 different models of slowdowns of a processor. It
may have ”some mathematical origin”…)
(I hope to find time to write down these back-of-the-envelop
calculations…)
Hope this helps,
Ilya