Ilya Zakharevich on Thu, 18 Jan 2024 08:41:07 +0100 |
[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]
Re: Implementation of forprime() and unextprime() |
On Mon, Jan 15, 2024 at 07:21:29PM +0100, Bill Allombert wrote: > On Thu, Jan 11, 2024 at 07:09:51PM -0800, Ilya Zakharevich wrote: > > B) unextprime() (the ulong version of nextprime() from > > basemath/ifactor1.c) has a rather convoluted code to sieve mod > > 210. Why not: > > > > • just store tables with a bitmap of a few (say 8) next odd > > residues mod N which are mutually prime with N. (Indexed by > > “an odd” residue mod N.) > > • For a given odd number, extract the corresponding bitmaps from > > the tables for a few values of N (such as 210, 11*19, 13*17, 23 and 29). > > • “bit-AND” these bitmaps. > > • Use the count of trailing/leading 0 bits (should be cheap) to > > jump ahead. > > > > I would not be surprised if this can speedup the code by 20% > > (maybe even a bit more for numbers closer to 2^64). > I have made a branch bill-unextprime with a simpler algorithm, > but the speed is exactly the same, which means that all the time is > spent in uisprime. Sorry that I was not clear enough: The purpose of the suggestion above is • not to speed up sieving mod 210, but • to significantly (5–6 orders of magnitude?!) slow it down. Currently, unextprime() is called every 4.375 integers (on average). My (quite optimistic) back-of-the-envelop calculations show that • One can call unextprime() about 5.64 times more rarely. • Due to slow-down of the sieving, this would lead to ∼5 times speedup¹⁾ of this case of forprime(). ¹⁾ With sieving called very rarely: about once per 19000 calls to unextprime(). For this (all numbers approximate based on rough calculations with experimental data about timing on my processor): • One should partially-sieve a range of 480K ulong⸣s using primes up to about 950000. • The bitmap for sieving should be placed so that it does not shadow-mod-32K the first 2K of the table of differences of primes. (Moreover, this speedup of ∼5 times seems to be quite robust: I got it with 2 different models of slowdowns of a processor. It may have ”some mathematical origin”…) (I hope to find time to write down these back-of-the-envelop calculations…) Hope this helps, Ilya