Igor Schein on Wed, 05 May 2004 22:53:42 +0200 |
[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]
Re: round4 performance |
On Wed, May 05, 2004 at 08:35:40PM +0200, Xavier-François Roblot wrote: > On Wed, 2004-05-05 at 17:36, Bill Allombert wrote: > > On Tue, May 04, 2004 at 04:17:06PM +0200, Xavier-François Roblot wrote: > > > Hi developers, > > > > > > On Wed, 2004-04-28 at 04:36, Igor Schein wrote: > > > > Hi, > > > > > > > > here's a polynomial on which round2 heavily outperforms round4: > > > > > > > > x^64 + 528*x^60 + 422640*x^56 + 154189440*x^52 + 46085461920*x^48 + 86643136 > > > > 30464*x^44 + 1067417121457152*x^40 + 76273480007101440*x^36 + 29307987576360 > > > > 23040*x^32 + 31785192406564024320*x^28 + 729662629421496287232*x^24 - 866453 > > > > 9928316858220544*x^20 + 114361270118934673858560*x^16 - 51757447983683584779 > > > > 87840*x^12 + 47264303406943521096007680*x^8 - 774599638688652156020195328*x^ > > > > 4 + 7585622913396815983510880256 > > > > > > I have committed a big patch that attempts to make round4 perform better > > > with this kind of polynomial. There are some improvements but still it > > > is significantly slower than round2... Also, I added some garbage > > > collecting so at least the stack necessary should now be reasonable. In > > > any case, the changes I have made still require a lot of tunings (and > > > also probably debugging!), so any feedback is welcome. > > > > Apparently, now round4 is faster than round2 by 10% on this example. > > (maybe not if we look only at the prime 3, though). > > > > I wonder if fastvalpos() could not be changed to compute > > newton sums iteratively and stop as soon as the criterium apply > > and return 0, instead of precomputing a fixed set of sums. > > > > Cheers, > > Bill. > > Well, I have modified update_alpha (after Karim pointed out a strange > behavior in this function) and that kind of miraculously speed up > dramatically that example!... As you will see, the computing time is now > very reasonable and it runs with a small stack too (I hope the result is > still correct though, I haven't checked yet). Igor, Karim and I still > have some ideas for improvements for nilord but you need some new bad > polynomials to test them. Please send me your worst examples! Maybe later, now it's regression time: ? allocatemem() *** Warning: doubling stack size; new stack = 256000000 (244.141 Mbytes). ? nfinit(x^8-4*x^7-126*x^6+392*x^5+4853*x^4-10364*x^3-58244*x^2+63492*x+197761); *** object too big, length can't fit in a codeword Thanks Igor