Hi,
I recently ran parsum(m2=1,20,<code>) on a 16-thread
machine. parsum parallelizes the sum over at most floor(sqrt(20))
threads, i.e. 4 threads on a machine that has 16. This is a
sensible limit when the number of terms to sum is at least
nbthreads^2, but it leaves most threads idle otherwise.
Couldn't parsum allow more threads when b-a is small? I suspect
that querying the number of threads from inside parsum() is
off-limits. However, one could use something like
min(N,sqrtint(N+90)),
where N=b-a+1. This gives N threads for N <= 10, then
sqrtint(N+90), which is close to sqrt(N) once N is large.
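To make the comparison concrete, here is a small GP sketch that prints the current cap next to the proposed one for a few values of N. It assumes the current cap is sqrtint(N), as described above; current() and proposed() are hypothetical helper names, not PARI functions. The number of threads actually used would of course also be bounded by the machine's thread count.

```gp
\\ Hypothetical helper (not a PARI function): current cap on parsum
\\ threads for N = b-a+1 terms, assumed to be floor(sqrt(N)).
current(N) = sqrtint(N);

\\ Hypothetical helper: proposed cap, N threads for N <= 10,
\\ then sqrtint(N+90), which behaves like sqrt(N) for large N.
proposed(N) = min(N, sqrtint(N + 90));

\\ Print N, current cap, proposed cap for a few sizes.
{
  my(L = [1, 4, 10, 11, 20, 100, 256, 1000]);
  for(i = 1, #L,
    my(N = L[i]);
    printf("%5d %4d %4d\n", N, current(N), proposed(N)));
}
```

For N = 20 this gives 10 threads instead of 4, while for N = 256 it gives 18 instead of 16, so large sums are barely affected.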
Thoughts?
Loïc
PS: it's easy to outsmart parsum by summing up to 256 (so that
floor(sqrt(256)) = 16 threads are allowed) and replacing <code> with
if(m2<=20,<code>), but I think it would be better if parsum were
smarter on its own.