David Cleaver on Sat, 31 Jan 2026 01:11:45 +0100
Question about parallel execution...
Hello,
I've created a function, find_n, that performs calculations over a range of inputs [a,b].
I've updated the code to work with parfor, where:
- it sets the PARI/GP nbthreads to num_threads
- it runs num_threads copies of the function find_n
- the function find_n uses forstep so that each instance only works on its set of inputs in [a,b]
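To illustrate the partitioning described in the list above, here is a minimal sketch that only collects the inputs a given instance would visit. The helper name inputs_for is hypothetical and not part of the original code; it reuses the same forstep bounds as find_n.

```gp
/* Hypothetical helper (illustration only): return the inputs k in [a,b]
   that instance h out of `threads` instances would process. */
inputs_for(h, threads, a, b) = {
  my(v = []);
  /* same start, end and stride as the forstep loop in find_n */
  forstep(k = a + h - 1, b, threads, v = concat(v, k));
  return(v);
}
```

For example, with 3 instances over [10,20], inputs_for(1, 3, 10, 20) gives [10, 13, 16, 19], inputs_for(2, 3, 10, 20) gives [11, 14, 17, 20], and inputs_for(3, 3, 10, 20) gives [12, 15, 18], so every input is visited exactly once.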
The function works as expected and produces the expected output, whether single threaded or multi-threaded.
However, the runtime (both cpu time and real time) of the parallel code does not scale as I would expect.
For example, I have a 56 physical core computer (dual 28-core processors), and:
When I run 1 instance of find_n, it takes about 8.9 real-time seconds to run.
When I run with 2 instances of find_n, it takes about 9.2 real-time seconds to run.
When I run with 5 instances of find_n, it takes about 6 real-time seconds to run.
When I run with 10 instances of find_n, it takes about 7.5 real-time seconds to run.
When I run with 20 instances of find_n, it takes about 7.5 real-time seconds to run.
When I run with 30 instances of find_n, it takes about 9.4 real-time seconds to run.
When I run with 40 instances of find_n, it takes about 9.8 real-time seconds to run.
When I run with 50 instances of find_n, it takes about 9.5 real-time seconds to run.
I thought splitting the task into N parts would reduce the runtime to about 1/N of the single-instance runtime, but I'm not seeing that in the actual runtimes.
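To make that expectation concrete with the timings quoted above, assuming ideal linear scaling with no overhead (an idealization used only for comparison):

```latex
\[
T_N^{\text{ideal}} = \frac{T_1}{N}, \qquad T_1 \approx 8.9\ \text{s}
\]
\[
T_5^{\text{ideal}} \approx \frac{8.9}{5} = 1.78\ \text{s}
\quad (\text{observed} \approx 6\ \text{s}), \qquad
T_{50}^{\text{ideal}} \approx \frac{8.9}{50} \approx 0.18\ \text{s}
\quad (\text{observed} \approx 9.5\ \text{s})
\]
```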
The processor load indicates that it is using 100% of N cores during the full run, but the overall runtime isn't improving.
I've read that there is overhead in the multi-thread interface, but I was hoping that running only nbthreads instances would help mitigate that.
I'm seeing the same behavior on two different Windows 64-bit computers, one with 24 cores and the other with 56 cores.
I've tried Pari64-2-15-4-pthread.exe, Pari64-2-17-1-pthread.exe, and gppthread64-2-17-3.exe; they all show the same behavior.
Am I doing something wrong in the code?
Could there be a function I'm using that is not multi-thread friendly?
I've created a small example function that works similarly to a bigger function I'm working on.
I've also included the output and runtimes from this sample function down below.
Can someone help me review this code and see why I might not be seeing a speed up with a large number of cores?
Thanks for any help or insights you can provide!
-David C.
find_n(h, threads, a, b) = {
  my(j,k,m,x,y,z);
  /* instance h handles k = a+h-1, a+h-1+threads, ... up to b */
  forstep(k = a + h - 1, b, threads,
    x = digits(k);
    z = vector(#x);
    for(m = 2, k-1,
      /* multiply each digit of k by m */
      for(j = 1, #x, z[j] = x[j]*m);
      /* concatenate the products into one decimal string */
      y = "";
      for(j = 1, #z, y = concat(y, z[j]));
      y = eval(y); /* convert the string to a number */
      y = (y\1000000)%10000;
      if(y == 9999, print("Thread ",h," found ",k," and ",m," : ",y));
    );
  );
}
export(find_n)

find_n_in_range(num_threads, a, b) = {
  /* set num threads here */
  default(nbthreads, num_threads);
  parfor(i = 1, num_threads, find_n(i, num_threads, a, b));
}
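The timings in the transcript that follows appear to come from gp's built-in timer. As a cross-check, wall-clock time can also be measured explicitly. The following is a minimal sketch, assuming getwalltime() is available in the gp build in use; time_call is a hypothetical helper name and not part of the original code.

```gp
/* Hypothetical helper (illustration only): run the zero-argument
   closure f and return the elapsed wall-clock time in milliseconds. */
time_call(f) = {
  my(t = getwalltime());
  f();
  return(getwalltime() - t);
}
```

A call such as time_call(() -> find_n_in_range(5, 10, 1000)) would then return the elapsed real time in milliseconds for that run, independently of the cpu time figure that gp reports.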
? find_n_in_range(1, 10, 1000)
Thread 1 found 334 and 333 : 9999
Thread 1 found 335 and 333 : 9999
Thread 1 found 336 and 333 : 9999
Thread 1 found 337 and 333 : 9999
Thread 1 found 338 and 333 : 9999
Thread 1 found 339 and 333 : 9999
time = 8,907 ms.
? find_n_in_range(2, 10, 1000)
Thread 1 found 334 and 333 : 9999
Thread 1 found 336 and 333 : 9999
Thread 1 found 338 and 333 : 9999
Thread 2 found 335 and 333 : 9999
Thread 2 found 337 and 333 : 9999
Thread 2 found 339 and 333 : 9999
cpu time = 9,422 ms, real time = 9,257 ms.
? find_n_in_range(5, 10, 1000)
Thread 1 found 335 and 333 : 9999
Thread 5 found 334 and 333 : 9999
Thread 3 found 337 and 333 : 9999
Thread 5 found 339 and 333 : 9999
Thread 2 found 336 and 333 : 9999
Thread 4 found 338 and 333 : 9999
cpu time = 94 ms, real time = 6,068 ms.
? find_n_in_range(10, 10, 1000)
Thread 8 found 337 and 333 : 9999
Thread 6 found 335 and 333 : 9999
Thread 9 found 338 and 333 : 9999
Thread 7 found 336 and 333 : 9999
Thread 5 found 334 and 333 : 9999
Thread 10 found 339 and 333 : 9999
cpu time = 63 ms, real time = 7,545 ms.
? find_n_in_range(20, 10, 1000)
Thread 5 found 334 and 333 : 9999
Thread 7 found 336 and 333 : 9999
Thread 10 found 339 and 333 : 9999
Thread 9 found 338 and 333 : 9999
Thread 6 found 335 and 333 : 9999
Thread 8 found 337 and 333 : 9999
cpu time = 15 ms, real time = 7,578 ms.
? find_n_in_range(30, 10, 1000)
Thread 27 found 336 and 333 : 9999
Thread 29 found 338 and 333 : 9999
Thread 25 found 334 and 333 : 9999
Thread 30 found 339 and 333 : 9999
Thread 28 found 337 and 333 : 9999
Thread 26 found 335 and 333 : 9999
cpu time = 564 ms, real time = 9,485 ms.
? find_n_in_range(40, 10, 1000)
Thread 6 found 335 and 333 : 9999
Thread 10 found 339 and 333 : 9999
Thread 8 found 337 and 333 : 9999
Thread 7 found 336 and 333 : 9999
Thread 9 found 338 and 333 : 9999
Thread 5 found 334 and 333 : 9999
cpu time = 359 ms, real time = 9,878 ms.
? find_n_in_range(50, 10, 1000)
Thread 26 found 335 and 333 : 9999
Thread 28 found 337 and 333 : 9999
Thread 30 found 339 and 333 : 9999
Thread 25 found 334 and 333 : 9999
Thread 29 found 338 and 333 : 9999
Thread 27 found 336 and 333 : 9999
cpu time = 267 ms, real time = 9,503 ms.