This Python 3.12.7 script with NumPy 2.2.4 and Numba 0.61.2:
import numpy as np, timeit as ti, numba as nb

# NumPy: strict local maxima (both neighbours smaller)
def f0(a):
    p0 = a[:-2]
    p1 = a[1:-1]
    p2 = a[2:]
    return (p0 < p1) & (p1 > p2)

# NumPy: local maxima that may include a two-sample plateau on either side
def f1(a):
    p0 = a[:-4]
    p1 = a[1:-3]
    p2 = a[2:-2]
    p3 = a[3:-1]
    p4 = a[4:]
    return ((p0 < p1) & (p1 == p2) | (p1 < p2)) & ((p2 > p3) | (p2 == p3) & (p3 > p4))

# Numba: same test as f0, written as an explicit loop
@nb.njit(fastmath=True)
def g0(a):
    r = np.zeros_like(a, dtype=np.bool)
    for i in range(1, a.size-1):
        r[i] = (a[i-1] < a[i]) & (a[i+1] < a[i])
    return r[1:-1]

# Numba: same test as f1, written as an explicit loop
@nb.njit(fastmath=True)
def g1(a):
    r = np.zeros_like(a, dtype=np.bool)
    for i in range(2, a.size-2):
        r[i] = ((a[i-1] == a[i]) & (a[i-2] < a[i-1]) | (a[i-1] < a[i])) & \
               ((a[i+1] == a[i]) & (a[i+2] < a[i+1]) | (a[i+1] < a[i]))
    return r[2:-2]

a = np.random.randint(0, 256, (500, 500)).astype(np.uint8)
b = a.ravel()

print(f'Minimum, median and maximum execution time in us:')

for fun in ('f0(b)', 'f1(b)', 'g0(b)', 'g1(b)'):
    # setup=fun runs each statement once before timing, so JIT compilation is excluded
    t = 10**6 * np.array(ti.repeat(stmt=fun, setup=fun, globals=globals(), number=1, repeat=999))
    print(f'{fun:20} {np.amin(t):8,.3f} {np.median(t):8,.3f} {np.amax(t):8,.3f}')
produces these timings on an AMD Ryzen 7 3800X PC running Ubuntu 22.04.5:
Minimum, median and maximum execution time in us:
f0(b)                  32.261   33.483   95.640
f1(b)                 118.974  120.737  129.424
g0(b)                  11.081   11.281   19.327
g1(b)                 723.319  744.419  794.042
For a simple array expression, i.e. f0 vs g0, Numba is about 3x faster than NumPy, but for a more complex expression, i.e. f1 vs g1, Numba becomes 5x slower. This is very surprising to me. What is the reason? Can these run times be improved?
... range(len(a)), and restrict array access to a[i]. In short, I believe you’ll get more stable behavior if you simply replace the last line of your f1 function with an explicit loop.

f1 is already 5x faster than g1. I need g1 improvement.

... f1 as g1b. Replace the last line of g1b (taken from f1) with an explicit loop. Add the njit decorator to g1b. Comparing the performance of f1, g1, and g1b, g1b will be the fastest.
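
The suggestion in the last comment (start from f1 as g1b, replace its final vectorized line with an explicit loop, add the njit decorator) might look roughly like the sketch below. This is only one reading of that comment, not code from the thread: the result allocation, the use of `and`/`or` on scalars, and the names `rising` and `falling` are assumptions made for illustration. The `try`/`except` fallback is also an addition, so the sketch still runs (slowly, as plain Python) where Numba is not installed. No timings are claimed here; whether g1b actually beats f1 and g1 has to be measured with the benchmark loop from the question.

```python
import numpy as np

try:
    import numba as nb
    njit = nb.njit
except ImportError:
    # Assumption for illustration: no-op decorator when Numba is unavailable
    def njit(func):
        return func


@njit
def g1b(a):
    # Same shifted views as in f1 (slicing creates views, not copies)
    p0 = a[:-4]
    p1 = a[1:-3]
    p2 = a[2:-2]
    p3 = a[3:-1]
    p4 = a[4:]
    r = np.empty(p2.size, dtype=np.bool_)
    # Explicit loop replacing the last line of f1; every view is read at index i only
    for i in range(p2.size):
        rising = (p1[i] < p2[i]) or (p0[i] < p1[i] and p1[i] == p2[i])
        falling = (p2[i] > p3[i]) or (p2[i] == p3[i] and p3[i] > p4[i])
        r[i] = rising and falling
    return r
```

To compare it with the other variants, 'g1b(b)' can be added to the tuple of statements in the timing loop, after checking that `np.array_equal(g1b(b), f1(b))` holds.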