Every Intuition Behind Fourier Transform
NTT is just FFT, optimized for computers.
In cryptography, using e, i, and Ο can be tricky for computers because they deal with finite numbers, and using these constants might lead to precision problems and errors.
As a solution, we have a different notion for roots of unity in finite fields (modular arithmetic).
For example, in the finite field modulo 13:
You can observe that 5, 12, 8, 1
roughly correspond to i, -1, -i, 1
in modulo 13. Similarly:
We prefer working with prime numbers in cryptography because they offer favorable properties.
In the context of modular arithmetic, when we use the modulo operation (e.g., mod 13
), we are essentially operating within finite fields. Here, our finite field size is 13.
q
, where q
is a prime number.n
roots of unity for this finite field of size q
.If n
divides q-1
, then there are indeed n
roots of unity in this setting.
In the provided example, where n=4
and q=13
, we can find 4 roots of unity.
To obtain an n
-th root of unity, we select a random non-zero element in the field [1,q)
. We then apply the formula below:
This process yields one of the four roots of unity, as demonstrated below:
Taking powers of 5
and 8
allows you to obtain 1
and 12
. However, the reverse is not true. This distinction leads us to refer to 5
and 8
as primitive n
-th roots of unity.
We are particularly interested in primitive n
-th roots of unity because they enable us to derive all the other roots of unity. Thankfully, there is a method for finding primitive n
-th roots of unity.
If a randomly selected x
satisfies the above formula, it is indeed a primitive n
-th root of unity.
Instead of x
, we use roots of unity. Thatβs it. No more e
, no more i
, no more Ο
.
In summary, to transition from FFT to NTT, we simply replace:
with:
and pre-compute these roots (omegas) for our setting so that we can supply the roots to our function.
Letβs go back to FFT for a bit.
The operations we perform to compute FFT are sometimes also referred as butterfly
. It will make more sense with images.
image taken from
In the above image, represents the coefficients of a 3rd degree polynomial.
1
and -1
in the first level,1
, i
, -1
, and -i
in the second level. Take your time and inspect it, it will make a lot of sense.Here is a demonstration of a 7th degree polynomial that requires 8 evaluations (for 8 different points).
image taken from
This time we have 3 levels:
1
and -1
.1
, i
, -1
, -i
.w1
, w2
, w3
, w4
, w5
, w6
, w7
, and w8
(where you take the 8th power of , you get 1
).Note: replace the imaginary roots with roots in finite field as we described above, and you get NTT. These diagrams wonβt change at all for NTT except the roots we multiply with.
The butterfly comes from this:
I honestly think it looks more like an hourglass than a butterfly, but okβ¦
For FFT, our algorithm was recursive (calling itself recursively).
If we use the above images to explain, we start from the left side (but not do the computations), and reach to the right side. Then, we do the calculations from the most far right side, and use what we have computed from right side to find the values on the left side.
If we want to squeeze out the performance, it usually is a good idea to turn recursion into iteration.
Iβm going to quote from Paul Heckbert here:
Quote startsβ¦
In the diagram of the 8-point FFT above, note that the inputs arenβt in normal order: theyβre in the bizarre order: . Why this sequence?
Below is a table of and the index of the th input samle, :
The pattern is obvious if and are written in binary (last two rows of the table). Observe that each is the bit-reversal of . The sequence is also related to breadth-first traversal of a binary tree.
Quote endsβ¦
So, we will just reverse the bits for our coefficients indices. That should make the access more straight-forward.
For those interested, the final version of NTT will look something like this, with some additional optimizations:
import math
def reverse_bits(number, bit_length):
# Reverses the bits of `number` up to `bit_length`.
reversed = 0
for i in range(0, bit_length):
if (number >> i) & 1:
reversed |= 1 << (bit_length - 1 - i)
return reversed
def cooley_tukey_ntt(a, q, omegas):
n = len(a)
out = a
for i in range(n):
rev_i = reverse_bits(i, n.bit_length() - 1)
if rev_i > i:
# swap(out[i], out[rev_i])
out[i] ^= out[rev_i]
out[rev_i] ^= out[i]
out[i] ^= out[rev_i]
log2n = math.log2(n)
iterations = int(log2n)
# below is complicated, but it is actually just to
# achieve indices of coefficients and roots of unity
M = 2
for _ in range(iterations): # recursion into log2n level iterations
for i in range(0, n, M): # represents the indices of even numbers
g = 0 # necessary for accessing pre-computed roots of unity
for j in range(0, M >> 1): # shift needed for finding the next even number
k = i + j + (M >> 1) # index of odd,
# where `i` is the index of the even,
# `j` is the shift that will lead us to next even number
# and `M` is the gap between `even` and `odd` pair
U = out[i + j] # even
V = out[k] * omegas[g] # x * O(x)
out[i + j] = (U + V) % q # E(x) + xO(x)
out[k] = (U - V) % q # E(x) - xO(x)
g = g + n // M # index for finding the next root of unity
M <<= 1 # double the gap between pairs after each recursion level
return out
If you prefer the version without comments, the code is available here.
Additionally, you can explore my research paper, where we accelerated the state-of-the-art SEAL library (homomorphic encryption) by 100x through GPU-optimized NTT code:
Thank you for reading!