Communication costs of Schönhage-Strassen fast integer multiplication Derrick Coetzee This work released under Creative Commons Zero Waiver by its author, Derrick Coetzee (dcoetzee@eecs.berkeley.edu).
Schönhage-Strassen Algorithm Classical FFT-based integer multiplication algorithm O(n log n log log n) on n-bit inputs Asymptotically fastest algorithm used in practice
Acyclic convolution Vector operation “Multiplication without carrying” Can compute efficiently with convolution theorem 1 2 3 4 5 6 7 8 8 16 24 32 7 14 21 28 6 12 18 24 5 10 15 20_________ 5 16 34 60 61 52 32
Using convolution to multiply 1234 5678 Split and zero pad Split and zero pad 4 3 2 1 8 7 6 5 DFT DFT Recursive pointwise multiplication IDFT 32 52 61 60 34 16 5______ 7006652 32 52 61 60 34 16 5 Recombination (carrying) 7006652
Vector length and entry size 01111001 Take n-bit input, split into 𝑛 parts of 𝑛 bits each Vector length = 𝑛 Vector entries must be large enough to hold sum of 𝑛 values, each a product of two 𝑛 -bit numbers ⌈lg 𝑛 ( 2 𝑛 −1)2⌉ = 2 𝑛 + ½lg n bits 01 11 10 01 m=2 d = 4 1 2 3 4 5 6 7 8 8 16 24 32 7 14 21 28 6 12 18 24 5 10 15 20_________ 5 16 34 60 61 52 32
Arithmetic cost All ops except FFT and recursive multiplications are O(n) Avoid recursive multiplications in FFTs by performing all operations in ℤ2n′+1 FFTs take O(n log n) time T(n) = 𝑛 T(2 𝑛 + ½lg n) + O(n log n) Solution: T(n) = O(n log n log log n)
Communication cost Suppose each FFT done independently Hong-Kung FFT lower bound (met by algorithm of Frigo): O(n log n/log M) Stop recursion when subproblem fits in cache (Q(n) = n for n < αM) Q(n) = 𝑛 Q(2 𝑛 ) + O(n log n/log M) Q(n) = O(n (log n/log M) log(log n/log M))