A Fast Fourier Transform Compiler
Silvio D. Carnevali
Contents
- FFTW and genfft: an introduction
- genfft: how it works
  1.) DAG creation
  2.) Simplifier
  3.) Scheduler
  4.) Unparsing
- Conclusion: similar applications
genfft
- A special-purpose compiler, written in Objective Caml, that produces DFT subroutines
- Outputs C code parameterized according to:
  - Input length
  - Data type
FFTW
- A collection of "codelets": fragments of optimized C code, generated by genfft
- Plan: the fastest composition of codelets found for a given input size and hardware
- The plan is selected automatically by FFTW at run time (FJ98)
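To make "a plan is a composition of codelets" concrete, here is a minimal sketch of a size-4 DFT built from two size-2 codelets plus one radix-2 Cooley-Tukey combining step. The names `cplx`, `codelet_n2` and `plan_n4` are hypothetical and are not FFTW's real codelet interface; the forward kernel e^{-2πi·jk/n} is assumed.

```c
#include <assert.h>

/* Complex number stored as two reals (C89 has no complex type). */
typedef struct { double re, im; } cplx;

/* Hypothetical size-2 codelet: out[0] = in[0] + in[s], out[1] = in[0] - in[s].
   The input stride s lets it read the even or odd samples in place. */
static void codelet_n2(const cplx *in, int istride, cplx *out)
{
    cplx a = in[0], b = in[istride];
    out[0].re = a.re + b.re; out[0].im = a.im + b.im;
    out[1].re = a.re - b.re; out[1].im = a.im - b.im;
}

/* Hypothetical plan for n = 4: one codelet on x0,x2, one on x1,x3,
   then a radix-2 step with twiddle w = e^{-2*pi*i/4} = -i. */
static void plan_n4(const cplx in[4], cplx out[4])
{
    cplx e[2], o[2];
    codelet_n2(in, 2, e);      /* DFT of the even samples x0, x2 */
    codelet_n2(in + 1, 2, o);  /* DFT of the odd samples  x1, x3 */

    /* o[1] * (-i) = (o.im, -o.re); o[0] is multiplied by w^0 = 1. */
    cplx t1 = { o[1].im, -o[1].re };

    out[0].re = e[0].re + o[0].re; out[0].im = e[0].im + o[0].im;
    out[2].re = e[0].re - o[0].re; out[2].im = e[0].im - o[0].im;
    out[1].re = e[1].re + t1.re;   out[1].im = e[1].im + t1.im;
    out[3].re = e[1].re - t1.re;   out[3].im = e[1].im - t1.im;
}
```

For the input (1, 2, 3, 4) this yields (10, -2+2i, -2, -2-2i). FFTW's planner makes the same kind of choice (which codelets, which split) but by measuring candidate plans on the actual machine.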
Performance of FFTW
[Figure: FFTW performance plots, panels "Powers of 2" and "Any powers of 2, 3, 5, 7"]
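genfft: creation of the codelet's DAG
- Nodes are values of a data type that encodes arithmetic expressions
- Only real numbers are used (complex values are split into real and imaginary parts), for compatibility with C
- A generic node is an operator; its children are its operands
- The algorithm used to build the DAG depends on the input size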
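genfft itself represents nodes with an Objective Caml data type. The C struct below is only an assumed analogue, meant to show the shape of the DAG: leaves are constants or loads of real inputs, and an inner node is an operator whose children are its operands. All names (`Node`, `constant`, `load`, `add`, `mul`, `neg`, `eval`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Node kinds: constants, reads of a real input, and real arithmetic. */
typedef enum { NUM, LOAD, PLUS, TIMES, UMINUS } Op;

typedef struct Node {
    Op op;
    double num;               /* NUM: the constant                       */
    int input;                /* LOAD: index of the real input it reads  */
    const struct Node *a, *b; /* children = operands (NULL when unused)  */
} Node;

#define MAX_NODES 256
static Node pool[MAX_NODES];
static int n_nodes = 0;

static const Node *node(Op op, double num, int input, const Node *a, const Node *b)
{
    assert(n_nodes < MAX_NODES);
    Node *n = &pool[n_nodes++];
    n->op = op; n->num = num; n->input = input; n->a = a; n->b = b;
    return n;
}

static const Node *constant(double k) { return node(NUM, k, -1, NULL, NULL); }
static const Node *load(int i)        { return node(LOAD, 0.0, i, NULL, NULL); }
static const Node *add(const Node *x, const Node *y) { return node(PLUS, 0.0, -1, x, y); }
static const Node *mul(const Node *x, const Node *y) { return node(TIMES, 0.0, -1, x, y); }
static const Node *neg(const Node *x) { return node(UMINUS, 0.0, -1, x, NULL); }

/* Evaluate the expression rooted at n on concrete real inputs.
   A shared subexpression is simply reached through more than one parent. */
static double eval(const Node *n, const double *in)
{
    switch (n->op) {
    case NUM:    return n->num;
    case LOAD:   return in[n->input];
    case PLUS:   return eval(n->a, in) + eval(n->b, in);
    case TIMES:  return eval(n->a, in) * eval(n->b, in);
    case UMINUS: return -eval(n->a, in);
    }
    return 0.0;
}
```

A complex input x[j] would become two LOAD leaves (for example indices 2j and 2j+1, an assumed layout). Because a node can be the child of several parents, the structure is a DAG rather than a tree.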
DAG creation: algorithms
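FT equation
$$ Y[i] = \sum_{j=0}^{n-1} X[j]\,\omega_n^{-ij}, \qquad 0 \le i < n $$
where X is the input vector, Y is the Fourier transform of X, and $\omega_n = e^{2\pi\sqrt{-1}/n}$ is the primitive n-th root of unity.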
genfft: DAG simplifier
- Bottom-up traversal of the DAG, applying local improvements:
  - Algebraic transformations (constant folding, simplification of + and *)
  - Common-subexpression elimination (CSE): eliminates existing common subexpressions and exposes new ones
  - DFT-specific improvements
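One standard way to get CSE while building a DAG bottom-up is hash-consing: before creating a node, look for an identical one and reuse it. The sketch below assumes this technique (a linear scan stands in for a hash table); `intern` and the other names are hypothetical, not genfft's code.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NUM, LOAD, PLUS, TIMES } Op;

typedef struct Node {
    Op op;
    double num;               /* NUM: constant value (0 otherwise) */
    int input;                /* LOAD: input index (-1 otherwise)  */
    const struct Node *a, *b; /* operands, NULL for leaves         */
} Node;

#define MAX_NODES 256
static Node table[MAX_NODES];
static int n_nodes = 0;

/* Return the unique node with these fields, creating it only if no identical
   node exists yet. Children are already unique, so comparing child pointers
   is enough to detect structurally equal subexpressions. */
static const Node *intern(Op op, double num, int input,
                          const Node *a, const Node *b)
{
    /* + and * are commutative: order operands so x+y and y+x coincide. */
    if ((op == PLUS || op == TIMES) && a > b) {
        const Node *t = a; a = b; b = t;
    }
    for (int k = 0; k < n_nodes; k++) {
        const Node *m = &table[k];
        if (m->op == op && m->num == num && m->input == input &&
            m->a == a && m->b == b)
            return m;   /* common subexpression found: reuse it */
    }
    assert(n_nodes < MAX_NODES);
    Node *m = &table[n_nodes++];
    m->op = op; m->num = num; m->input = input; m->a = a; m->b = b;
    return m;
}

static const Node *constant(double k) { return intern(NUM, k, -1, NULL, NULL); }
static const Node *load(int i)        { return intern(LOAD, 0.0, i, NULL, NULL); }
static const Node *add(const Node *x, const Node *y) { return intern(PLUS, 0.0, -1, x, y); }
static const Node *mul(const Node *x, const Node *y) { return intern(TIMES, 0.0, -1, x, y); }
```

Building x + y, then y + x, then 0.5 * x and x * 0.5 creates only five nodes in total: the second sum and the second product are found in the table.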
Algebraic transformations
- Simplifies multiplication by 1, 0 or -1 (1·x → x, 0·x → 0, -1·x → -x)
- Simplifies addition of 0 (x + 0 → x)
- Distributivity: kx + ky → k(x + y), which saves one multiplication
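These rules fit naturally into "smart constructors" that simplify while the node is being built. A minimal sketch, assuming the same kind of hypothetical C node type as above (not genfft's actual OCaml simplifier); constants are always moved to the left operand of a product so that the distributivity check can find them.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NUM, LOAD, PLUS, TIMES, UMINUS } Op;

typedef struct Node {
    Op op;
    double num;               /* NUM only  */
    int input;                /* LOAD only */
    const struct Node *a, *b; /* operands  */
} Node;

#define MAX_NODES 256
static Node pool[MAX_NODES];
static int n_nodes = 0;

static const Node *node(Op op, double num, int input, const Node *a, const Node *b)
{
    assert(n_nodes < MAX_NODES);
    Node *n = &pool[n_nodes++];
    n->op = op; n->num = num; n->input = input; n->a = a; n->b = b;
    return n;
}

static const Node *constant(double k) { return node(NUM, k, -1, NULL, NULL); }
static const Node *load(int i)        { return node(LOAD, 0.0, i, NULL, NULL); }
static int is_const(const Node *n, double k) { return n->op == NUM && n->num == k; }

/* -x, folding -(k) and rewriting -(-x) -> x */
static const Node *neg(const Node *x)
{
    if (x->op == NUM)    return constant(-x->num);
    if (x->op == UMINUS) return x->a;
    return node(UMINUS, 0.0, -1, x, NULL);
}

/* x * y with constant folding and the rules for 0, 1 and -1 */
static const Node *mul(const Node *x, const Node *y)
{
    if (y->op == NUM && x->op != NUM) { const Node *t = x; x = y; y = t; } /* constant first */
    if (x->op == NUM && y->op == NUM) return constant(x->num * y->num);
    if (is_const(x, 0.0))  return constant(0.0);  /* assumes y is finite */
    if (is_const(x, 1.0))  return y;
    if (is_const(x, -1.0)) return neg(y);
    return node(TIMES, 0.0, -1, x, y);
}

/* x + y with constant folding, x + 0 -> x, and kx + ky -> k(x + y) */
static const Node *add(const Node *x, const Node *y)
{
    if (x->op == NUM && y->op == NUM) return constant(x->num + y->num);
    if (is_const(x, 0.0)) return y;
    if (is_const(y, 0.0)) return x;
    if (x->op == TIMES && y->op == TIMES &&
        x->a->op == NUM && y->a->op == NUM && x->a->num == y->a->num)
        return mul(x->a, add(x->b, y->b));   /* one multiplication instead of two */
    return node(PLUS, 0.0, -1, x, y);
}
```

For example, 3x + 3y is built as a single product 3 * (x + y), and -1 * x becomes a negation node instead of a multiplication.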
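DFT-specific improvements
- Numeric constants are made positive (a local transformation)
  - Constants generally appear in pairs k and -k; keeping only k reduces the number of distinct constants, and therefore the number of loads
- DAG transposition (valid because the DFT is a linear function)
  - Simplify the DAG, transpose it, simplify, transpose back, simplify again
  - Reduces only the number of multiplications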
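The effect of making constants positive can be seen in a tiny emitter: a term x + k*y with negative k is printed as a subtraction of |k|*y, so 0.5 and -0.5 become one constant to load. This is a minimal sketch; the constant table and the function names `use_constant` and `emit_x_plus_ky` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Constants the generated code must load (hypothetical constant table). */
#define MAX_CONSTS 32
static double consts[MAX_CONSTS];
static int n_consts = 0;

static void use_constant(double k)
{
    for (int i = 0; i < n_consts; i++)
        if (consts[i] == k) return;     /* already in the table */
    assert(n_consts < MAX_CONSTS);
    consts[n_consts++] = k;
}

/* Emit the C expression for x + k*y, keeping the constant positive:
   a negative k is turned into a subtraction of |k|*y. */
static void emit_x_plus_ky(char *buf, size_t len, const char *x, double k, const char *y)
{
    double mag = k < 0 ? -k : k;
    use_constant(mag);
    snprintf(buf, len, "%s %c %.17g * %s", x, k < 0 ? '-' : '+', mag, y);
}
```

Emitting t0 + 0.5*t1 and t2 + (-0.5)*t3 produces "t0 + 0.5 * t1" and "t2 - 0.5 * t3", and the table holds a single constant.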
DFT-specific improvements: DAG transposition
[Figure: three example DAGs, each with nodes X, Y, A, B: DAG D → simplify → DAG E → transpose → DAG E^T → simplify → DAG F^T → transpose → DAG F → simplify]
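Why transposition is allowed: by the transposition principle, if a DAG computes a linear map y = Mx using additions and multiplications by constants, then reversing every edge (sums become fan-outs, fan-outs become sums, constant multiplications stay on their edges) gives a DAG that computes the transpose. With $\omega_n$ the n-th root of unity from the FT equation, the DFT matrix is symmetric, so the transposed DAG computes the same transform:

$$ y = Mx \quad\Longrightarrow\quad \text{the transposed DAG computes } x' = M^{T} y' $$
$$ (F_n)_{ij} = \omega_n^{-ij} = \omega_n^{-ji} = (F_n)_{ji} \quad\Longrightarrow\quad F_n^{T} = F_n $$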
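genfft: DAG scheduler
- Goal: minimize register usage
- No instruction scheduling (left to the C compiler)
- Recursively partitions the DAG in two; the partition guides the register mapping
- Optimal for n = 2^k
- For other sizes the partitioning is heuristic, and optimality is not guaranteed for n ≠ 2^k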
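A simplified illustration of recursive partitioning, assuming a radix-2 butterfly DAG (in-place FFT on bit-reversed input) rather than genfft's general DAG and heuristics: each block is cut in two, each half is scheduled completely before the butterflies that join them, so fewer values tend to be live at once than with a stage-by-stage order. `Butterfly`, `schedule_block` and `schedule_fft` are hypothetical names.

```c
#include <assert.h>

/* One radix-2 butterfly: combines elements i and j at a given stage. */
typedef struct { int stage, i, j; } Butterfly;

#define MAX_BUTTERFLIES 1024
static Butterfly sched[MAX_BUTTERFLIES];
static int n_sched = 0;

/* Schedule every butterfly needed to finish block [lo, lo + m), m a power of 2.
   The block's DAG is cut in two: each half is scheduled recursively, then the
   m/2 butterflies of stage log2(m) join the halves. */
static void schedule_block(int lo, int m, int stage)
{
    if (m < 2) return;
    schedule_block(lo, m / 2, stage - 1);
    schedule_block(lo + m / 2, m / 2, stage - 1);
    for (int k = 0; k < m / 2; k++) {
        assert(n_sched < MAX_BUTTERFLIES);
        sched[n_sched].stage = stage;
        sched[n_sched].i = lo + k;
        sched[n_sched].j = lo + k + m / 2;
        n_sched++;
    }
}

/* Schedule a whole FFT of size n = 2^k. */
static void schedule_fft(int n)
{
    int stages = 0;
    for (int m = n; m > 1; m >>= 1) stages++;
    n_sched = 0;
    schedule_block(0, n, stages);
}
```

For n = 8 the order is: stage 1 on (0,1) and (2,3), stage 2 on the first half, then the same for elements 4..7, and finally the four stage-3 butterflies, 12 butterflies in all.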
genfft: unparsing
- The schedule is unparsed into C code
- Pipeline usage (instruction scheduling) is managed by the C compiler
- genfft + C compiler: performance problems, e.g. with the egcs "optimizer"
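Unparsing walks the scheduled instructions in order and prints one C statement per node, leaving register allocation and pipelining to the C compiler. A minimal sketch, assuming a hypothetical straight-line instruction format (`Instr`, `operand`, `unparse` are illustrative names, not genfft's):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A scheduled straight-line program: a and b index earlier instructions. */
typedef enum { LOAD, ADD, SUB, MUL } Op;
typedef struct { Op op; const char *name; int a, b; } Instr;  /* name: LOAD only */

/* Print the C name of the value computed by instruction k. */
static void operand(char *buf, size_t len, const Instr *prog, int k)
{
    if (prog[k].op == LOAD) snprintf(buf, len, "%s", prog[k].name);
    else                    snprintf(buf, len, "T%d", k);
}

/* Unparse the schedule into C statements, one temporary per instruction. */
static void unparse(const Instr *prog, int n, char *out, size_t len)
{
    static const char opchar[] = { '?', '+', '-', '*' };  /* indexed by Op */
    size_t used = 0;
    out[0] = '\0';
    for (int k = 0; k < n && used < len; k++) {
        if (prog[k].op == LOAD) continue;  /* inputs are referenced by name */
        char x[32], y[32];
        operand(x, sizeof x, prog, prog[k].a);
        operand(y, sizeof y, prog, prog[k].b);
        int w = snprintf(out + used, len - used, "double T%d = %s %c %s;\n",
                         k, x, opchar[prog[k].op], y);
        if (w < 0) break;
        used += (size_t)w;
    }
}
```

For the body of a real size-2 DFT (loads of x0 and x1, then their sum and difference) this prints:
double T2 = x0 + x1;
double T3 = x0 - x1;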
Conclusion & future work
- FFTW: among the fastest FFT implementations available
- Over 100 downloads every week!
- genfft is specialized for linear functions; similar applications:
  - Crystallographic FT
  - FIR & IIR filters
  - Image processing (JPEG discrete cosine transform)