1
A Fast Fourier Transform Compiler Silvio D Carnevali
2
Contents: FFTW and genfft, an introduction. genfft, how it works: 1) DAG creation, 2) Simplifier, 3) Scheduler, 4) Unparsing. Conclusion: similar applications.
3
genfft: a special-purpose compiler, written in Objective Caml, that produces DFT subroutines. It outputs C code parameterized according to input length and data type.
4
FFTW: a collection of “codelets”. Codelets are fragments of C code generated by genfft. A plan is an optimal composition of codelets; it depends on input size and hardware, and is selected automatically by FFTW (FJ98).
5
Performance of FFTW [Figure: performance plots, with panels titled “Powers of 2” and “Any powers of 2, 3, 5, 7”]
6
genfft: creation of the codelet’s DAG. Nodes are data types that encode arithmetic expressions and use real numbers for C compatibility. A generic node is an operator; its children are the operands. The algorithm used to build the DAG depends on the input size.
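The node structure described on this slide can be sketched roughly as follows. This is an illustration in Python, not genfft's own code (genfft is written in Objective Caml); the names `Load`, `Num`, `Op`, `evaluate` and `dft2_dag` are hypothetical. It builds the expression DAG of a size-2 DFT, with each complex value split into real and imaginary parts so that every node works on real numbers.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Load:
    """Leaf node: one real-valued input, e.g. the real part of X[0]."""
    name: str


@dataclass(frozen=True)
class Num:
    """Leaf node: a real numeric constant."""
    value: float


@dataclass(frozen=True)
class Op:
    """Generic node: an operator whose children are its operands."""
    op: str                      # "+", "-" or "*"
    args: Tuple["Node", ...]


Node = Union[Load, Num, Op]


def evaluate(node, env):
    """Evaluate a DAG node given a dict mapping input names to floats."""
    if isinstance(node, Load):
        return env[node.name]
    if isinstance(node, Num):
        return node.value
    left, right = (evaluate(arg, env) for arg in node.args)
    if node.op == "+":
        return left + right
    if node.op == "-":
        return left - right
    if node.op == "*":
        return left * right
    raise ValueError(f"unknown operator {node.op!r}")


def dft2_dag():
    """Size-2 DFT: Y[0] = X[0] + X[1], Y[1] = X[0] - X[1], on real parts."""
    re0, im0, re1, im1 = (Load(n) for n in ("re0", "im0", "re1", "im1"))
    return {
        "out_re0": Op("+", (re0, re1)),
        "out_im0": Op("+", (im0, im1)),
        "out_re1": Op("-", (re0, re1)),
        "out_im1": Op("-", (im0, im1)),
    }
```

The input leaves are shared between outputs, which is what makes the structure a DAG rather than a tree.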
7
DAG creation Algorithms
8
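FT Equation: $Y[i] = \sum_{j=0}^{n-1} X[j]\,\omega_n^{-ij}$, where X = input vector, Y = FT of X, and $\omega_n = e^{2\pi\sqrt{-1}/n}$ is a primitive n-th root of unity.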
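As a reference point, the transform can be evaluated directly from its definition. The sketch below assumes the common forward convention Y[i] = sum over j of X[j] * w^(-i*j), with w = exp(2*pi*sqrt(-1)/n); the sign of the exponent is an assumption, and the function name `naive_dft` is hypothetical. It costs O(n^2) operations and is only meant for checking faster code.

```python
import cmath


def naive_dft(x):
    """Direct O(n^2) evaluation of Y[i] = sum_j X[j] * w**(-i*j)."""
    n = len(x)
    if n == 0:
        return []
    w = cmath.exp(2j * cmath.pi / n)   # primitive n-th root of unity
    return [sum(x[j] * w ** (-i * j) for j in range(n)) for i in range(n)]
```

For example, a unit impulse at index 0 transforms to a vector of ones, and a constant vector transforms to a single nonzero entry at index 0.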
9
genfft: DAG Simplifier. Bottom-up traversal of the DAG applying local improvements: algebraic transformations (constant folding, simplification of + and *); common-subexpression elimination (CSE), which eliminates existing common subexpressions and creates new ones; DFT-specific improvements.
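One common way to eliminate existing common subexpressions is to build nodes through a table, so that structurally equal expressions become a single shared node. The sketch below shows that technique only; it is not taken from genfft, it does not cover the "create new ones" part of the slide, and the class name `DagBuilder` is hypothetical.

```python
class DagBuilder:
    """Builds expression nodes so that equal subexpressions are shared."""

    COMMUTATIVE = ("+", "*")

    def __init__(self):
        self._table = {}

    def node(self, op, *args):
        """Return the unique node for (op, args), creating it if needed."""
        if op in self.COMMUTATIVE:
            # Canonical operand order, so x + y and y + x are the same node.
            args = tuple(sorted(args, key=repr))
        key = (op, args)
        # setdefault returns the node already stored for an equal key.
        return self._table.setdefault(key, key)

    def size(self):
        """Number of distinct nodes created so far."""
        return len(self._table)


builder = DagBuilder()
x = builder.node("load", "x")
y = builder.node("load", "y")
sum_xy = builder.node("+", x, y)
sum_yx = builder.node("+", y, x)    # same node as sum_xy
diff_xy = builder.node("-", x, y)
diff_yx = builder.node("-", y, x)   # subtraction is not commutative
```

Here `sum_xy` and `sum_yx` are the same object, while the two differences stay distinct.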
10
Algebraic transformations: simplifies multiplication by 1, 0 or -1; simplifies addition of 0; distribution: kx + ky = k(x + y).
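The rules on this slide can be sketched as a small bottom-up rewriter. The representation is an assumption made for illustration: expressions are nested tuples `("+", a, b)` or `("*", a, b)`, leaves are numbers or variable-name strings, and `("neg", x)` stands for the negation of `x`. The function name `simplify` is hypothetical and the code is not genfft's.

```python
def simplify(expr):
    """Simplify children first, then apply local rules at this node."""
    if not isinstance(expr, tuple):
        return expr                          # leaf: number or variable name
    if expr[0] == "neg":
        return ("neg", simplify(expr[1]))
    op = expr[0]
    if op not in ("+", "*"):
        raise ValueError(f"unknown operator {op!r}")
    a, b = simplify(expr[1]), simplify(expr[2])

    def is_num(value):
        return isinstance(value, (int, float))

    if is_num(a) and is_num(b):              # constant folding
        return a + b if op == "+" else a * b
    if op == "*":
        if is_num(b):                        # keep the constant on the left
            a, b = b, a
        if a == 0:
            return 0                         # 0 * x = 0
        if a == 1:
            return b                         # 1 * x = x
        if a == -1:
            return ("neg", b)                # -1 * x = -x
        return ("*", a, b)
    if a == 0:
        return b                             # 0 + x = x
    if b == 0:
        return a                             # x + 0 = x
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == "*" and b[0] == "*"
            and is_num(a[1]) and a[1] == b[1]):
        return ("*", a[1], ("+", a[2], b[2]))   # kx + ky = k(x + y)
    return ("+", a, b)
```

The distribution rule trades two multiplications for one, which is the saving the slide points at.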
11
DFT-Specific improvements. Numeric constants made positive (local): constants generally occur as both k and -k, so this reduces the number of loads. DAG transposition (for linear functions): simplify the DAG, transpose + simplify, transpose + simplify; this reduces the number of multiplications only.
12
DFT-Specific improvements [Figure: three panels of an example DAG, each with node labels X, Y, A, B and constants 5, 4, 3, 2] DAG D, Simplify, DAG E, Transpose, DAG E^T, Simplify, DAG F^T, Transpose, DAG F, Simplify, DAG E
13
genfft: DAG Scheduler. Goal: minimize the use of registers; no instruction scheduling. Partitions the DAG in 2 recursively; register mapping. Optimal for n = 2^k. Partitioning uses heuristics. Optimality? Not for n ≠ 2^k.
14
genfft: Unparsing. The schedule is unparsed to C. Pipeline usage is managed by the C compiler. genfft + C compiler: performance problems (egcs “optimizer”).
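Unparsing amounts to printing the scheduled operations, in order, as C statements. The sketch below is a simplified illustration and does not reproduce genfft's actual output format; the function `unparse_to_c`, the schedule layout, and the `in`/`out` array parameters are all assumptions.

```python
def unparse_to_c(func_name, schedule, outputs):
    """Emit a C function from a schedule.

    schedule: list of (temp, op, lhs, rhs) tuples in execution order,
              where lhs and rhs are C expressions such as "in[0]" or "t1".
    outputs:  list of C expressions stored to out[0], out[1], ...
    """
    lines = [f"void {func_name}(const double *in, double *out)", "{"]
    for temp, op, lhs, rhs in schedule:
        lines.append(f"    const double {temp} = {lhs} {op} {rhs};")
    for index, value in enumerate(outputs):
        lines.append(f"    out[{index}] = {value};")
    lines.append("}")
    return "\n".join(lines)


# Size-2 DFT on interleaved data: in = {re0, im0, re1, im1}.
schedule = [
    ("t0", "+", "in[0]", "in[2]"),
    ("t1", "+", "in[1]", "in[3]"),
    ("t2", "-", "in[0]", "in[2]"),
    ("t3", "-", "in[1]", "in[3]"),
]
code = unparse_to_c("dft2", schedule, ["t0", "t1", "t2", "t3"])
```

The emitted function is straight-line code with no loops, so register allocation and pipeline usage are left to the C compiler.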
15
Conclusion & future work. FFTW: the best of the best of the best… Over 100 downloads every week! genfft is specialized for linear functions; similar applications: crystallographic FT, FIR & IIR filters, image processing (JPEG discrete cosine transform).