A Fast Fourier Transform Compiler Silvio D Carnevali.

Contents
- FFTW and genfft: an introduction
- genfft: how it works
  1.) DAG creation
  2.) Simplifier
  3.) Scheduler
  4.) Unparsing
- Conclusion: similar applications

genfft
Special-purpose compiler, written in Objective Caml
Produces DFT subroutines
Outputs C code parameterized according to:
- Input length
- Data type

FFTW
Collection of "codelets"
Codelets: fragments of C code, generated by genfft
Plan: optimal composition of codelets
- Depends on input size and hardware
- Automatically selected by FFTW (FJ98)

Performance of FFTW
[Figure: two performance plots, labeled "Powers of 2" and "Any powers of 2, 3, 5, 7"]

genfft: creation of the codelet's DAG
Nodes: data types
- Encode arithmetic expressions
- Use real numbers for C compatibility
Generic node = operator
Children = operands
The algorithm used to build the DAG depends on the input size
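A minimal sketch of the node representation described on this slide, written in Python for illustration rather than in genfft's own implementation language. The class names `Num`, `Load` and `Op` and the helper `complex_mul` are hypothetical. The sketch shows how a complex multiplication can be expanded into nodes over real numbers only, which is what makes the later translation to C direct.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Num:
    """A real numeric constant (hypothetical node type)."""
    value: float


@dataclass(frozen=True)
class Load:
    """A real input variable, identified by name (hypothetical node type)."""
    name: str


@dataclass(frozen=True)
class Op:
    """A generic node: an operator whose children are its operands."""
    op: str                      # "+", "-" or "*"
    args: Tuple["Node", ...]


Node = Union[Num, Load, Op]


def evaluate(node: Node, env: Dict[str, float]) -> float:
    """Evaluate an expression DAG for given input values."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Load):
        return env[node.name]
    vals = [evaluate(a, env) for a in node.args]
    if node.op == "+":
        return sum(vals)
    if node.op == "-":
        return vals[0] - sum(vals[1:])
    if node.op == "*":
        prod = 1.0
        for v in vals:
            prod *= v
        return prod
    raise ValueError(f"unknown operator {node.op!r}")


def complex_mul(ar: Node, ai: Node, br: Node, bi: Node) -> Tuple[Node, Node]:
    """Expand (ar + i*ai) * (br + i*bi) into two real-valued expressions."""
    re = Op("-", (Op("*", (ar, br)), Op("*", (ai, bi))))
    im = Op("+", (Op("*", (ar, bi)), Op("*", (ai, br))))
    return re, im
```

Because the nodes are frozen dataclasses, structurally equal subexpressions compare equal and hash alike, which is convenient for the sharing that turns an expression tree into a DAG.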

DAG creation Algorithms

FT Equation
$$Y[k] = \sum_{j=0}^{n-1} X[j]\,\omega_n^{-jk}, \qquad 0 \le k < n$$
X = input vector
Y = FT of X
$\omega_n = e^{2\pi i/n}$ = primitive n-th root of unity
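As a reference point for this slide, the transform can be evaluated directly from its definition. The sketch below assumes the common forward-transform convention, with ω_n = e^{2πi/n} raised to the power −jk; the slide text itself does not spell out the sign. This is the O(n²) definition, not the fast algorithm that the generated codelets implement.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT: Y[k] = sum_j X[j] * omega_n^(-j*k).

    Assumes omega_n = exp(2*pi*i/n), the usual forward-transform convention.
    """
    n = len(x)
    omega = cmath.exp(2j * cmath.pi / n)
    return [
        sum(x[j] * omega ** (-j * k) for j in range(n))
        for k in range(n)
    ]
```

A unit impulse transforms to a constant vector and a constant vector transforms to a single nonzero coefficient, which gives a quick sanity check for any generated routine.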

genfft: DAG Simplifier
Bottom-up traversal of the DAG
Local improvements:
- Algebraic transformations (constant folding, +/* simplification)
- CSE (common-subexpression elimination): eliminate existing common subexpressions and create new ones
- DFT-specific improvements
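One common way to perform common-subexpression elimination during a bottom-up traversal is to keep a table keyed by the structure of each already-visited subexpression. The sketch below illustrates that idea only; it is not taken from genfft. Expressions are assumed to be nested tuples `(op, left, right)` with strings or numbers as leaves, and the temporary names `t0`, `t1`, ... are hypothetical (input variables are assumed not to use those names).

```python
def eliminate_common_subexpressions(roots):
    """Return (code, results) where code is a list of (temp, op, left, right).

    Each structurally distinct subexpression is computed exactly once.
    """
    table = {}   # structural key -> temporary name
    code = []    # straight-line code in dependency order

    def visit(e):
        if not isinstance(e, tuple):
            return e                      # leaf: variable name or constant
        op, left, right = e
        # Children are visited first, so the key is built bottom-up.
        key = (op, visit(left), visit(right))
        if key not in table:
            table[key] = f"t{len(table)}"
            code.append((table[key],) + key)
        return table[key]

    results = [visit(r) for r in roots]
    return code, results


# Two outputs that share the subexpression a + b.
code, results = eliminate_common_subexpressions([
    ("*", ("+", "a", "b"), "c"),
    ("-", ("+", "a", "b"), "d"),
])
```

In this example the shared sum `a + b` is assigned to a single temporary that both outputs reuse.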

Algebraic transformations
Simplifies multiplication by 1, 0 or -1
Simplifies addition of 0
Distribution: kx + ky = k(x + y)
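The rules on this slide can be sketched as a small bottom-up rewriter. This is an illustrative sketch under assumptions, not genfft's simplifier: expressions are nested tuples `(op, left, right)`, negation is written as the hypothetical unary node `("neg", x)`, and the distribution rule is applied only when `k` is a numeric constant.

```python
def is_number(x):
    return isinstance(x, (int, float))


def simplify(e):
    """Apply the local algebraic rules bottom-up."""
    if not isinstance(e, tuple):
        return e                                   # leaf
    if e[0] == "neg":
        return ("neg", simplify(e[1]))
    op, a, b = e[0], simplify(e[1]), simplify(e[2])
    if op == "*":
        if a == 0 or b == 0:
            return 0                               # 0 * x = 0
        if a == 1:
            return b                               # 1 * x = x
        if b == 1:
            return a
        if a == -1:
            return ("neg", b)                      # -1 * x = -x
        if b == -1:
            return ("neg", a)
    if op == "+":
        if a == 0:
            return b                               # 0 + x = x
        if b == 0:
            return a
        # Distribution: k*x + k*y = k*(x + y), saving one multiplication.
        if (isinstance(a, tuple) and isinstance(b, tuple)
                and a[0] == "*" and b[0] == "*"
                and is_number(a[1]) and a[1] == b[1]):
            return ("*", a[1], ("+", a[2], b[2]))
    return (op, a, b)
```

For instance, `simplify(("+", ("*", 0.5, "x"), ("*", 0.5, "y")))` yields `("*", 0.5, ("+", "x", "y"))`.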

DFT-Specific improvements
Numeric constants made positive (local)
- Constants generally appear as both k and -k
- Reduces number of loads
DAG transposition (for linear functions)
- Simplify DAG, transpose + simplify, transpose + simplify
- Reduces number of multiplications only

DFT-Specific improvements
[Figure: example DAGs with nodes X, Y, A, B, labeled in order: DAG D, Simplify, DAG E, Transpose, DAG E^T, Simplify, DAG F^T, Transpose, DAG F, Simplify, DAG E]
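Transposition is possible because the DAG of a linear function behaves like a matrix: reversing every edge and swapping the roles of inputs and outputs gives a DAG that computes the transposed matrix. The sketch below checks this on a toy example. The edge-list representation and the node names are assumptions made for illustration, not genfft's data structures.

```python
def evaluate(edges, inputs, outputs, x):
    """Evaluate a linear DAG given as (src, dst, weight) edges.

    Each non-input node's value is the weighted sum of its predecessors.
    """
    incoming = {}
    for src, dst, w in edges:
        incoming.setdefault(dst, []).append((src, w))
    values = dict(zip(inputs, x))

    def value(node):
        if node not in values:
            values[node] = sum(w * value(src) for src, w in incoming.get(node, []))
        return values[node]

    return [value(o) for o in outputs]


def transpose(edges, inputs, outputs):
    """Reverse every edge and swap inputs with outputs."""
    return [(dst, src, w) for src, dst, w in edges], outputs, inputs


def matrix(edges, inputs, outputs):
    """Recover the matrix of the linear map by feeding in unit vectors."""
    n = len(inputs)
    cols = [evaluate(edges, inputs, outputs, [1 if i == j else 0 for i in range(n)])
            for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(len(outputs))]


# Toy DAG: s = a + b, y0 = 2*s, y1 = 3*b - s.
edges = [("a", "s", 1), ("b", "s", 1), ("s", "y0", 2), ("b", "y1", 3), ("s", "y1", -1)]
m = matrix(edges, ["a", "b"], ["y0", "y1"])
mt = matrix(*transpose(edges, ["a", "b"], ["y0", "y1"]))
```

Here `m` and `mt` are transposes of each other, so simplifications found on the reversed DAG can be carried back by transposing again.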

genfft: DAG Scheduler
Goal: minimize use of registers
No instruction scheduling
Partitions the DAG in 2 recursively
- Register mapping
- Optimal for n = 2^k
- Partitioning heuristics
Optimality? Not for n != 2^k

genfft: Unparsing
Schedule unparsed to C
Pipeline usage managed by the C compiler
genfft + C compiler: performance problems
- egcs "optimizer"
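Unparsing amounts to printing the scheduled operations as straight-line C statements. The sketch below illustrates the idea on a size-2 transform of real inputs. The schedule format, the variable names and the function `unparse` are hypothetical, and real genfft output looks different.

```python
def unparse(schedule, outputs, real_type="double"):
    """Emit one C declaration per scheduled operation, then the output stores.

    schedule: list of (temp, op, left, right), already in execution order.
    outputs:  list of (destination, source) pairs.
    """
    lines = [f"{real_type} {t} = {l} {op} {r};" for t, op, l, r in schedule]
    lines += [f"{dst} = {src};" for dst, src in outputs]
    return "\n".join(lines)


# Size-2 DFT on real inputs: y[0] = x0 + x1, y[1] = x0 - x1.
c_code = unparse(
    [("t0", "+", "x0", "x1"), ("t1", "-", "x0", "x1")],
    [("y[0]", "t0"), ("y[1]", "t1")],
)
```

The emitted text keeps the order chosen by the scheduler; how well that order survives depends on the C compiler, which is the concern this slide raises.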

Conclusion & future work
FFTW: The best of the best of the best…
Over 100 downloads every week!
genfft: specialized for linear functions
- Crystallographic FT
- FIR & IIR filters
- Image processing (JPEG discrete cosine transform)