Automating Transformations from Floating Point to Fixed Point for Implementing Digital Signal Processing Algorithms

Kyungtae Han
Ph.D. Defense

Committee Members:
  Prof. Ross Baldick (Dept. of ECE)
  Prof. Brian L. Evans (Dept. of ECE), advisor
  Prof. Margarida F. Jacome (Dept. of ECE)
  Prof. Earl E. Swartzlander (Dept. of ECE)
  Prof. Robert A. van de Geijn (Dept. of CS)

Computer Engineering Curriculum Track
Dept. of Electrical and Computer Engineering
The University of Texas at Austin
May 9th, 2006

Outline
Introduction
Background
Contributions
  Optimize fixed-point wordlengths
  Reduce power consumption in arithmetic
  Automate transformations of systems
Conclusion

Implementing Digital Signal Processing Algorithms
Introduction
[Figure: digital signal processing algorithms start as a floating-point program; code conversion yields fixed point with uniform wordlength, and wordlength optimization yields fixed point with optimized wordlength. The three forms map to a floating-point processor, a fixed-point processor, and a fixed-point ASIC, compared by hardware price ($) and power consumption (H: high, L: low).]
ASIC: Application Specific Integrated Circuit

Transformations to Fixed Point
Introduction
Advantages
  Lower hardware complexity
  Lower power consumption
  Faster speed in processing
Disadvantages
  Introduces distortion due to quantization error
  Search for optimum wordlength by trial and error is time-consuming
Research goals
  Automate transformations to fixed point
  Control distortion vs. complexity tradeoffs
[Figure: Floating-Point Program -> Code Conversion -> Wordlength Optimization -> Fixed Point (Optimized Wordlength); the two steps together form the transformation]

Fixed-Point Data Format
Background
Integer wordlength (IWL): number of bits assigned to integer representation
Fractional wordlength (FWL): number of bits assigned to fraction
Wordlength (WL): WL = IWL + FWL (SystemC format)
[Figure: bit layout S X ... X, with the sign bit S, the integer wordlength, the binary point, and the fractional wordlength marked along the full wordlength]
Example (subscripts denote the number base):
  π = …(10) [Floating Point]
  (10) = (2) [WL=9; IWL=3; FWL=6]
  (10) = (2) [WL=16; IWL=3; FWL=13]
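The numeric values in the example above did not survive in this transcript. As a hedged illustration only, the sketch below quantizes π to the two formats listed, assuming the SystemC convention (sign bit counted inside IWL) and assuming truncation toward negative infinity; the slide does not state the rounding mode, and `to_fixed` is a hypothetical helper name.

```python
import math


def to_fixed(value, wl, iwl):
    """Quantize value to a signed fixed-point format by truncation (floor).

    Assumes WL = IWL + FWL with the sign bit counted inside IWL.
    Returns (quantized value, two's-complement bit string with binary point).
    """
    fwl = wl - iwl
    code = math.floor(value * (1 << fwl))          # integer code of the sample
    lo, hi = -(1 << (wl - 1)), (1 << (wl - 1)) - 1
    if not lo <= code <= hi:
        raise OverflowError("value does not fit in the requested format")
    bits = format(code & ((1 << wl) - 1), "0{}b".format(wl))
    return code / (1 << fwl), bits[:iwl] + "." + bits[iwl:]


for wl, iwl in [(9, 3), (16, 3)]:
    quantized, bits = to_fixed(math.pi, wl, iwl)
    print("WL={} IWL={} FWL={}: {} = {}".format(wl, iwl, wl - iwl, quantized, bits))
```

The longer fractional wordlength leaves a smaller quantization error, which is the distortion side of the tradeoff discussed next.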

Distortion vs. Complexity Tradeoffs
Background
Shorter wordlength may increase application distortion and decrease implementation complexity
Objectives
  Minimize implementation cost
  Minimize application distortion
Symbols
  c(w)  Implementation cost function
  Cmax  Constant for maximum implementation cost
  d(w)  Application distortion function
  Dmax  Constant for maximum application distortion
[Figure: application distortion d(w) versus implementation complexity c(w), showing the wordlength lower bounds, the wordlength upper bounds, the feasible region, and the optimal tradeoff curve]

Wordlength Optimization Constraints
Background
Distortion constraint: d(w) ≤ Dmax
Complexity constraint: c(w) ≤ Cmax
[Figure: two plots of application-specific distortion d(w) versus implementation complexity c(w), one bounded by Dmax and one bounded by Cmax]
Enforcing both constraints bounds the search to a finite region

Wordlength Optimization
Background
Wordlengths of signals (variables) in a digital system are collected as a vector
Single objective optimization
Multiple objective optimization
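The equations on this slide are missing from the transcript. One formulation that is consistent with the symbols defined on the neighboring slides (c(w), d(w), Cmax, Dmax, and the wordlength bounds) is sketched below. The number of variables N and the underline/overline notation for the bounds are assumptions, and the author's exact formulation may differ.

```latex
\begin{aligned}
\mathbf{w} &= (w_1, w_2, \ldots, w_N), \qquad
  \underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}} \\
\text{Single objective:} \quad
  & \min_{\mathbf{w}} \; c(\mathbf{w})
  \quad \text{subject to} \quad d(\mathbf{w}) \le D_{\max} \\
\text{Multiple objective:} \quad
  & \min_{\mathbf{w}} \; \bigl[\, c(\mathbf{w}),\; d(\mathbf{w}) \,\bigr]
  \quad \text{subject to} \quad c(\mathbf{w}) \le C_{\max},\; d(\mathbf{w}) \le D_{\max}
\end{aligned}
```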

Genetic Algorithm
Background
Evolutionary algorithm
  Inspired by Holland 1975
  Mimic processes of plant and animal evolution
  Find optimum of a complex function
[Figure: genetic algorithm cycle with stages labeled New Gene Pool, Function Evaluation, Parental Genes w/ Measure, Selection, Mating, Child Genes, and Mutation] [From Greg Rohling’s Ph.D. Defense 2004]

Pareto Optimality
Background
Pareto optimality: “best that could be achieved without disadvantaging at least one group” [Allan Schick 1970]
Pareto optimal set is the set of nondominated solutions
  E is dominated by C, as all objectives for C are less than the corresponding objectives for E
  Solutions A, B, C, D are nondominated (not dominated by any solution)
Pareto front is the boundary (tradeoff curve) that connects the Pareto optimal set solutions
[Figure: solutions A through I plotted as Objective 2 versus Objective 1, each marked as nondominated or dominated, with the Pareto front drawn through the nondominated solutions]
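The dominance test described above can be sketched in a few lines. This is a minimal illustration for minimization problems, using the common definition (no worse in every objective and strictly better in at least one), which is slightly weaker than the "all objectives less than" wording above. The coordinates are made up for illustration and are not read from the figure.

```python
def dominates(p, q):
    """True if p dominates q when every objective is minimized."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def pareto_set(solutions):
    """Return the names of the nondominated solutions in a {name: objectives} dict."""
    return sorted(
        name
        for name, point in solutions.items()
        if not any(dominates(other, point) for other in solutions.values())
    )


# Hypothetical (objective 1, objective 2) values, chosen so that C dominates E.
solutions = {"A": (1, 9), "B": (2, 6), "C": (4, 4), "D": (7, 2), "E": (6, 6)}
print(pareto_set(solutions))
```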

Search for Optimum Wordlength
Contribution #1
Complete search
  Search whole space
  Impractical in systems with many variables
Gradient-based search
  Utilizes gradient information to determine next candidates
  Complexity measure (CM) [Sung and Kum, 1995]
  Distortion measure (DM) [Han et al., 2001]
  Complexity-and-distortion measure (CDM) [Han and Evans, 2004] (proposed)
Guided random search
  Genetic algorithm for single objective [Leban and Tasic, 2000]
  Multiple objective genetic algorithm (proposed)

Complexity-and-Distortion Measure
Contribution #1
Weighted combination of measures
  Single objective function:
Gradient-based search
  Initialization
  Iterative greedy search based on complexity and distortion gradient information
Symbols
  c(w)  Complexity function
  d(w)  Distortion function
  Dmax  Constant for maximum distortion
  Cmax  Constant for maximum complexity
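The objective function itself is missing from the transcript, so the following is only a sketch of how a greedy search guided by a weighted sum of normalized complexity and distortion might look. The weights `alpha_c` and `alpha_d`, the normalization by Cmax and Dmax, the top-down search direction, and the toy cost and distortion models are all assumptions, not the author's procedure.

```python
def cdm_greedy_search(cost, distortion, w_start, w_min, d_max, c_max,
                      alpha_c=0.5, alpha_d=0.5):
    """Greedy wordlength reduction guided by a weighted complexity-and-distortion score.

    Hypothetical sketch: starting from w_start, repeatedly remove one bit from
    the variable whose reduction gives the lowest score
    alpha_c * c(w) / c_max + alpha_d * d(w) / d_max while d(w) <= d_max.
    Stops when no single-bit reduction is feasible.
    """
    w = list(w_start)
    simulations = 0
    while True:
        best = None
        for i in range(len(w)):
            if w[i] <= w_min[i]:
                continue                      # already at the lower bound
            candidate = w.copy()
            candidate[i] -= 1
            d = distortion(candidate)         # one "simulation" per candidate
            simulations += 1
            if d > d_max:
                continue                      # violates the distortion constraint
            score = alpha_c * cost(candidate) / c_max + alpha_d * d / d_max
            if best is None or score < best[0]:
                best = (score, candidate)
        if best is None:
            return w, simulations
        w = best[1]


# Toy models (assumptions for illustration, not the FPGA area model of the case study).
GAINS = [1.0, 4.0]


def toy_cost(w):
    return sum(w)


def toy_distortion(w):
    return sum(g * 2.0 ** -bits for g, bits in zip(GAINS, w))


w_opt, n_sim = cdm_greedy_search(toy_cost, toy_distortion,
                                 w_start=[8, 8], w_min=[1, 1],
                                 d_max=0.1, c_max=16)
print(w_opt, n_sim)
```

Because each step only looks at single-bit neighbors, a search of this kind returns one point and can stop in a local optimum.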

Case Study: Filter Design
Contribution #1
Infinite impulse response (IIR) filter
Complexity measure: area model of field-programmable gate array (FPGA) [Constantinides, Cheung, and Luk 2003]
Distortion measure: root mean square (RMS) error
Seven fixed-point variables (indicated by slashes in the figure)
[Figure: IIR filter block diagram with input x[n], output y[n], a delay element, and coefficients b0, b1, and -a1]
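To make the RMS distortion measure concrete, the sketch below runs a first-order IIR filter of the form y[n] = b0 x[n] + b1 x[n-1] - a1 y[n-1] in floating point and with quantized variables, then compares the outputs. The coefficient values, the input signal, the truncating quantizer, and the choice of five quantization points are assumptions; the seven quantization points of the slide cannot be recovered from this transcript.

```python
import math
import random


def quantize(v, fwl):
    """Truncate v to fwl fractional bits (assumed rounding mode)."""
    return math.floor(v * (1 << fwl)) / (1 << fwl)


def iir(x, b0, b1, a1, fwl=None):
    """First-order IIR filter; if fwl (a dict of fractional wordlengths) is given,
    the input, the coefficients, and the output are quantized."""
    if fwl:
        q = lambda v, name: quantize(v, fwl[name])
    else:
        q = lambda v, name: v
    b0, b1, a1 = q(b0, "b0"), q(b1, "b1"), q(a1, "a1")
    x_prev = y_prev = 0.0
    out = []
    for sample in x:
        xq = q(sample, "x")
        y = q(b0 * xq + b1 * x_prev - a1 * y_prev, "y")
        out.append(y)
        x_prev, y_prev = xq, y
    return out


def rms_error(ref, test):
    return math.sqrt(sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref))


random.seed(0)
x = [random.uniform(-1, 1) for _ in range(1000)]
coeffs = dict(b0=0.1, b1=0.1, a1=-0.8)          # hypothetical stable filter
names = ["x", "b0", "b1", "a1", "y"]

reference = iir(x, **coeffs)
err_coarse = rms_error(reference, iir(x, fwl={n: 4 for n in names}, **coeffs))
err_fine = rms_error(reference, iir(x, fwl={n: 12 for n in names}, **coeffs))
print(err_coarse, err_fine)
```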

Case Study: Gradient-Based Search
Contribution #1
CDM could lead to lower complexity and lower number of simulations compared to DM and CM

Search Method   Gradient Measure   Number of Simulations   Complexity Estimate (LUT)   Distortion (RMS)*
Complete        -                  16^7 **
Gradient        DM                 316                     51.05                       0.0981
Gradient        CDM                145                     49.85                       0.0992
Gradient        CM                 417                     51.95                       0.0986

* Maximum distortion measured by root mean square (RMS) error is 0.1
** 16^7 = 268,435,456 (8.5 years, if 1 second per simulation)

Case Study: Genetic Algorithm
Contribution #1
Search Pareto optimal set (nondominated)
Handles multiple objectives: error and area
[Figure: Pareto front at the 100th generation (9,000 simulations), the 250th generation (22,500 simulations), and the 500th generation (45,000 simulations)]
* Population for one generation: 90
LUT: Lookup table

Case Study: Comparison
Contribution #1
Superpose gradient-based search (GS) results on genetic algorithm (GA) results
[Figure: GA results at the 50th generation (4,500 simulations) and the 500th generation (45,000 simulations), with GS results overlaid]
* Required maximum RMS errors (Dmax) for gradient-based search are {0.12, 0.1, 0.08}
GS methods can get stuck in a local minimum
GS methods reduce running time (CDM: 145 simulations)

Comparison of Proposed Methods
Contribution #1

                        Gradient-based search   Genetic algorithm
Type of Solution        One point               Family of points
Tradeoff Curve Found    No                      Yes
Execution Time          Short                   Long
Amount of Computation   Low                     High
Parallelism

Lower Power Consumption in DSP
Contribution #2
Minimize power dissipation due to limited battery power and cooling system
Multipliers are often a major source of dynamic power consumption in typical DSP applications
Multi-precision multipliers can select smaller multipliers (8, 16 or 24 bits) to reduce power consumption
Wordlength reduction to select any word size [Han, Evans, and Swartzlander 2004] (proposed)

Wordlength Reduction in Multiplication
Contribution #2
Input data wordlength reduction
  Fewer bits can be enough to represent the data, e.g. π x π ≈ 9
Truncation
Signed right shift
  Move toward the least significant bit (LSB)
  Sign bit is extended for arithmetic right shift
[Figure: input word with the sign bit marked]
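The two reduction methods can be illustrated on 16-bit two's-complement words. This is a minimal sketch; the helper names and the sample word are hypothetical, and words are handled as unsigned bit patterns.

```python
def truncate(word, n, width=16):
    """Zero the n least significant bits of a width-bit word."""
    keep_mask = ((1 << width) - 1) & ~((1 << n) - 1)
    return word & keep_mask


def signed_right_shift(word, n, width=16):
    """Arithmetic right shift by n; the sign bit fills the vacated positions."""
    is_negative = word & (1 << (width - 1))
    signed = word - (1 << width) if is_negative else word
    return (signed >> n) & ((1 << width) - 1)


word = 0xB6C5  # negative value in 16-bit two's complement
print(format(truncate(word, 8), "016b"))            # upper 8 bits kept, lower 8 zeroed
print(format(signed_right_shift(word, 8), "016b"))  # upper bits become copies of the sign
```

Truncation leaves the discarded positions constant at zero, while the shift fills the upper positions with copies of the sign bit, which still toggle whenever the sign changes.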

Power Reduction via Wordlength Reduction
Contribution #2
Power dissipation
  Switching power consumption
  Static power consumption
Switching activity parameter, α
  Reduce α by wordlength reduction
Symbols
  CL    Load capacitance
  Vdd   Operating voltage
  fclk  Operating frequency
Relationship between reduced wordlength and switching parameter α in power consumption?
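The power equation is missing from the transcript. The symbols listed above match the standard CMOS switching power model, which is given here as the likely form; the subscripted names for the total, switching, and static terms are assumptions.

```latex
P_{\text{total}} = P_{\text{switching}} + P_{\text{static}},
\qquad
P_{\text{switching}} = \alpha \, C_L \, V_{dd}^{2} \, f_{clk}
```

With the load capacitance, operating voltage, and clock frequency fixed, switching power scales linearly with α, which is why the following slides focus on how wordlength reduction changes α.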

Analytical Method
Contribution #2
Consider a stream of data for one of the multiplicands
Compare two adjacent numbers in the stream after reduction
Expectation of bit switching, x, with probability Px
L-bit input data
  Truncate input data to M bits (N bits are removed)
  N-bit signed right shift in L-bit input (Y is sign bit)
[Figure: L-bit word with sign bit S, divided into M bits and N bits]

Analytical Method (continued)
Contribution #2
[Figure: L-bit word with sign bit S, divided into M bits and N bits; wordlength (L) = 16]

Input                              Switching expectation
No reduction (full length used)    L/2
Truncate N bits                    M/2
N-bit signed right shift
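The switching expectations can be checked by brute force on a small word. The sketch below assumes consecutive inputs are independent and uniformly distributed over all bit patterns, and uses L = 8 with N = 4 so that every pair of words can be enumerated. The helper names are hypothetical, and the result for the signed right shift is what this model gives, not a value taken from the slide.

```python
from itertools import product

WIDTH = 8   # L, kept small so all pairs can be enumerated
N = 4       # number of bits removed or shifted
FULL = (1 << WIDTH) - 1


def no_reduction(word):
    return word


def truncate(word):
    """Zero the N least significant bits, keeping M = WIDTH - N bits."""
    return word & (FULL & ~((1 << N) - 1))


def signed_right_shift(word):
    """Arithmetic right shift by N within a WIDTH-bit word."""
    signed = word - (1 << WIDTH) if word & (1 << (WIDTH - 1)) else word
    return (signed >> N) & FULL


def expected_switching(reduce):
    """Average number of bit positions that differ between two consecutive words."""
    words = [reduce(v) for v in range(1 << WIDTH)]
    toggles = sum(bin(a ^ b).count("1") for a, b in product(words, repeat=2))
    return toggles / len(words) ** 2


for reduce in (no_reduction, truncate, signed_right_shift):
    print(reduce.__name__, expected_switching(reduce))
```

Under these assumptions the full-length and truncated cases reproduce L/2 and M/2. The shifted word still averages L/2 toggles because the extended sign bits switch together, which agrees with the later observation that signed right shift shows no power reduction in the Wallace multiplier.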

Dynamic Power Consumption for Wallace Multiplier (1 MHz)
Contribution #2
16-bit x 16-bit multiplier (simulated on Xilinx XC3S200-5FT256 FPGA)
Wallace multiplier used in TI 320C64 DSP
[Figure: dynamic power consumption curves labeled Truncation-First (truncate 1st argument) and Truncation-Second (truncate 2nd argument); annotation: Reduction (56%)]

Dynamic Power Consumption for Radix-4 Modified Booth Multiplier (1 MHz)
Contribution #2
16-bit x 16-bit multiplier (simulated on Xilinx XC3S200-5FT256 FPGA)
Radix-4 modified Booth multiplier used in TI 320C62 DSP
Swapping could have benefit
[Figure: dynamic power consumption curves for truncating the 1st argument and the 2nd argument (recode, nonrecode); annotations: Sensitive (13%), Reduction (31%)]

Summary of Contribution #2
Truncation to 8 bits reduces estimated power consumption by 56% in Wallace and 31% in Booth 16-bit multipliers
Signed right shift exhibits no estimated power reduction in the Wallace multiplier (for any shift) and a 25% reduction in Booth multipliers (for an 8-bit shift)
Power consumption in tree-based multipliers
  Highly depends on input data
  Simulation of all switching activity matches analysis of switching activity in reduced multiplicands in the Wallace multiplier
Operand swapping can reduce power consumption
  In the Booth multiplier, the non-recoded operand is 13% more sensitive in power consumption

Automating Transformations from Floating Point to Fixed Point
Contribution #3
Existing fixed-point tools
  Support fixed-point simulation
  Convert floating-point code to raw fixed-point code
  Manually find optimum wordlength by trial and error
Automating transformations
  Fully automate conversion and wordlength optimization process (proposed)
Fixed-point tools
  SNU gFix, Autoscaler
  CoWare SPW HDS
  Synopsys CoCentric
  MATLAB Fixed-point toolbox
  MATLAB Fixed-point blockset
  AccelChip DSP synthesis
  Catalytic RMS, MCS
[Figure: Floating-Point Program -> Code Conversion -> Wordlength Optimization -> Wordlength-Optimized Fixed-Point Program]

Automatic Transformation Flow
Contribution #3
Code generation
  Parse floating-point program
  Generate a raw fixed-point program and auxiliary programs (top, objective, cost, etc.)
Range estimation
  Estimate range to avoid overflow (analytical/simulation)
  Determine integer wordlength (IWL)
Wordlength optimization
  Optimize wordlength according to given input and error specification (analytical/simulation)
  Determine fractional wordlength (FWL)
[Figure: Code Generation -> Range Estimation -> Wordlength Optimization]
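The simulation-based range estimation step can be sketched as follows: observe the largest magnitude a variable takes, then pick an integer wordlength that covers it. This is an illustrative assumption about how IWL might be derived, with the sign bit counted inside IWL; it is not the released tool's code, and `integer_wordlength` is a hypothetical name.

```python
import math


def integer_wordlength(samples):
    """Conservative signed IWL (sign bit included) such that |s| < 2**(IWL - 1)
    for every observed sample; at least 1 bit is kept for the sign."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return 1
    _, exponent = math.frexp(peak)   # peak = m * 2**exponent with 0.5 <= m < 1
    return max(1, exponent + 1)


print(integer_wordlength([math.pi, -1.0]))
print(integer_wordlength([0.3, -0.2]))
```

Once IWL is fixed this way, only the fractional wordlengths remain as search variables for the optimization step.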

Code Generation for Fixed-Point Program
Contribution #3
Adder function in MATLAB

(a) Floating-point program for adder

    function [c] = adder(a, b)
    c = 0;
    c = a + b;

(b) Raw fixed-point program (format arguments determined by designers with trial and error)

    function [c] = adder_fx(a, b)
    c = 0;
    a = fi(a, 1, 32, 16);
    b = fi(b, 1, 32, 16);
    c = fi(c, 1, 32, 16);
    c(:) = a + b;

(c) Converted fixed-point program for automating optimization (proposed)

    function [c] = adder_fx(a, b, numtype)
    c = 0;
    a = fi(a, numtype.a);
    b = fi(b, numtype.b);
    c = fi(c, numtype.c);
    c(:) = a + b;

fi(a, S, WL, FWL) is a constructor function for a fixed-point object in the fixed-point toolbox [S: signed, WL: wordlength, FWL: fraction length]

Automating Transformation Environment for Wordlength Optimization
Contribution #3
Given floating-point program and options, auxiliary programs are automatically generated
Given input data, optimum wordlength is searched
[Figure: block diagram with blocks labeled Input Data, Top Program, Floating-Point Program, Fixed-Point Program, Evaluation Program (Objectives), Range Estimation, Complexity Estimation, Error Estimation, Search Engine (gradient-based or genetic algorithm), and Optimum Wordlength]

Demo of Released Software
Contribution #3

Conclusion
Search for optimum wordlength
  Gradient-based search reduces execution time with the complexity-and-distortion measure method, while solutions could be trapped in a local optimum
  Genetic algorithm can find the distortion vs. complexity tradeoff curve, but it requires longer execution time
Reduce power consumption by data wordlength reduction of multiplicands
Automate transformations from floating-point programs to fixed-point programs
Free software release is available at

End Thank you!
