Automating Transformations from Floating Point to Fixed Point for Implementing Digital Signal Processing Algorithms Kyungtae Han Ph.D. Defense Committee.

Presentation transcript:

Automating Transformations from Floating Point to Fixed Point for Implementing Digital Signal Processing Algorithms
Kyungtae Han
Ph.D. Defense, May 9th, 2006
Committee Members:
- Prof. Ross Baldick (Dept. of ECE)
- Prof. Brian L. Evans (Dept. of ECE), advisor
- Prof. Margarida F. Jacome (Dept. of ECE)
- Prof. Earl E. Swartzlander (Dept. of ECE)
- Prof. Robert A. van de Geijn (Dept. of CS)
Computer Engineering Curriculum Track, Dept. of Electrical and Computer Engineering, The University of Texas at Austin

Outline
- Introduction
- Background
- Contributions
  - Optimize fixed-point wordlengths
  - Reduce power consumption in arithmetic
  - Automate transformations of systems
- Conclusion

Introduction: Implementing Digital Signal Processing Algorithms
[Figure: digital signal processing algorithms mapped to three implementation targets, each rated for hardware price ($) and power consumption (H/L): a floating-point processor running the floating-point program; a fixed-point processor running a uniform-wordlength fixed-point program obtained by code conversion; and a fixed-point ASIC with optimized wordlengths obtained by wordlength optimization.]
ASIC: Application Specific Integrated Circuit

Introduction: Transformations to Fixed Point
- Advantages
  - Lower hardware complexity
  - Lower power consumption
  - Faster processing
- Disadvantages
  - Introduces distortion due to quantization error
  - Searching for the optimum wordlength by trial and error is time-consuming
- Research goals
  - Automate transformations to fixed point
  - Control distortion vs. complexity tradeoffs
[Figure: floating-point program -> code conversion -> wordlength optimization -> fixed point (optimized wordlength)]

Background: Fixed-Point Data Format
- Integer wordlength (IWL): number of bits assigned to the integer representation
- Fractional wordlength (FWL): number of bits assigned to the fraction
- Wordlength (WL): WL = IWL + FWL
- SystemC format (www.systemc.org)
[Figure: a word made of a sign bit S followed by data bits X, split at the binary point into the integer wordlength and the fractional wordlength]
- Example: π = 3.14159… (decimal, floating point)
  - 3.140625 (decimal) = 011.001001 (binary) [WL=9; IWL=3; FWL=6]
  - 3.141479492 (decimal) = 011.0010010000111 (binary) [WL=16; IWL=3; FWL=13]
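The two π examples on this slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming truncation toward minus infinity and saturation on overflow (the slide does not state the quantization or overflow modes), and assuming IWL counts the sign bit, as the 011 integer field suggests. The function name `quantize` is illustrative only.

```python
import math

def quantize(x, wl, iwl):
    """Truncate x to a signed fixed-point value with wl total bits,
    iwl of which (sign bit included) sit left of the binary point."""
    fwl = wl - iwl                         # fractional wordlength
    scale = 1 << fwl
    code = math.floor(x * scale)           # truncation (assumed mode)
    lo, hi = -(1 << (wl - 1)), (1 << (wl - 1)) - 1
    code = max(lo, min(hi, code))          # saturation (assumed mode)
    return code / scale

print(quantize(math.pi, 9, 3))    # 3.140625
print(quantize(math.pi, 16, 3))   # 3.1414794921875
```

Going from 6 to 13 fractional bits shrinks the error against π from about 1e-3 to about 1e-4, which is the distortion side of the tradeoff the following slides discuss.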

Background: Distortion vs. Complexity Tradeoffs
- A shorter wordlength may increase application distortion and decrease implementation complexity
- Objectives: minimize implementation cost and minimize application distortion
- Notation
  - c(w): implementation cost function
  - Cmax: constant for maximum implementation cost
  - d(w): application distortion function
  - Dmax: constant for maximum application distortion
[Figure: application distortion d(w) plotted against implementation complexity c(w), showing the wordlength lower bounds, the wordlength upper bounds, the feasible region, and the optimal tradeoff curve]

Background: Wordlength Optimization Constraints
- Distortion constraint: application-specific distortion d(w) is bounded by Dmax
- Complexity constraint: implementation complexity c(w) is bounded by Cmax
- Enforcing both constraints bounds the search to a region of finite area
[Figure: two plots of application-specific distortion d(w) against implementation complexity c(w), one marking Dmax and one marking Cmax]

Background: Wordlength Optimization
- Wordlengths of signals (variables) in a digital system are treated as a vector
- Single objective optimization
- Multiple objective optimization
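The equations for this slide did not survive in the transcript. One plausible way to write the two problems, using only the symbols c(w), d(w), Cmax, and Dmax defined on the neighboring slides, is sketched below. The bound vectors and the choice of cost as the single objective are assumptions, not taken from the slides.

```latex
% Assumed notation: N signals, lower/upper wordlength bounds \underline{w}, \overline{w}
\mathbf{w} = (w_1, w_2, \ldots, w_N), \qquad
\underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}}

% Single objective (assumed form): minimize cost under a distortion bound
\min_{\mathbf{w}} \; c(\mathbf{w})
\quad \text{subject to} \quad d(\mathbf{w}) \le D_{\max}

% Multiple objective (assumed form): minimize both, inside the feasible region
\min_{\mathbf{w}} \; \bigl[\, c(\mathbf{w}),\; d(\mathbf{w}) \,\bigr]
\quad \text{subject to} \quad
c(\mathbf{w}) \le C_{\max}, \;\; d(\mathbf{w}) \le D_{\max}
```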

Background: Genetic Algorithm
- Evolutionary algorithm
  - Inspired by Holland 1975
  - Mimics processes of plant and animal evolution
  - Finds the optimum of a complex function
[Figure: genetic algorithm cycle linking the new gene pool, function evaluation, parental genes with measure, selection, mating, child genes, and mutation; from Greg Rohling's Ph.D. defense, 2004]
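The cycle named on this slide (function evaluation, selection, mating, mutation, new gene pool) can be sketched as a small single-objective genetic algorithm over a wordlength vector. Everything specific here is an assumption for illustration: the operators (keep the best half, one-point crossover, one-gene mutation), the seeded all-maximum individual, and the toy fitness that uses `sum(w)` as a cost proxy with a penalty when a made-up distortion model exceeds 0.1.

```python
import random

def genetic_search(fitness, n_vars, lo, hi, pop_size=20, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(lo, hi) for _ in range(n_vars)] for _ in range(pop_size)]
    pop[0] = [hi] * n_vars                       # seed with the longest wordlengths
    for _ in range(generations):
        scored = sorted(pop, key=fitness)        # function evaluation
        parents = scored[: pop_size // 2]        # selection: keep the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)        # mating
            cut = rng.randint(1, n_vars - 1)
            child = a[:cut] + b[cut:]            # one-point crossover
            i = rng.randrange(n_vars)            # mutation of a single gene
            child[i] = min(hi, max(lo, child[i] + rng.choice((-1, 1))))
            children.append(child)
        pop = parents + children                 # new gene pool
    return min(pop, key=fitness)

def toy_fitness(w):
    """Hypothetical objective: total bits, heavily penalized if distortion > 0.1."""
    distortion = sum(2.0 ** -wi for wi in w)
    return sum(w) + (1000.0 if distortion > 0.1 else 0.0)

best = genetic_search(toy_fitness, n_vars=4, lo=2, hi=16)
print(best, toy_fitness(best))
```

Because the best half of each generation survives unchanged, the best fitness never gets worse from one generation to the next.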

Background: Pareto Optimality
- Pareto optimality: “best that could be achieved without disadvantaging at least one group” [Allan Schick 1970]
- The Pareto optimal set is the set of nondominated solutions
  - Solutions A, B, C, D are nondominated (not dominated by any solution)
  - E is dominated by C, as all objectives for C are less than the corresponding objectives for E
- The Pareto front is the boundary (tradeoff curve) that connects the solutions in the Pareto optimal set
[Figure: solutions A through I plotted as Objective 2 against Objective 1, marked as nondominated or dominated, with the Pareto front drawn through the nondominated points]
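The dominance test on this slide is straightforward to express in code. The sketch below uses the common convention for minimization (no worse in every objective, strictly better in at least one), which is slightly weaker than the slide's "all objectives are less" wording. The coordinates given to A through E are invented for illustration; the slide's figure values are not available.

```python
def dominates(p, q):
    """True if p is no worse than q in every objective and strictly better
    in at least one (both objectives are minimized)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (objective 1, objective 2) coordinates
points = {"A": (1, 9), "B": (3, 6), "C": (5, 4), "D": (8, 2), "E": (6, 5)}
front = pareto_front(list(points.values()))
print(front)   # [(1, 9), (3, 6), (5, 4), (8, 2)]
```

With these values E = (6, 5) is removed because C = (5, 4) is better in both objectives, matching the slide's example.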

Contribution #1: Search for Optimum Wordlength
- Complete search
  - Searches the whole space
  - Impractical in systems with many variables
- Gradient-based search
  - Uses gradient information to determine the next candidates
  - Complexity measure (CM) [Sung and Kum, 1995]
  - Distortion measure (DM) [Han et al., 2001]
  - Complexity-and-distortion measure (CDM) [Han and Evans, 2004] (proposed)
- Guided random search
  - Genetic algorithm for single objective [Leban and Tasic, 2000]
  - Multiple objective genetic algorithm (proposed)

Contribution #1: Complexity-and-Distortion Measure
- Weighted combination of measures
- Single objective function
- Gradient-based search
  - Initialization
  - Iterative greedy search based on complexity and distortion gradient information
- Notation
  - c(w): complexity function
  - d(w): distortion function
  - Dmax: constant for maximum distortion
  - Cmax: constant for maximum complexity
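The slide's objective function is not preserved in the transcript, so the following is only a sketch of how a weighted complexity-and-distortion score could drive a greedy search. Assumptions: the score is `alpha_c * c(w)/Cmax + (1 - alpha_c) * d(w)/Dmax`, the search starts from the longest wordlengths and removes one bit at a time, and the cost and distortion models are toy stand-ins (total bits, and a quantization-noise style RMS with made-up gains).

```python
import math

def cdm(w, cost, distortion, c_max, d_max, alpha_c=0.5):
    """Weighted sum of normalized complexity and distortion (assumed form)."""
    return alpha_c * cost(w) / c_max + (1 - alpha_c) * distortion(w) / d_max

def greedy_search(w_start, w_min, cost, distortion, c_max, d_max, alpha_c=0.5):
    w, simulations = list(w_start), 0
    while True:
        best = None
        for i in range(len(w)):
            if w[i] <= w_min:
                continue
            cand = w[:i] + [w[i] - 1] + w[i + 1:]   # drop one bit from signal i
            simulations += 1                        # one distortion evaluation
            if distortion(cand) > d_max:
                continue                            # violates the distortion bound
            score = cdm(cand, cost, distortion, c_max, d_max, alpha_c)
            if best is None or score < best[0]:
                best = (score, cand)
        if best is None:                            # no feasible one-bit reduction left
            return w, simulations
        w = best[1]

gains = [1.0, 2.0, 0.5]                             # hypothetical noise gains
def toy_cost(w):
    return sum(w)
def toy_distortion(w):
    return math.sqrt(sum((g * 2.0 ** -wi) ** 2 / 12 for g, wi in zip(gains, w)))

w_opt, n_sim = greedy_search([16, 16, 16], 2, toy_cost, toy_distortion,
                             c_max=48, d_max=0.01)
print(w_opt, n_sim, toy_distortion(w_opt))
```

Like any greedy method, this stops at the first point where no single-bit reduction is feasible, which is how such a search can end in a local optimum.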

Contribution #1: Case Study: Filter Design
- Infinite impulse response (IIR) filter
- Complexity measure: area model of a field-programmable gate array (FPGA) [Constantinides, Cheung, and Luk 2003]
- Distortion measure: root mean square (RMS) error
- Seven fixed-point variables (indicated by slashes in the diagram)
[Figure: IIR filter block diagram with input x[n], output y[n], a delay element, and coefficients b0, b1, and -a1]
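To make the RMS distortion measure concrete, here is a sketch that runs a first-order IIR filter, y[n] = b0 x[n] + b1 x[n-1] - a1 y[n-1], in floating point and in a truncated fixed-point form, then compares the outputs. The seven quantization points (`x`, `b0`, `b1`, `a1`, `ff`, `fb`, `y`), the coefficient values, and the test signal are all assumptions; the slide does not say where its seven variables sit.

```python
import math

def q(v, fwl):
    """Truncate v to fwl fractional bits."""
    s = 1 << fwl
    return math.floor(v * s) / s

def iir(x, b0, b1, a1, fwl=None):
    """First-order IIR filter; fwl maps signal names to fractional
    wordlengths, or None for the floating-point reference."""
    f = (lambda v, k: v) if fwl is None else (lambda v, k: q(v, fwl[k]))
    b0, b1, a1 = f(b0, "b0"), f(b1, "b1"), f(a1, "a1")
    x_prev = y_prev = 0.0
    out = []
    for xn in x:
        xn = f(xn, "x")
        ff = f(b0 * xn + b1 * x_prev, "ff")    # feedforward path
        fb = f(a1 * y_prev, "fb")              # feedback path
        yn = f(ff - fb, "y")
        out.append(yn)
        x_prev, y_prev = xn, yn
    return out

def rms_error(ref, test):
    return math.sqrt(sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref))

names = ["x", "b0", "b1", "a1", "ff", "fb", "y"]
x = [math.sin(0.1 * n) for n in range(200)]    # hypothetical input data
ref = iir(x, 0.3, 0.3, -0.4)
err_coarse = rms_error(ref, iir(x, 0.3, 0.3, -0.4, dict.fromkeys(names, 4)))
err_fine = rms_error(ref, iir(x, 0.3, 0.3, -0.4, dict.fromkeys(names, 12)))
print(err_coarse, err_fine)
```

A search method would call something like `rms_error` once per candidate wordlength vector, which is why the number of simulations is the natural measure of search effort.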

Contribution #1: Case Study: Gradient-Based Search
- CDM could lead to lower complexity and fewer simulations than DM and CM

Search method    Gradient measure   Number of simulations   Complexity estimate (LUT)   Distortion (RMS)*
Complete         -                  16^7 **
Gradient-based   DM                 316                     51.05                       0.0981
Gradient-based   CDM                145                     49.85                       0.0992
Gradient-based   CM                 417                     51.95                       0.0986

* Maximum distortion, measured by root mean square (RMS) error, is 0.1
** 16^7 = 268,435,456 simulations (8.5 years at 1 second per simulation)
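The complete-search footnote (268,435,456 simulations, 8.5 years) can be checked with simple arithmetic. The sketch reads the count as 16 to the 7th power, i.e. 16 candidate wordlengths for each of the seven variables; that interpretation of the base is an assumption.

```python
candidates = 16 ** 7                      # assumed: 16 choices for each of 7 variables
seconds_per_year = 365 * 24 * 3600
years = candidates / seconds_per_year     # at 1 second per simulation

print(candidates)        # 268435456
print(round(years, 1))   # 8.5
```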

Contribution #1: Case Study: Genetic Algorithm
- Searches for the Pareto optimal set (nondominated solutions)
- Handles multiple objectives: error and area
[Figure: three panels showing the Pareto front at the 100th generation (9,000 simulations), the 250th generation (22,500 simulations), and the 500th generation (45,000 simulations)]
* Population for one generation: 90
LUT: lookup table

Contribution #1: Case Study: Comparison
- Gradient-based search (GS) results are superposed on genetic algorithm (GA) results
- GS methods can get stuck in a local minimum
- GS methods reduce running time (CDM: 145 simulations)
[Figure: two panels, the 50th generation (4,500 simulations) and the 500th generation (45,000 simulations)]
* Required RMSmax values for gradient-based search are Dmax = {0.12, 0.1, 0.08}

Contribution #1: Comparison of Proposed Methods

                        Gradient-based search   Genetic algorithm
Type of solution        One point               Family of points
Tradeoff curve found    No                      Yes
Execution time          Short                   Long
Amount of computation   Low                     High
Parallelism

Contribution #2: Lower Power Consumption in DSP
- Minimize power dissipation, because battery power and cooling are limited
- Multipliers are often a major source of dynamic power consumption in typical DSP applications
- Multi-precision multipliers can select smaller multipliers (8, 16, or 24 bits) to reduce power consumption
- Wordlength reduction to select any word size [Han, Evans, and Swartzlander 2004] (proposed)

Contribution #2: Wordlength Reduction in Multiplication
- Input data wordlength reduction
  - Fewer bits can be enough to represent the operands, e.g. π x π ≈ 9
- Truncation
- Signed right shift
  - Bits move toward the least significant bit (LSB)
  - The sign bit is extended for the arithmetic right shift
[Figure: data word with the sign bit marked]
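The two reduction operations named on this slide can be sketched on 16-bit two's-complement words. The sketch assumes that truncation means zeroing the N least significant bits while keeping the word in place, and that the signed right shift is an ordinary arithmetic shift; the helper names are illustrative.

```python
L = 16   # word size used in the multiplier experiments

def to_signed(bits, width=L):
    """Interpret a width-bit pattern as a two's-complement integer."""
    return bits - (1 << width) if bits & (1 << (width - 1)) else bits

def truncate(x, n, width=L):
    """Zero the n least significant bits of a two's-complement word."""
    word = x & ((1 << width) - 1)
    return to_signed(word & ~((1 << n) - 1), width)

def signed_right_shift(x, n):
    """Arithmetic right shift: the sign bit fills the vacated upper bits."""
    return x >> n

print(truncate(0x1234, 8), signed_right_shift(0x1234, 8))   # 4608 18
print(truncate(-100, 8), signed_right_shift(-100, 8))       # -256 -1
```

After truncation the low N bits are constant zeros, whereas after the shift the upper bits all copy the sign bit, so they still toggle whenever the sign changes.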

Contribution #2: Power Reduction via Wordlength Reduction
- Power dissipation
  - Switching power consumption
  - Static power consumption
- Switching activity parameter, α
  - Reduce α by wordlength reduction
- Notation
  - CL: load capacitance
  - Vdd: operating voltage
  - fclk: operating frequency
- Question: what is the relationship between the reduced wordlength and the switching parameter α in power consumption?
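The slide's power equation is not preserved in the transcript. The symbols it defines match the standard CMOS dynamic power model, which is sketched below as the likely intended relationship rather than a quotation from the slide.

```latex
% Standard CMOS model (assumed to be what the slide showed)
P_{\text{total}} = P_{\text{switching}} + P_{\text{static}},
\qquad
P_{\text{switching}} = \alpha \, C_L \, V_{dd}^{2} \, f_{clk}
```

With C_L, V_dd, and f_clk fixed by the hardware, the switching activity α is the only factor that input wordlength reduction can change, and switching power scales linearly with it.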

Contribution #2: Analytical Method
- Consider a stream of data for one of the multiplicands
- Compare two adjacent numbers in the stream after reduction
- Expectation of bit switching, x, with probability Px
- L-bit input data
  - Truncate input data to M bits (N bits are removed)
  - N-bit signed right shift in L-bit input (Y is sign bit)
[Figure: L-bit word with sign bit S, divided into M retained bits and N removed bits]

Contribution #2: Analytical Method

               Input                      Switching expectation
No reduction   Full length used           L/2
Reduction      Truncate N bits            M/2
Reduction      N-bit signed right shift

Wordlength (L) = 16
[Figure: L-bit word with sign bit S, divided into M retained bits and N removed bits]
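The L/2 and M/2 entries can be checked by exhaustive enumeration. The sketch below assumes that adjacent words in the stream are independent and uniformly distributed, and uses a small word (L = 8, N = 4) so that all pairs can be enumerated quickly; the slides use L = 16. The result it computes for the signed right shift is this sketch's own calculation under those assumptions, not a value read from the slide.

```python
def hamming(a, b, width):
    """Number of differing bits between two width-bit two's-complement words."""
    mask = (1 << width) - 1
    return bin((a ^ b) & mask).count("1")

def expected_switching(transform, width):
    """Average bit flips between two independent, uniformly distributed
    width-bit signed words after applying transform to each."""
    values = [transform(v) for v in range(-(1 << (width - 1)), 1 << (width - 1))]
    total = sum(hamming(a, b, width) for a in values for b in values)
    return total / len(values) ** 2

L, N = 8, 4            # small sizes keep the enumeration fast
M = L - N
full = expected_switching(lambda v: v, L)
trunc = expected_switching(lambda v: v & ~((1 << N) - 1), L)
shift = expected_switching(lambda v: v >> N, L)
print(full, trunc, shift)   # 4.0 2.0 4.0
```

Under these assumptions the shift leaves the expectation at L/2, because the N + 1 upper bits all copy the sign bit and flip together; this is in line with the summary's observation that signed right shift gave no estimated power reduction in the Wallace multiplier.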

Contribution #2: Dynamic Power Consumption for Wallace Multiplier (1 MHz)
- 16-bit x 16-bit multiplier (simulated on Xilinx XC3S200-5FT256 FPGA)
- Wallace multiplier is used in the TI 320C64 DSP
[Figure: dynamic power when truncating the 1st argument and when truncating the 2nd argument (recode, nonrecode), with curves labeled Truncation-First and Truncation-Second and an annotated reduction of 56%]

Contribution #2: Dynamic Power Consumption for Radix-4 Modified Booth Multiplier (1 MHz)
- 16-bit x 16-bit multiplier (simulated on Xilinx XC3S200-5FT256 FPGA)
- Swapping could have benefit
- Radix-4 modified Booth multiplier is used in the TI 320C62 DSP
[Figure: dynamic power when truncating the 1st argument and when truncating the 2nd argument (recode, nonrecode), with annotations Sensitive (13%) and Reduction (31%)]

Summary of Contribution #2
- Truncation to 8 bits reduces estimated power consumption by 56% in Wallace and 31% in Booth 16-bit multipliers
- Signed right shift exhibits no estimated power reduction in the Wallace multiplier (for any shift) and a 25% reduction in Booth multipliers (for an 8-bit shift)
- Power consumption in tree-based multipliers
  - Depends strongly on input data
  - Simulation of all switching activity matches the analysis of switching activity in reduced multiplicands in the Wallace multiplier
- Operand swapping can reduce power consumption
  - In the Booth multiplier, the non-recoded operand is 13% more sensitive in power consumption

Contribution #3: Automating Transformations from Floating Point to Fixed Point
- Existing fixed-point tools
  - Support fixed-point simulation
  - Convert floating-point code to raw fixed-point code
  - Optimum wordlength is found manually by trial and error
- Automating transformations
  - Fully automate the conversion and wordlength optimization process (proposed)
- Fixed-point tools
  - SNU gFix, Autoscaler
  - CoWare SPW HDS
  - Synopsys CoCentric
  - MATLAB Fixed-point toolbox
  - MATLAB Fixed-point blockset
  - AccelChip DSP synthesis
  - Catalytic RMS, MCS
[Figure: floating-point program -> code conversion -> wordlength optimization -> wordlength-optimized fixed-point program]

Contribution #3: Automatic Transformation Flow
- Code generation
  - Parse the floating-point program
  - Generate a raw fixed-point program and auxiliary programs (top, objective, cost, etc.)
- Range estimation
  - Estimate the range to avoid overflow (analytical or simulation)
  - Determine the integer wordlength (IWL)
- Wordlength optimization
  - Optimize the wordlength according to the given input and error specification (analytical or simulation)
  - Determine the fractional wordlength (FWL)
[Figure: code generation -> range estimation -> wordlength optimization]
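The range-estimation step can be illustrated with a simulation-based sketch: observe the peak magnitude of a signal and pick an integer wordlength that avoids overflow. The function name is hypothetical, and the convention that IWL includes the sign bit is an assumption taken from the π example earlier in the deck (011 with IWL = 3).

```python
import math

def integer_wordlength(samples):
    """Signed IWL (sign bit included) whose range
    [-2**(IWL-1), 2**(IWL-1)) covers plus and minus the peak magnitude."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return 1                       # only the sign bit is needed
    _, exp = math.frexp(peak)          # peak = m * 2**exp with 0.5 <= m < 1
    return exp + 1                     # 2**exp > peak, plus one sign bit

print(integer_wordlength([3.14159, -1.0]))   # 3
print(integer_wordlength([4.0]))             # 4
print(integer_wordlength([100.0, -27.5]))    # 8
```

Using `math.frexp` instead of `log2` avoids rounding surprises when the peak is exactly a power of two. Once IWL is fixed this way, only the fractional wordlength remains for the optimization step.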

Contribution #3: Code Generation for Fixed-Point Program
Adder function in MATLAB
(a) Floating-point program for adder
    function [c] = adder(a, b)
    c = 0;
    c = a + b;
(b) Raw fixed-point program (numeric types determined by designers through trial and error)
    function [c] = adder_fx(a, b)
    c = 0;
    a = fi(a, 1, 32, 16);   % signed, WL = 32, FWL = 16
    b = fi(b, 1, 32, 16);
    c = fi(c, 1, 32, 16);
    c(:) = a + b;           % assign into c, keeping its numeric type
(c) Converted fixed-point program for automating optimization (proposed)
    function [c] = adder_fx(a, b, numtype)
    c = 0;
    a = fi(a, numtype.a);   % numeric types are passed in by the search engine
    b = fi(b, numtype.b);
    c = fi(c, numtype.c);
    c(:) = a + b;
fi(a, S, WL, FWL) is a constructor function for a fixed-point object in the Fixed-Point Toolbox [S: signed, WL: wordlength, FWL: fraction length]

Contribution #3: Automating Transformation Environment for Wordlength Optimization
- Given a floating-point program and options, auxiliary programs are generated automatically
- Given input data, the optimum wordlength is searched
- The search engine is gradient-based or a genetic algorithm
[Figure: block diagram connecting input data, top program, floating-point program, fixed-point program, evaluation program (objectives), search engine, range estimation, complexity estimation, error estimation, and the resulting optimum wordlength]

Contribution #3: Demo of Released Software

Conclusion
- Search for optimum wordlength
  - Gradient-based search with the complexity-and-distortion measure reduces execution time, but solutions could be trapped in a local optimum
  - Genetic algorithm can find the distortion vs. complexity tradeoff curve, but it requires longer execution time
- Reduce power consumption by data wordlength reduction of multiplicands
- Automate transformations from floating-point programs to fixed-point programs
- Free software release is available at www.ece.utexas.edu/~bevans/projects/wordlength/converter/

End Thank you!