Floating-point to fixed-point code conversion with variable trade-off between computational complexity and accuracy loss Alexandru Bârleanu, Vadim Băitoiu.


Floating-point to fixed-point code conversion with variable trade-off between computational complexity and accuracy loss
Alexandru Bârleanu, Vadim Băitoiu and Andrei Stan
Technical University “Gh. Asachi”, Iaşi, Romania
15th International Conference on System Theory, Control and Computing (joint conference of SINTES15, SACCS11, SIMSIS15)
October 14-16, 2011, Sinaia, Romania
1/13

Motivation
Embedded microprocessors:
– No hardware dedicated to floating-point
– Limited processing capabilities
Emulated floating-point arithmetic:
– Unnecessarily high accuracy
– Long execution time
Fixed-point code written manually:
– Error-prone
– Significant accuracy loss
2/13

Existing work
For FPGA
– The main problem is fractional word-length optimization
– The search space grows exponentially with the number of fixed-point variables
– Search techniques (often sophisticated) are necessary:
    Greedy algorithms
    Genetic algorithms
    Simulated annealing
– Optimization objectives: accuracy loss, area
For microcontrollers, C language
– Existing solutions:
    Fixed-point format is supplied by the user (in annotations, for example)
    Fixed-point format is determined through simulations, taking into account, for example, accuracy constraints
– Available integer types in C: only 16/32/64-bit signed/unsigned
– Optimization objectives: accuracy loss, number of (scaling) operations
3/13
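To make the "user-supplied format" approach and the notion of a scaling operation concrete, here is a minimal sketch of a hand-written Q15 multiplication. The type name `q15_t` and the function `q15_mul` are illustrative assumptions, not part of the presented tool.

```c
#include <assert.h>
#include <stdint.h>

/* Q15 (assumed format): 16-bit signed value with 15 fractional bits,
   i.e. real value = raw / 32768. */
typedef int16_t q15_t;

/* Multiply two Q15 values. The 32-bit product has 30 fractional bits,
   so one run-time scaling operation (a right shift by 15) brings it
   back to Q15 and discards the 15 least significant bits.
   Note: right-shifting a negative value is implementation-defined in C;
   most embedded compilers use an arithmetic shift. */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t product;

    /* (-1.0) * (-1.0) = +1.0 is not representable in Q15. */
    assert(!(a == INT16_MIN && b == INT16_MIN));

    product = (int32_t)a * (int32_t)b;   /* Q30, always fits in 32 bits */
    return (q15_t)(product >> 15);       /* scale back to Q15 */
}
```

For example, `q15_mul(16384, 16384)` multiplies 0.5 by 0.5 and yields 8192, which is 0.25 in Q15. Every such shift is one of the scaling operations counted in the optimization objective.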

Problem formulation
The problem is constructed from practical considerations:
Input – a digital filter:
– Filter structure: Direct-Form I
– Constant floating-point coefficients
– Known input bounds (low/high values)
Output – ANSI-C integer code:
– Ideally, the result must be the same as if floating-point code had been used
4/13
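As a reference point for the input side of the problem, a floating-point Direct-Form I filter step might look like the sketch below. The function name `df1_step` and the history-buffer layout are assumptions made for illustration; the slides do not show the actual input code.

```c
#include <assert.h>
#include <stddef.h>

/* One output sample of a Direct-Form I filter:
     y[n] = sum_{k=0..nb-1} b[k]*x[n-k]  -  sum_{k=1..na} a[k]*y[n-k]
   x_hist[k]   holds x[n-k] (k = 0 is the newest input sample),
   y_hist[k-1] holds y[n-k].
   a[0] is assumed to be 1 and is not used.
   For an FIR filter pass na = 0 (a and y_hist may then be NULL). */
double df1_step(const double *b, size_t nb, const double *x_hist,
                const double *a, size_t na, const double *y_hist)
{
    double acc = 0.0;
    size_t k;

    assert(b != NULL && x_hist != NULL);
    assert(na == 0 || (a != NULL && y_hist != NULL));

    for (k = 0; k < nb; k++)       /* feed-forward part */
        acc += b[k] * x_hist[k];
    for (k = 1; k <= na; k++)      /* feedback part */
        acc -= a[k] * y_hist[k - 1];
    return acc;
}
```

The integer code produced by a conversion would ideally return the same value as this function for every input within the known bounds.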

Building the dataflow
Initial state – very long fractional parts
– Multiply operators overflow
– Add operators have unaligned terms
Changing the dataflow – making nodes representable in C
– Resolving overflows in any operator
– Aligning summation terms
Recursive method calls – bottom-up action
Example: making a node's run-time integer interval smaller (scaling)
– Before: run-time integer interval [0; ], fractional word-length 27, datatype: none (using only 16/32-bit integers), floating-point interval [0; ]
– After: run-time integer interval [0; ], fractional word-length 26, datatype: unsigned long, floating-point interval [0; ]
5/13
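The scaling step in the example above can be sketched as follows: the fractional word-length (FWL) of a node is lowered until the node's run-time integer upper bound fits a 32-bit unsigned integer. The function name `fit_fwl_u32` and the interval used in the usage note are hypothetical, since the actual bounds did not survive in the transcript.

```c
#include <assert.h>
#include <stdint.h>

/* Largest fractional word-length (FWL), not above start_fwl, for which the
   upper bound of a non-negative node fits a 32-bit unsigned integer.
   hi is the upper end of the node's floating-point interval [0; hi].
   Returns 0 if even FWL 0 does not fit (the caller must handle that case). */
int fit_fwl_u32(double hi, int start_fwl)
{
    int fwl = start_fwl;

    assert(hi >= 0.0 && start_fwl >= 0 && start_fwl < 63);

    while (fwl > 0) {
        /* run-time integer upper bound = hi * 2^fwl */
        double int_hi = hi * (double)((uint64_t)1 << fwl);
        if (int_hi <= 4294967295.0)   /* fits unsigned 32-bit */
            break;
        fwl--;                        /* scale: drop one fractional bit */
    }
    return fwl;
}
```

With an assumed floating-point interval of [0; 40], FWL 27 gives an integer bound of 40 · 2^27 = 5368709120, which exceeds the 32-bit range, while FWL 26 gives 2684354560, which fits. `fit_fwl_u32(40.0, 27)` therefore returns 26.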

Dataflow transformation philosophy
                    | At design-time (scaling coefficients)             | At run-time (scaling operators)
Loss of accuracy    | large, because scaling occurs at dataflow sources | small, because scaling occurs close to dataflow root
Run-time operations | 0                                                 | >0
Overflow avoidance (not optional!)
Run-time integer interval reduction (together with FWL)
Discarding of least significant bits (multiple ways)
6/13
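The two scaling strategies can be compared on a single multiplication. The sketch below assumes a hypothetical coefficient c = 0.3, an input in [0; 4095], and a result with FWL 8; none of these values come from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Design-time scaling: the coefficient is quantized directly to FWL 8
   (0.3 * 2^8 = 76.8, rounded to 77). No run-time scaling operation,
   but the coefficient error is large. */
int32_t mul_design_time(int32_t x)
{
    assert(x >= 0 && x <= 4095);
    return x * 77;                  /* result has FWL 8 */
}

/* Run-time scaling: the coefficient keeps FWL 16
   (0.3 * 2^16 = 19660.8, rounded to 19661) and the product is scaled
   back to FWL 8 by one run-time shift, closer to the dataflow root. */
int32_t mul_run_time(int32_t x)
{
    assert(x >= 0 && x <= 4095);
    return (x * 19661) >> 8;        /* FWL 16 -> FWL 8, one extra operation */
}
```

For x = 4095 the exact product at FWL 8 is 0.3 · 4095 · 256 = 314496. The design-time variant returns 315315 (error 819 units) and the run-time variant returns 314499 (error 3 units), at the price of one shift.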

Selecting the optimal dataflow transformation
Increase or decrease node run-time integer interval
– Construct multiple dataflow transformation variants (alternative dataflow fragments)
– Compare candidate dataflow transformation variants using a linear cost function
Analytically computed values
– Size of error interval
– Number of operators
Ideal values
– Number of cycles
– SQNR loss, error distribution...
7/13
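A linear cost function over the two analytically computed metrics could be sketched as below. The `Variant` struct, the weight names, and `select_variant` are illustrative assumptions; the slides do not give the exact form of the cost function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical metrics of one candidate dataflow fragment. */
typedef struct {
    double error_interval_size;  /* analytically estimated accuracy loss */
    int    num_operators;        /* analytically estimated complexity */
} Variant;

/* Linear cost of one variant for the given weights. */
double variant_cost(const Variant *v, double w_error, double w_ops)
{
    return w_error * v->error_interval_size + w_ops * (double)v->num_operators;
}

/* Index of the variant with the lowest cost (first one wins on ties). */
size_t select_variant(const Variant *v, size_t n, double w_error, double w_ops)
{
    size_t best = 0, i;
    double best_cost;

    assert(v != NULL && n > 0);

    best_cost = variant_cost(&v[0], w_error, w_ops);
    for (i = 1; i < n; i++) {
        double cost = variant_cost(&v[i], w_error, w_ops);
        if (cost < best_cost) {
            best_cost = cost;
            best = i;
        }
    }
    return best;
}
```

Changing the weights shifts the choice: with all weight on the error term the more accurate variant wins, and with all weight on the operator count the cheaper one wins.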

Varying the cost function coefficients (example)
Filter
– Response type: bandpass
– Type: FIR
– Order: 40
Target/Compilation
– Processor: ARM Cortex-M3
– Compiler: IAR C/C++ for ARM (Kickstart)
– Optimizations: medium
[Plot: SQNR loss (dB) versus time (cycles); 4 dataflows shown from 18 total found; data values missing]
For comparison – the floating-point code takes [value missing] cycles
8/13

Implementation insights
Language: Java SE 1.6
Techniques: OOP, polymorphism
Analytical estimation of run-time integer intervals, dataflow complexity, and node error intervals
Dataflows are transformed using Change instances (not by copying large dataflow portions and modifying them).
– Change instances are invertible (apply/undo)
– Change instances can be combined in logical AND and OR
Dataflow visualization: dot (graph description language)
9/13
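The apply/undo idea behind Change instances can be sketched in a few lines. The authors' implementation is in Java and is not shown in the slides; the C sketch below uses hypothetical names (`Node`, `Change`, `changes_apply_all`) and covers only the AND combination, where every change in a group is applied together.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dataflow node: only the field needed for the demo. */
typedef struct {
    int fwl;   /* fractional word-length */
} Node;

/* An invertible change: sets a node's FWL and remembers the old value. */
typedef struct {
    Node *node;
    int   new_fwl;
    int   old_fwl;   /* filled in by change_apply, used by change_undo */
} Change;

void change_apply(Change *c)
{
    assert(c != NULL && c->node != NULL);
    c->old_fwl = c->node->fwl;
    c->node->fwl = c->new_fwl;
}

void change_undo(Change *c)
{
    assert(c != NULL && c->node != NULL);
    c->node->fwl = c->old_fwl;
}

/* AND combination: all changes are applied in order ... */
void changes_apply_all(Change *cs, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        change_apply(&cs[i]);
}

/* ... and undone in reverse order, restoring the original dataflow. */
void changes_undo_all(Change *cs, size_t n)
{
    size_t i;
    for (i = n; i > 0; i--)
        change_undo(&cs[i - 1]);
}
```

Because each change stores only what it overwrote, a candidate transformation can be tried, evaluated, and rolled back without copying the dataflow.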

Usage example
Filter properties
– Response type: highpass
– Type: FIR
– Order: 30
– Designed with: Matlab FDATool
Conversion information
– Number of dataflows produced by varying the cost function coefficients: 158 (18 different)
– Total transformation time: 2.44 s
Performance of fixed-point function #7
– Distortion (SQNR loss): 3.1e-05 dB
– Speed test:
    Device: MSP430F149
    Compiler: IAR 5.10 (Kickstart)
    Compiler opt.: High speed
    Factor: [value missing]
10/13

Testing
Compiler
– Accuracy: Microsoft C++ (Visual Studio 2010)
– Speed: IAR, gcc
Compiler settings
– Optimizations: disabled / enabled (low, high, ...)
Processor variant
– 8-bit (AVR)
– 16-bit (MSP430)
– 32-bit (ARM7 Cortex-M3)
Filter properties
– Type: FIR, IIR (work in progress)
– Order: 4-80 (FIR)
– Input interval: [0; 4095], [-4096; 4095], and other
– Design method: random coefficients, Matlab FDATool
Cost function
– From “low-complexity-low-accuracy” to “high-complexity-high-accuracy”
Code generation
– From “everything in one expression” (inline) to “every operator variable declared”
11/13
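The two ends of the code-generation range can be illustrated on a small filter. The 3-tap FIR below, its Q15 coefficients (0.25, 0.5, 0.25), and the function names are assumptions chosen for the sketch; the generator's real output is not shown in the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Inputs are assumed to lie in [0; 4095], so the sum of products is at
   most 4095 * 32768 and fits a 32-bit signed integer. */

/* Style 1: everything in one expression (inline). */
int32_t fir3_inline(int32_t x0, int32_t x1, int32_t x2)
{
    return (8192 * x0 + 16384 * x1 + 8192 * x2) >> 15;
}

/* Style 2: every operator result declared as a variable. */
int32_t fir3_declared(int32_t x0, int32_t x1, int32_t x2)
{
    int32_t m0 = 8192 * x0;     /* multiply operators */
    int32_t m1 = 16384 * x1;
    int32_t m2 = 8192 * x2;
    int32_t s0 = m0 + m1;       /* add operators */
    int32_t s1 = s0 + m2;
    int32_t y  = s1 >> 15;      /* scaling operator: Q15 back to integer */
    return y;
}
```

Both functions compute the same value; they differ only in how much freedom the C compiler's optimizer is given, which is why the style is one of the test dimensions for speed.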

Results
Number of cycles
– Speed factor relative to floating-point code: [value missing] (or more if compiler optimizations are applied)
Accuracy loss
– SQNR loss: 1e-5...1e-1 dB
Variable trade-off between complexity and accuracy
Constant execution time (no jitter – more determinism)
12/13

Conclusions
An innovative floating-point to fixed-point conversion method for the C language is proposed:
– A very good speed factor is obtained (integer code compared with floating-point code).
– Very good accuracy is obtained for FIR filters.
– The conversion algorithm is designed to use variable cost functions. It is possible to specify, for example, that complexity is important and accuracy loss is unimportant when building the integer dataflow.
– The conversion time is very short. This happens because:
    Dataflow metrics are estimated analytically
    Dataflow nodes cache information (run-time integer interval, error interval)
    The automatic dataflow search algorithm uses a heuristic to generate as few identical dataflows as possible
13/13