Function Evaluation Using Tables and Small Multipliers CS252A, Spring 2005 Jason Fong.

Overview
Want to obtain values of elementary functions: sin(x), cos(x), e^x
A full lookup table would be too large
Bipartite and multipartite tables
  Split into multiple smaller tables and add values to obtain an approximation
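To make the bipartite idea concrete, here is a minimal sketch in Python. It is a simplified variant chosen for illustration, not the exact construction from the references: table A stores f(x0 + x1), table B stores x2 * f'(x0), and the result is their sum. The function names, the 3k-bit input split, and the use of floating-point table entries are all assumptions of this sketch.

```python
import math

def build_bipartite(f, df, k):
    """Build two 2^(2k)-entry tables for a 3k-bit input x = x0 + x1 + x2 in [0, 1).

    Simplified (uncentered) bipartite scheme, assumed for illustration:
      table_a[x0][x1] = f(x0 + x1)
      table_b[x0][x2] = x2 * f'(x0)
    """
    size = 1 << k
    table_a = [[f(x0 / size + x1 / size**2) for x1 in range(size)]
               for x0 in range(size)]
    table_b = [[df(x0 / size) * x2 / size**3 for x2 in range(size)]
               for x0 in range(size)]
    return table_a, table_b

def evaluate(tables, x_bits, k):
    """Approximate f at the 3k-bit fixed-point input x_bits with two lookups and one add."""
    table_a, table_b = tables
    mask = (1 << k) - 1
    x0 = (x_bits >> (2 * k)) & mask
    x1 = (x_bits >> k) & mask
    x2 = x_bits & mask
    return table_a[x0][x1] + table_b[x0][x2]

def max_abs_error(f, df, k):
    """Exhaustively compare the two-table approximation against f over all inputs."""
    tables = build_bipartite(f, df, k)
    n = 3 * k
    return max(abs(evaluate(tables, x, k) - f(x / (1 << n)))
               for x in range(1 << n))
```

With k = 4 the two tables hold 2 x 256 entries in place of the 4096 entries a full 12-bit table would need, at the cost of a small approximation error that `max_abs_error(math.exp, math.exp, 4)` measures exhaustively.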

Table Method With Small Multipliers
Similar to the multipartite method
Approximate using a 5th-order Taylor expansion
Use a set of smaller tables and some small multipliers
Better precision for the same amount of hardware compared to bipartite and multipartite methods

Taylor Series
Approximates the value of f(x) near x = a
More terms give a better approximation
But not directly usable for building table values
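For reference, the standard Taylor expansion with m terms and its Lagrange remainder can be written as follows. The symbols m and \xi are introduced here for illustration and do not appear on the slide.

```latex
f(x) = \sum_{j=0}^{m-1} \frac{f^{(j)}(a)}{j!}\,(x-a)^{j} + R_m(x),
\qquad
R_m(x) = \frac{f^{(m)}(\xi)}{m!}\,(x-a)^{m}
\quad \text{for some } \xi \text{ between } a \text{ and } x.
```

The remainder shrinks as m grows or as x moves closer to a, which is why more terms give a better approximation.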

Making a Taylor Series Useful
Split the n-bit input x into x0, x1, x2, x3, x4
  x0, x1, x2, x3 are k bits wide
  x4 is p bits wide
  4k + p = n, with p < k
Use the first 5 terms, and set a = x0
Rearrange terms into groups that each depend on only two parts of x
  Reduces the possible values for each group
  Reduces the number of rows in a group's table of values
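The input split can be sketched as plain bit slicing. This sketch assumes x0 is the most significant field (consistent with using it as the expansion point a) and x4 the least significant; the function name and argument layout are hypothetical.

```python
def split_input(x_bits, n=23, k=5):
    """Split an n-bit fixed-point fraction into fields (x0, x1, x2, x3, x4).

    Assumption: x0 is the most significant field. x0..x3 are k bits wide
    and x4 takes the remaining p = n - 4k bits.
    """
    p = n - 4 * k
    if not 0 <= p < k:
        raise ValueError("need 4k + p = n with 0 <= p < k")
    if not 0 <= x_bits < (1 << n):
        raise ValueError("x_bits must fit in n bits")
    x4 = x_bits & ((1 << p) - 1)   # lowest p bits
    rest = x_bits >> p
    fields = []
    for _ in range(4):             # peel off x3, x2, x1, x0 in that order
        fields.append(rest & ((1 << k) - 1))
        rest >>= k
    x3, x2, x1, x0 = fields
    return x0, x1, x2, x3, x4
```

For the slide's parameters (n = 23, k = 5), `split_input(0b10101_00011_11111_00000_101)` yields the five fields in order from most to least significant.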

Resulting Formula
Each term depends on only two parts of x
Compute all possible values of each term and create a lookup table with those values
Lookup table row number is obtained by concatenating the input values
Some terms require small multiplications
Add together all terms to get the function value
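The row-numbering rule can be illustrated with a small table indexed by the concatenation of x0 and x1. The formula itself is not reproduced in this transcript, so the stored value below is only a stand-in: f evaluated at the point formed by x0 and x1. The width K, the function names, and the stored value are assumptions of this sketch.

```python
import math

K = 4  # hypothetical field width, kept small for the demo

def build_table(k=K, f=math.exp):
    """Build a 2^(2k)-row table for a term that depends only on (x0, x1).

    Row number = x0 concatenated with x1. The stored value is a stand-in,
    f(x0 * 2^-k + x1 * 2^-2k), not the presentation's actual table contents.
    """
    table = [0.0] * (1 << (2 * k))
    for x0 in range(1 << k):
        for x1 in range(1 << k):
            row = (x0 << k) | x1          # concatenate the two k-bit fields
            table[row] = f(x0 / (1 << k) + x1 / (1 << (2 * k)))
    return table

def lookup(table, x0, x1, k=K):
    """Fetch the row addressed by the concatenation of x0 and x1."""
    return table[(x0 << k) | x1]
```

Because each table sees only 2k input bits, it has 2^(2k) rows regardless of how wide the full input is.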

Input Restrictions
x is in a fixed-point format
x is in the range [0,1)
Range reductions are common in approximation methods
  Apply a transformation to reduce the range of the input
  Obtain the approximation
  Apply another transformation to obtain the final value
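As one example of the three range-reduction steps, e^x can be reduced with the identity e^x = 2^m * e^r, where x = m*ln(2) + r and r lies in [0, ln 2), which is inside [0,1). This is a standard technique shown as a sketch; the slides do not say which reduction the author used, and the function names and the `core` parameter (standing in for the table-based evaluator) are hypothetical.

```python
import math

def reduce_exp_argument(x):
    """Step 1: write x = m*ln(2) + r with integer m and r in [0, ln 2)."""
    m = math.floor(x / math.log(2))
    r = x - m * math.log(2)
    return m, r

def exp_with_range_reduction(x, core=math.exp):
    """Evaluate e^x using a core approximation that only needs inputs in [0, 1).

    `core` stands in for the table-based evaluator; math.exp is used here
    only so the sketch runs on its own.
    """
    m, r = reduce_exp_argument(x)
    approx = core(r)               # step 2: approximate on the reduced range
    return math.ldexp(approx, m)   # step 3: scale by 2^m to undo the reduction
```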

Block Diagram

Area Reduction in Tables
n = 23, k = 5, p = 3 (g = number of guard bits)
Full lookup table: 2^n entries, each 4k+p bits
  ~8 million rows
Smaller tables:
  2^(2k) entries of 4k+p+g bits (Table A)
  2^(2k) entries of 2k+p+g bits (Table B)
  2 x 2^(2k) entries of k+p+g bits (Tables C and E)
  2^(p+k) entries of p+g bits (Table D)
  ~5000 rows
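The row counts follow directly from the table dimensions on this slide and can be checked with a short calculation. The guard-bit count g is not given in the transcript, so it is left as a parameter; the function names are hypothetical.

```python
def table_rows(k, p):
    """Row count of each smaller table, taken from the slide's dimensions."""
    return {
        "A": 1 << (2 * k),
        "B": 1 << (2 * k),
        "C": 1 << (2 * k),
        "E": 1 << (2 * k),
        "D": 1 << (p + k),
    }

def table_bits(k, p, g):
    """Total storage in bits for all five tables, given g guard bits."""
    widths = {
        "A": 4 * k + p + g,
        "B": 2 * k + p + g,
        "C": k + p + g,
        "E": k + p + g,
        "D": p + g,
    }
    rows = table_rows(k, p)
    return sum(rows[name] * widths[name] for name in rows)

def full_table_rows(n):
    """Row count of a single full lookup table for an n-bit input."""
    return 1 << n
```

For n = 23, k = 5, p = 3 this gives 8,388,608 rows for the full table against 4,352 rows in total for the smaller tables, which the slide rounds to about 5000.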

Multipliers
Two small multipliers:
  k x (k+p+g)
  k x (p+g)
One operand is less than ¼ the size of the input precision
Modern FPGAs include small multipliers
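The quarter-width claim for the k-bit operand follows from the input split, assuming p > 0 (the slides only state p < k):

```latex
4k + p = n,\ p > 0
\;\Longrightarrow\;
k = \frac{n - p}{4} < \frac{n}{4},
\qquad
\text{e.g. } n = 23,\ p = 3:\quad k = 5 < 5.75 .
```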

Implementation
Java program calculates the values of the tables
Function evaluator implemented using Altera Quartus II
Size and delay measurements are for an Altera Stratix II FPGA

Building Table Values
Java program generates Verilog code implementing each lookup table
Iterate through each combination of (x0, x1), (x0, x2), etc. and calculate the corresponding value of the table
Check correctness by iterating through all values of x and comparing with the function's real value
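The generation step might look like the following sketch, written in Python here although the author's generator was a Java program that is not shown in the transcript. The module layout, the `case`-based ROM style, the function names, and the sample entry function are all assumptions made for illustration.

```python
import math

def sample_entry(x0, x1, k=2):
    """Hypothetical table entry in [0, 1): sin at the point formed by x0 and x1."""
    return math.sin(x0 / (1 << k) + x1 / (1 << (2 * k)))

def verilog_table(name, k, out_bits, entry):
    """Emit a Verilog module implementing a 2^(2k)-row lookup table as a case statement."""
    rows = []
    for x0 in range(1 << k):
        for x1 in range(1 << k):
            addr = (x0 << k) | x1                          # row = x0 concatenated with x1
            value = round(entry(x0, x1) * (1 << out_bits))  # fixed-point fraction
            value = min(value, (1 << out_bits) - 1)         # clamp to the output width
            rows.append(f"      {2 * k}'d{addr}: data = {out_bits}'d{value};")
    body = "\n".join(rows)
    return (
        f"module {name}(input [{2 * k - 1}:0] addr, "
        f"output reg [{out_bits - 1}:0] data);\n"
        f"  always @(*)\n"
        f"    case (addr)\n"
        f"{body}\n"
        f"      default: data = {out_bits}'d0;\n"
        f"    endcase\n"
        f"endmodule\n"
    )
```

Calling `verilog_table("table_a", 2, 8, sample_entry)` returns the module text with one case arm per (x0, x1) combination plus a default arm.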

Guard Bits
Can find the worst-case number of guard bits required based on the logic structure
May not actually need all the guard bits
Adjust the guard bit value and find the minimum needed for a particular function
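The search for the minimum guard-bit count can be sketched as an exhaustive sweep: try g = 0, 1, 2, ... and stop at the first value whose worst-case error meets the target. The approximation used below (three Taylor terms of e^x around the top bits, each rounded to n + g fractional bits) is a small stand-in chosen so the sketch runs on its own; it is not the presentation's five-table structure, and all names are hypothetical.

```python
import math

def quantize(value, frac_bits):
    """Round value to a multiple of 2^-frac_bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale

def max_error(n, g, k=4):
    """Largest |approx - e^x| over all n-bit x in [0, 1) with g guard bits per term."""
    worst = 0.0
    for x_bits in range(1 << n):
        hi = (x_bits >> (n - k)) / (1 << k)                # top k bits as a fraction
        lo = (x_bits & ((1 << (n - k)) - 1)) / (1 << n)    # remaining low bits
        base = math.exp(hi)
        terms = [base, base * lo, base * lo * lo / 2]      # stand-in for table outputs
        approx = sum(quantize(t, n + g) for t in terms)
        worst = max(worst, abs(approx - math.exp(x_bits / (1 << n))))
    return worst

def min_guard_bits(n, target, max_g=16):
    """Smallest g in [0, max_g] whose exhaustive worst-case error is below target."""
    for g in range(max_g + 1):
        if max_error(n, g) < target:
            return g
    return None
```

A call such as `min_guard_bits(8, 2 ** -8)` returns the first guard-bit count that meets the target for this stand-in, which may be lower than a worst-case analysis of the adder structure would suggest.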

Results
Synthesized for an Altera Stratix II
  ALUTs
  96 DSP blocks (used as multipliers)
f(x) = e^x, n=
  ALUTs (17%)
  4 DSP blocks (4%)
  23 ns delay

In Comparison...
Function    ALUTs    DSPs    Delay (ns)
e^x, n=
e^x, n=
sin(x), n=
adder, n=
adder, n=141709

Possible Improvements
Optimize the final adder
  Currently using a generic parallel adder
  Not all operands are the same width
  Can optimize by making a custom adder
Merge multiplications into the final adder
  Move partial product arrays into the adder
Change the splitting of the x input
  Improves table size
  More complicated formulas for table values

References
D. Defour, F. de Dinechin, and J.-M. Muller, "A New Scheme for Table-Based Evaluation of Functions," Proc. 36th Asilomar Conf. Signals, Systems, and Computers, Nov.
F. de Dinechin and A. Tisserand, "Multipartite Table Methods," IEEE Transactions on Computers, March 2005
M. Ercegovac and T. Lang, Digital Arithmetic, Ch. 10