A Fast Hardware Approach for Approximate, Efficient Logarithm and Anti-logarithm Computation. Suganth Paul, Nikhil Jayakumar, Sunil P. Khatri. Department of Electrical and Computer Engineering, Texas A&M University, College Station

Introduction The fast generation of functions such as the logarithm and antilogarithm is important in areas such as DSP, computer graphics, scientific computing, artificial neural networks, and logarithmic number systems. Over the years, authors have proposed various hardware approaches to accurately approximate the logarithm and antilogarithm functions. Of these approaches, look-up table (LUT) based methods such as Brubaker, Maenner, Kmetz, and SBTM are widely used. Some hardware approaches combine LUTs with polynomial approximations, but these need multiplications or divisions. Our approach combines an LUT with linear interpolation implemented in an area- and delay-efficient manner. The novelty of our approach is that we do not need a multiplier or divider to perform the interpolation. We also use the same hardware structure to implement both log and antilog. The number format used for the computation is $N = 2^{k}(1 + x)$. Here $x$, with $0 < x < 1$, is the mantissa and $k$ is the exponent.

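Mitchell Approximation The logarithm of a number $N = 2^{k}(1 + x)$ is found as $\log_2 N = k + \log_2(1 + x)$. Mitchell's approximation is given by $\log_2 N \approx k + x$, where $\log_2(1 + x) \approx x$ for $0 < x < 1$. The error due to this approximation is $E_M(x) = \log_2(1 + x) - x$. The error is plotted on the right.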
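As a minimal software sketch of the approximation above (not the hardware described in the slides), the following Python functions split a positive integer into exponent and mantissa and apply Mitchell's rule. The function names `decompose`, `mitchell_log2`, and `mitchell_error` are hypothetical, and floating point is used only for readability.

```python
import math

def decompose(n: int) -> tuple[int, float]:
    """Split a positive integer n into (k, x) with n = 2**k * (1 + x), 0 <= x < 1."""
    if n <= 0:
        raise ValueError("n must be positive")
    k = n.bit_length() - 1      # position of the leading one
    x = n / (1 << k) - 1.0      # fraction formed by the bits below the leading one
    return k, x

def mitchell_log2(n: int) -> float:
    """Mitchell's approximation: log2(n) ~ k + x."""
    k, x = decompose(n)
    return k + x

def mitchell_error(x: float) -> float:
    """Error of the approximation on the mantissa: log2(1 + x) - x."""
    return math.log2(1.0 + x) - x
```

For example, `mitchell_log2(12)` evaluates to 3.5, since 12 = 2^3 * (1 + 0.5), while the exact value of log2(12) is about 3.585; the difference is `mitchell_error(0.5)`.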

Kmetz Approximation In the Kmetz method, the Mitchell error curve shown above is sampled at $2^{t}$ points and stored in an LUT. Here the LUT is indexed by the first $t$ bits of the mantissa $x$. If the error value looked up from the LUT is $a$, the logarithm is found as $\log_2 N \approx k + x + a$, where $N = 2^{k}(1 + x)$. The error in this case due to approximating the logarithm of the mantissa portion is given by $E_K(x) = \log_2(1 + x) - (x + a)$.
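The table lookup described above can be sketched in Python as follows. This is an illustration under assumptions: the names `build_error_lut` and `kmetz_log2_mantissa` are hypothetical, the error curve is sampled at the start of each interval (the slides do not say where within the interval the sample is taken), and table entries are kept as floats rather than quantized fixed-point words.

```python
import math

def build_error_lut(t: int) -> list[float]:
    """Sample the Mitchell error log2(1 + x) - x at x = i / 2**t, i = 0 .. 2**t - 1."""
    size = 1 << t
    return [math.log2(1.0 + i / size) - i / size for i in range(size)]

def kmetz_log2_mantissa(x: float, lut: list[float]) -> float:
    """Approximate log2(1 + x), 0 <= x < 1, as x + a with a read from the LUT."""
    size = len(lut)
    i = int(x * size)           # value of the first t bits of the mantissa
    return x + lut[i]           # Mitchell estimate plus the stored correction
```

With a 64-entry table (`t = 6`), the residual error of this sketch stays below 0.01 over the whole mantissa range, compared with a peak of roughly 0.086 for the uncorrected Mitchell estimate.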

Our Approach In our method we interpolate between values stored in the LUT to get a more accurate result. The logarithm of the mantissa part of the number is obtained as $\log_2(1 + x) \approx x + a + \frac{(b - a)\,n}{2^{m-t}}$, where $a$ is the error value from the LUT at location $i$, the value of the first $t$ bits of the mantissa; $t$ is the number of leading bits in the mantissa indexing the table; $b$ is the next value in the LUT, at location $i + 1$; $m$ is the total number of bits used to represent the mantissa; and $n$ is the decimal value of the last $m - t$ bits of the mantissa. The multiplication step is found as $|b - a|\,n = 2^{\log_2 |b - a| + \log_2 n}$. […] is found by using the same LUT as above. We consider the following approximations to find $\log_2 |b - a|$ and $\log_2 n$.
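The interpolation step above can be illustrated with a short Python reference model. It is a sketch under assumptions: the names `build_error_lut` and `interp_log2_mantissa` are hypothetical, the table carries one extra end-point entry so that the last interval has a right-hand neighbour, and the product is computed with an ordinary multiply for clarity, whereas the slides replace that product with a log, add, antilog sequence.

```python
import math

def build_error_lut(t: int) -> list[float]:
    """Mitchell error samples at x = i / 2**t for i = 0 .. 2**t (end point included)."""
    size = 1 << t
    return [math.log2(1.0 + i / size) - i / size for i in range(size + 1)]

def interp_log2_mantissa(mant: int, m: int, t: int, lut: list[float]) -> float:
    """Approximate log2(1 + x) for x = mant / 2**m by linear interpolation in the LUT."""
    i = mant >> (m - t)                 # first t bits of the mantissa: table index
    n = mant & ((1 << (m - t)) - 1)     # last m - t bits of the mantissa
    a, b = lut[i], lut[i + 1]           # adjacent table entries
    x = mant / (1 << m)
    # Explicit multiply used here only as a reference for the interpolation term.
    return x + a + (b - a) * n / (1 << (m - t))
```

For a 23-bit mantissa and a 64-interval table (`m = 23`, `t = 6`), this reference model stays within about 4.4e-5 of the exact logarithm, which follows from the usual linear interpolation error bound.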

Errors for Various Interpolation Methods and Table Sizes 1. $\log_2 |b - a|$ is found by a) Mitchell approximation b) Kmetz approximation using another LUT. 2. $\log_2 n$ is found by a) Mitchell approximation b) Kmetz approximation using another LUT. Here $a$ and $b$ are the adjacent LUT entries being interpolated and $n$ is the value of the trailing mantissa bits. We find from the table below that the combination 1.b) with 2.b) has the best error performance, and hence we use LUTs to approximate the multiplication. Max Error is in […]

Block Diagram of the Log Engine The block diagram shows the implementation of the approximation of $\log_2(1 + x)$, where $x$ is the 23-bit mantissa. The number of leading bits of the mantissa going to the interpolator depends on the size of the LUTs used in the interpolator. In this case we use an LUT that holds 64 values, and 13 bits of the mantissa are required. The Interpolator block is shown below.

Interpolator Block Diagram The implementation can be pipelined to get better throughput. The COMPARE block determines whether the final stage does an add or a subtract. The LOD (leading one detector) block finds the position of the leading one, and the rest of the bits are used to access the LUT. The LUT used to find the logarithms of the two factors of the interpolation product is the same and is implemented as a dual-port ROM.
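To make the role of the leading one detector concrete, the following Python sketch estimates a product using only shifts and adds. It is a simplified illustration, not the interpolator from the slides: it uses the plain Mitchell approximation for both the logarithms and the antilogarithm instead of the LUT-corrected (Kmetz) variant the slides select, and all function names and the 16-bit fractional width are assumptions.

```python
def leading_one(v: int) -> int:
    """Leading one detector: bit position of the most significant 1 in v (v > 0)."""
    return v.bit_length() - 1

def mitchell_log2_fixed(v: int, frac_bits: int) -> int:
    """Mitchell log2 of a positive integer, as fixed point with frac_bits fraction bits."""
    k = leading_one(v)
    rest = v - (1 << k)                 # bits below the leading one
    if k >= frac_bits:                  # align those k bits to frac_bits fraction bits
        frac = rest >> (k - frac_bits)
    else:
        frac = rest << (frac_bits - k)
    return (k << frac_bits) | frac

def mitchell_antilog2_fixed(y: int, frac_bits: int) -> int:
    """Mitchell antilog: 2**(k + f) ~ 2**k * (1 + f), computed with shifts only."""
    k = y >> frac_bits
    f = y & ((1 << frac_bits) - 1)
    one_plus_f = (1 << frac_bits) | f   # fixed-point representation of 1 + f
    if k >= frac_bits:
        return one_plus_f << (k - frac_bits)
    return one_plus_f >> (frac_bits - k)

def approx_product(p: int, q: int, frac_bits: int = 16) -> int:
    """Multiplier-free product estimate: antilog(log(p) + log(q))."""
    total = mitchell_log2_fixed(p, frac_bits) + mitchell_log2_fixed(q, frac_bits)
    return mitchell_antilog2_fixed(total, frac_bits)
```

For instance, `approx_product(8, 4)` is exact, while `approx_product(5, 6)` gives 28 instead of 30. The uncorrected Mitchell estimate never exceeds the true product, which is the kind of error that an additional correction table is meant to reduce.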

Antilog Computation Let $y = k + x$, where $k$ is the integer part and $x$, with $0 < x < 1$, is the fractional part. The antilogarithm of this number is found as $2^{y} = 2^{k} \cdot 2^{x}$. Using Mitchell's method we make the following approximation: $2^{x} \approx 1 + x$. A Kmetz approximation can be made by storing the error due to this approximation in an LUT and adding the error value to the above equation for the antilogarithm. In our approach, we compute the antilogarithm by interpolating efficiently between two adjacent table values stored in the LUT without needing a multiplier. We follow the same flow used for computing the logarithm. The error incurred while using different table sizes for computing the antilogarithm is shown below.
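The antilogarithm flow above can be sketched in Python in the same way as the logarithm. The names `mitchell_antilog2`, `build_antilog_error_lut`, and `interp_antilog2` are hypothetical, the table includes an extra end-point entry by assumption, and the interpolation uses an explicit multiply as a reference rather than the multiplier-free datapath of the slides.

```python
import math

def mitchell_antilog2(y: float) -> float:
    """Mitchell's approximation: 2**(k + x) ~ 2**k * (1 + x)."""
    k = math.floor(y)
    x = y - k
    return math.ldexp(1.0 + x, k)

def build_antilog_error_lut(t: int) -> list[float]:
    """Samples of the error 2**x - (1 + x) at x = i / 2**t for i = 0 .. 2**t."""
    size = 1 << t
    return [2.0 ** (i / size) - (1.0 + i / size) for i in range(size + 1)]

def interp_antilog2(y: float, lut: list[float]) -> float:
    """Mitchell antilog plus a linearly interpolated correction from the LUT."""
    size = len(lut) - 1
    k = math.floor(y)
    x = y - k
    pos = x * size
    i = min(int(pos), size - 1)         # guard against x rounding up to 1.0
    frac = pos - i
    corr = lut[i] + (lut[i + 1] - lut[i]) * frac
    return math.ldexp(1.0 + x + corr, k)
```

For example, `mitchell_antilog2(3.5)` returns 12.0, whereas the exact value of 2^3.5 is about 11.314; with a 64-interval table, `interp_antilog2(3.5, build_antilog_error_lut(6))` recovers the exact value to within floating-point rounding because 0.5 falls on a table sample.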

Comparison of FPGA Resources used by the Log Engine We implemented our method and the Symmetric Bipartite Table Method (SBTM) on a Virtex2P FPGA. Our method requires smaller on-chip block RAMs. Both methods occupied less than 1% of the FPGA resources. Both methods were able to support clock speeds of a little over 350 MHz.

Comparison of LUT Size used and Accuracy of the Log Computation

Conclusion We have presented an approach to efficiently compute the logarithm and antilogarithm of a number in hardware. Our approach has a lower memory requirement than other methods for a given accuracy. Compared with the SBTM, for every two extra bits of accuracy we need a factor of 2 increase in the LUT size, whereas the SBTM needs a factor of 3 increase in the LUT size. Hence our method scales well to higher accuracies. We are area efficient compared with polynomial interpolation methods, as we do not need a multiplier or divider to perform the interpolation. The implementation can be pipelined, and the number of stages in the pipeline can be varied depending on the throughput required.
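The factor-of-2 scaling stated above is consistent with the standard error bound for linear interpolation. As a sketch, assume the interpolation product is computed exactly, write $E(x) = \log_2(1 + x) - x$ for the tabulated error curve, $\hat{E}(x)$ for its piecewise-linear interpolant, and $h = 2^{-t}$ for the sample spacing (this notation is an assumption, not taken from the slides):

```latex
\left| E(x) - \hat{E}(x) \right|
  \le \frac{h^{2}}{8} \max_{0 \le x \le 1} \left| E''(x) \right|
  = \frac{2^{-2t}}{8 \ln 2},
\qquad
E''(x) = -\frac{1}{(1 + x)^{2} \ln 2}.
```

Doubling the table size (replacing $t$ by $t + 1$) halves $h$ and divides the bound by 4, which corresponds to two additional bits of accuracy. In a real implementation the approximate product and the word length of the table entries add further error terms that this bound does not cover.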