1/30 Division by Convergence. Instructor: 王立洋. Student presenter: M9535204 蔡鐘葳.


2/30 Outline ▓ Speedup of Convergence Division ▓ Hardware Implementation ▓ Analysis of Lookup Table Size ▓ Reference

3/30 Speedup of Convergence Division

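4/30 Introduction: To divide z by d, one can compute y = 1/d and then do the multiplication yz. Division by repeated multiplications can be performed via $2\lceil \log_2 k \rceil - 1$ multiplications for k-bit operands. This is not yet very impressive: with 64-bit numbers and a 5-ns multiplier, $2 \times 6 - 1 = 11$ multiplications give a 55-ns division.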
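The count $2\lceil \log_2 k \rceil - 1$ can be reproduced with a short simulation. This is a minimal sketch that assumes exact rational arithmetic (Python's `fractions.Fraction`) and a divisor normalized to $1/2 \le d < 1$. The function name `convergence_divide` is hypothetical. Each iteration multiplies both $d^{(i)}$ and $z^{(i)}$ by $x^{(i)} = 2 - d^{(i)}$. The final update of $d$ is skipped because only $z$ is needed at the end.

```python
import math
from fractions import Fraction

def convergence_divide(z, d, k):
    """Approximate z/d to k bits by repeated multiplications (sketch).

    Assumes 1/2 <= d < 1. Each iteration multiplies both d(i) and z(i) by
    x(i) = 2 - d(i), doubling the bits of convergence of d(i) toward 1.
    Returns (quotient, number_of_multiplications).
    """
    if not (Fraction(1, 2) <= d < 1):
        raise ValueError("d must be normalized to [1/2, 1)")
    iterations = math.ceil(math.log2(k))   # 1 -> 2 -> 4 -> ... -> k bits
    mults = 0
    for i in range(iterations):
        x = 2 - d                  # multiplicative factor: 2's complement of d(i)
        z *= x                     # z(i+1) = z(i) x(i)
        mults += 1
        if i < iterations - 1:     # the final d(i+1) is never needed
            d *= x                 # d(i+1) = d(i) x(i)
            mults += 1
    return z, mults

q, m = convergence_divide(Fraction(1, 3), Fraction(5, 8), 64)
print(m, m * 5)   # 11 55  (multiplications, ns with a 5-ns multiplier)
```

With $d = 5/8$, the error of $d^{(i)}$ is squared at every step, so $1 - d^{(6)} = (3/8)^{64}$ and the returned quotient lies well within $2^{-64}$ of $8/15$.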

5/30 Three Types of Speedup: Three types of speedup are possible: reducing the number of multiplications (reduce m); using narrower multiplications (reduce the width of some $x^{(i)}$s); and performing the multiplications faster.

6/30 Initial Approximation: Convergence is slow in the beginning. It takes 6 multiplications to get 8 bits of convergence and another 5 to go from 8 bits to 64 bits. Since $x^{(0)}x^{(1)}x^{(2)}$ is essentially an approximation to 1/d, these initial multiplications can be replaced by a table-lookup step that directly supplies $x^{(0+)}$. Only the two multiplications $d\,x^{(0+)}$ and $z\,x^{(0+)}$ remain, which saves four multiplications.
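The slow start can be seen by tracking the bits of convergence of $d^{(i)}$ in the worst case $d = 1/2$. This sketch uses exact fractions. The helper names are hypothetical. Only the $d$ side of each multiplication pair is simulated, since $z$ receives the same factor.

```python
from fractions import Fraction

def bits_of_convergence(d):
    """Largest b with 1 - d <= 2**-b, for 0 < d < 1 (exact arithmetic)."""
    e = 1 - d
    b = 0
    while e * 2 ** (b + 1) <= 1:
        b += 1
    return b

def convergence_trace(d, pairs):
    """Bits of convergence of d(i) before and after each multiplication pair."""
    trace = [bits_of_convergence(d)]
    for _ in range(pairs):
        d *= 2 - d          # d(i+1) = d(i) (2 - d(i)); z(i) would get the same factor
        trace.append(bits_of_convergence(d))
    return trace

# Worst case d = 1/2: three pairs (6 multiplications) are needed to reach 8 bits.
print(convergence_trace(Fraction(1, 2), 6))   # [1, 2, 4, 8, 16, 32, 64]
```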

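7/30 Initial Approximation via Table Lookup: A $2^w \times w$ lookup table is necessary and sufficient for w bits of convergence after the first pair of multiplications. In the accompanying figure, the products of the multiplicative factors first give an approximation to 1/d and then a better one. Instead of computing $x^{(0)}x^{(1)}x^{(2)}$, for which $d\,x^{(0)}x^{(1)}x^{(2)} \ge 1 - 2^{-8}$, we read this value, $x^{(0+)}$, directly from a table. This reduces 6 multiplications to 2.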
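The saving can be counted directly. This sketch assumes that the table supplies w bits of convergence after the first pair, that every later pair doubles the bits, and that the last pair needs only the $z$ multiplication. The helper names are hypothetical. The table size is $2^w$ entries of w bits each.

```python
import math

def mults_without_table(k):
    # Start from 1 bit (d >= 1/2); each pair doubles; the last pair needs only z * x
    return 2 * math.ceil(math.log2(k)) - 1

def mults_with_table(k, w):
    # The first pair uses x(0+) from the table and yields w bits;
    # each later pair doubles the bits; the last pair needs only z * x
    pairs = 1 + math.ceil(math.log2(k / w))
    return 2 * pairs - 1

def table_bits(w):
    # 2**w entries of w bits each (the leading 1 of x(0+) is implicit)
    return (2 ** w) * w

print(mults_without_table(64), mults_with_table(64, 8), table_bits(8))   # 11 7 2048
```

For 64-bit operands, an 8-bit table therefore trades 2K bits of storage for four fewer multiplications.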

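8/30 Example with 4-bit lookup: Let $d = (0.1011\,xxxx\ldots)_{two}$, so $11/16 \le d < 12/16$. The inverses of the two extremes are $16/11 \approx 1.4545$ and $16/12 \approx 1.3333$. So $1.4 \approx 11/8 = (1.011)_{two}$ is a good estimate for 1/d. The products at the extremes are $(11/8)(11/16) = 121/128 \approx 0.9453$ and $(11/8)(3/4) = 33/32 \approx 1.0313$. Hence $d\,x^{(0+)}$ is within $7/128 < 2^{-4}$ of 1.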
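The arithmetic of the 4-bit example can be checked with exact fractions. The variable names are illustrative only.

```python
from fractions import Fraction

lo, hi = Fraction(11, 16), Fraction(12, 16)   # d = (0.1011 xxxx...)two
x0 = Fraction(11, 8)                           # table value (1.011)two

print(1 / hi, 1 / lo)          # 4/3 16/11: the reciprocals bracket x0
print(x0 * lo, x0 * hi)        # 121/128 33/32
worst = max(1 - x0 * lo, x0 * hi - 1)
print(worst, worst < Fraction(1, 16))   # 7/128 True: 4 bits of convergence
```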

9/30 Figure: Convergence in division by repeated multiplications with initial table lookup (annotations: after table lookup and the first pair of multiplications, replacing several iterations; after the second pair of multiplications).

10/30 Fig. For division by repeated multiplications, we saw that convergence to 1 (of $d^{(i)}$) and to q (of $z^{(i)}$) occurred from below. Suppose that at some point in the iterations $d^{(i)}$ overshoots 1 and becomes $1 + \varepsilon$. The next multiplicative factor $2 - d^{(i)} = 1 - \varepsilon$ then yields $d^{(i+1)} = (1 + \varepsilon)(1 - \varepsilon) = 1 - \varepsilon^2$. This value is smaller than 1 but still closer to 1.
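A one-step check of the overshoot case. This sketch assumes $\varepsilon = 1/64$ for illustration. The helper `next_d` is hypothetical.

```python
from fractions import Fraction

def next_d(d):
    """One convergence step on d(i): multiply by 2 - d(i)."""
    return d * (2 - d)

eps = Fraction(1, 64)
d = 1 + eps                  # d(i) has overshot 1
d_next = next_d(d)           # (1 + eps)(1 - eps) = 1 - eps**2
print(d_next, d_next < 1, abs(1 - d_next) < abs(1 - d))   # 4095/4096 True True
```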

11/30 Analysis of the Truncated Multiplicative Factor (1/2): We begin by noting that $d\,x^{(0)}x^{(1)}\cdots x^{(i)} = 1 - y^{(i)}$, so that $x^{(i+1)} = 2 - (1 - y^{(i)}) = 1 + y^{(i)}$. Assume that we truncate $1 - y^{(i)}$ to an a-bit fraction. This gives $(1 - y^{(i)})_T$ with an error $\alpha < 2^{-a}$.

12/30 Analysis of the Truncated Multiplicative Factor (2/2): With this truncated value, the multiplicative factor becomes $(x^{(i+1)})_T = 2 - (1 - y^{(i)})_T = 1 + y^{(i)} + \alpha$, where $0 \le (x^{(i+1)})_T - x^{(i+1)} = \alpha < 2^{-a}$. Thus $d\,x^{(0)}x^{(1)}\cdots x^{(i)}(x^{(i+1)})_T = (1 - y^{(i)})(1 + y^{(i)} + \alpha) = 1 - (y^{(i)})^2 + \alpha(1 - y^{(i)}) = d\,x^{(0)}x^{(1)}\cdots x^{(i)}x^{(i+1)} + \alpha(1 - y^{(i)})$.
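The identity above can be verified numerically. This sketch assumes that truncation means rounding $1 - y^{(i)}$ down to an a-bit binary fraction. The product value $p$ and the width $a = 16$ are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import math
from fractions import Fraction

def truncate(value, a):
    """Round a nonnegative value down to an a-bit binary fraction."""
    scale = 2 ** a
    return Fraction(math.floor(value * scale), scale)

def truncated_step(p, a):
    """p = d x(0)...x(i) = 1 - y(i). Returns (y, alpha, p * (x(i+1))_T)."""
    y = 1 - p
    x_exact = 1 + y                  # x(i+1) = 2 - (1 - y)
    x_trunc = 2 - truncate(p, a)     # (x(i+1))_T = 2 - (1 - y)_T
    alpha = x_trunc - x_exact        # equals p - (p)_T, so 0 <= alpha < 2**-a
    return y, alpha, p * x_trunc

p = Fraction(251, 256) + Fraction(1, 3000)   # about 6 bits of convergence
y, alpha, new = truncated_step(p, 16)
print(float(alpha), new == 1 - y ** 2 + alpha * (1 - y))
```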

13/30 Figure: Convergence in division by repeated multiplications with initial table lookup and the use of truncated multiplicative factors.

14/30 Fig. The first pair of multiplications following the table lookup involves a narrow multiplier, the $(w+1)$-bit factor $x^{(0+)}$, and may thus be faster than a full-width multiplication. Later multiplications can also be narrow if the multiplier is suitably truncated. The result is that convergence occurs from above or below.

15/30 Figure: One step in convergence division with truncated multiplicative factors.

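16/30 Fig. Suppose we aim to go from $\ell$ bits to $2\ell$ bits of convergence. Then we can truncate the next multiplicative factor to $2\ell$ bits, that is, set $a = 2\ell$. In the figure, A is the result of the precise iteration. It is no more than $2^{-2\ell}$ below 1, since $1 - (y^{(i)})^2 \ge 1 - 2^{-2\ell}$ when $|y^{(i)}| \le 2^{-\ell}$. B is arrived at by the approximate iteration with $a = 2\ell$. It is no more than about $2^{-2\ell}$ above 1, because the extra term $\alpha(1 - y^{(i)})$ is essentially bounded by $2^{-a} = 2^{-2\ell}$.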
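The whole truncated scheme can be simulated end to end. This sketch makes two assumptions. First, as a stand-in for the lookup table, $1/d$ is rounded to 8 fraction bits; this gives at least 8 bits of convergence but is not the table of Theorem 16.1. Second, each later factor is formed as $2 - (d^{(i)})_T$ with $a = 2\ell$. The function name `divide_truncated` is hypothetical. The final $d$ multiplication is kept only so that convergence can be observed; hardware would skip it.

```python
import math
from fractions import Fraction

def truncate(value, a):
    """Round a nonnegative value down to an a-bit binary fraction."""
    return Fraction(math.floor(value * 2 ** a), 2 ** a)

def divide_truncated(z, d, targets=(16, 32, 64)):
    """Sketch: table-like start, then pairs with factors truncated to 2l bits.

    Returns (approximate quotient, list of d(i) after each pair).
    """
    # Stand-in for the table lookup (assumption): 1/d rounded to 8 fraction bits.
    x = Fraction(round(256 / d), 256)
    d, z = d * x, z * x                 # first pair: at least 8 bits of convergence
    history = [d]
    for a in targets:                   # l -> 2l bits, factor truncated to a = 2l bits
        x_t = 2 - truncate(d, a)        # (x)_T = 2 - (d)_T
        d, z = d * x_t, z * x_t
        history.append(d)
    return z, history

q, hist = divide_truncated(Fraction(1, 3), Fraction(5, 8))
print(["above" if p > 1 else "below or equal" for p in hist])
```

The printed list shows on which side of 1 each $d^{(i)}$ lands. Over the stages, the distance from 1 stays within roughly $2^{-8}, 2^{-16}, 2^{-32}, 2^{-64}$.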

17/30 Example: 64-bit division. Initial step: a table of size 256 × 8 = 2K bits. Middle steps: multiplication pairs with 9-, 17-, and 33-bit multipliers. Final step: a full 64 × 64 multiplication.
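The multiplier widths follow from the rule "go from $\ell$ to $2\ell$ bits with a factor truncated to $2\ell$ fraction bits". This is a sketch under that assumption. The function name is hypothetical. A factor of the form $(1.xxx\ldots)_{two}$ with $2\ell$ fraction bits needs a $(2\ell + 1)$-bit multiplier.

```python
def multiplier_widths(k=64, w=8):
    """Multiplier widths of the middle pairs, plus the table size in bits (sketch).

    The table supplies x(0+) = 1.xxxxxxxx (w fraction bits, so w+1 bits).
    Going from l to 2l bits uses a factor truncated to 2l fraction bits,
    i.e. a (2l+1)-bit multiplier; the final step is a full k x k multiply.
    """
    widths = [w + 1]
    l = w
    while 2 * l < k:
        l *= 2
        widths.append(l + 1)
    return widths, (2 ** w) * w

print(multiplier_widths())   # ([9, 17, 33], 2048)
```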

18/30 Hardware Implementation

19/30 Hardware Implementation. Figure: Two multiplications fully overlapped in a 2-stage pipelined multiplier.

20/30 Fig. As the computation of $z^{(i)}x^{(i)}$ moves from the top to the bottom pipeline stage, the next iteration begins by computing $d^{(i+1)}x^{(i+1)}$ in the top stage.
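A cycle-by-cycle view of the overlap. This sketch rests on two assumptions. Each multiplication spends one cycle in each stage. The 2's complement $x^{(i+1)} = 2 - d^{(i+1)}$ is ready as soon as $d^{(i+1)}$ leaves the bottom stage. In the labels, `d1*x1` stands for $d^{(1)}x^{(1)}$. The function is hypothetical.

```python
def pipeline_schedule(pairs):
    """(cycle, top stage, bottom stage) for a 2-stage pipelined multiplier (sketch)."""
    ops = []
    for i in range(pairs):
        ops.append(f"d{i}*x{i}")     # produces d(i+1)
        ops.append(f"z{i}*x{i}")     # produces z(i+1)
    schedule = []
    # ops[j] enters the top stage in cycle j and the bottom stage in cycle j + 1.
    # d(i+1)*x(i+1) enters in cycle 2i+2, right after d(i)*x(i) leaves the bottom.
    for cycle in range(len(ops) + 1):
        top = ops[cycle] if cycle < len(ops) else "-"
        bottom = ops[cycle - 1] if cycle >= 1 else "-"
        schedule.append((cycle, top, bottom))
    return schedule

for row in pipeline_schedule(2):
    print(row)
```

Under these assumptions the pipeline never idles. Each pair costs two pipeline cycles, roughly one full multiplication time.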

21/30 Implementing Division with Reciprocation: In reciprocation, the multiplication pairs are data-dependent, so they cannot be pipelined or performed in parallel. In the recurrence $x^{(i+1)} = x^{(i)}(2 - x^{(i)}d)$, the second multiplication by $x^{(i)}$ needs the result of the first one. The most promising speedup method therefore relies on deriving a better starting approximation to 1/d.
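A sketch of the reciprocation recurrence in exact arithmetic. The function name `reciprocal` is hypothetical, and the crude start $x^{(0)} = 1$ is chosen only for illustration. The inner loop makes the data dependence explicit. It also shows why a better start pays off: the error $1 - x^{(i)}d$ is squared at every iteration, so each bit of initial accuracy saves work.

```python
from fractions import Fraction

def reciprocal(d, x, iterations):
    """Newton-Raphson reciprocation x(i+1) = x(i) (2 - x(i) d) (sketch).

    The two multiplications per iteration are sequential: t = x(i) * d must
    finish before x(i) * (2 - t) can start.
    Returns the list of successive approximations.
    """
    approximations = [x]
    for _ in range(iterations):
        t = x * d            # first multiplication
        x = x * (2 - t)      # second multiplication depends on t
        approximations.append(x)
    return approximations

d = Fraction(5, 8)
approx = reciprocal(d, Fraction(1), 6)
print([float(1 - a * d) for a in approx[:4]])   # error squares at each step
```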

22/30 The Required Lookup Table: The required lookup table can be made smaller, or eliminated entirely, by a variety of methods. One is to store the reciprocal values for fewer points and use linear or higher-order interpolation to compute the starting approximation. Another is to formulate the starting approximation as a multi-operand addition problem and use one pass through the multiplier's CSA tree, suitably augmented, to compute it.
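A sketch of the first option: store 1/d at only a few equally spaced points of $[1/2, 1)$ and interpolate linearly. The segment counts, the use of floating point, and the sampling grid are illustrative assumptions. The function names are hypothetical.

```python
def make_interpolator(segments):
    """Starting approximation to 1/d on [1/2, 1) by linear interpolation (sketch).

    Assumption: reciprocals are stored only at segments + 1 equally spaced
    points, instead of one entry per w-bit prefix of d.
    """
    h = 0.5 / segments
    nodes = [0.5 + j * h for j in range(segments + 1)]
    stored = [1.0 / t for t in nodes]          # the (small) stored table

    def approx(d):
        j = min(int((d - 0.5) / h), segments - 1)
        frac = (d - nodes[j]) / h
        return stored[j] + frac * (stored[j + 1] - stored[j])
    return approx

def worst_error(approx, samples=10000):
    """Largest |1 - d * approx(d)| over a uniform grid of d in [1/2, 1)."""
    ds = [0.5 + 0.5 * k / samples for k in range(samples)]
    return max(abs(1 - d * approx(d)) for d in ds)

err16 = worst_error(make_interpolator(16))
print(err16)
```

With 16 segments, 17 stored values already give better than 8 bits of convergence on this grid. A full 8-bit table would need 256 entries, but the interpolation costs an extra multiply-add.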

23/30 Analysis of Lookup Table Size

24/30 Theorem for Table Size: Theorem 16.1 states that to get $w \ge 5$ bits of convergence after the first iteration of division by repeated multiplications, w bits of d (beyond the mandatory 1) must be inspected. The factor $x^{(0+)}$ read out from the table is of the form $(1.xxx\ldots xxx)_{two}$, with w bits after the radix point. Based on the theorem, the required table size is $2^w \times w$ bits, since the leading 1 of $x^{(0+)}$ need not be stored. The cases $w < 5$ are practically uninteresting because they allow a smaller table, so we ignore them.

25/30 Analysis of Lookup Table Size (1/4): Recall that our objective is to have $1 - 2^{-w} \le d\,x^{(0+)} \le 1 + 2^{-w}$. Let $d = (0.1\,d_{-2}d_{-3}\cdots d_{-(w+1)}\,d_{-(w+2)}\cdots d_{-l})_{two}$, where $d_{-2}\cdots d_{-(w+1)}$ are the w bits to be inspected. Theorem 16.1 postulates the existence of $x^{(0+)} = (1.x^{+}_{-1}x^{+}_{-2}\cdots x^{+}_{-w})_{two}$ satisfying the objective inequality.

26/30 Analysis of Lookup Table Size (2/4): Let $u = (1\,d_{-2}d_{-3}\cdots d_{-(w+1)})_{two}$, an integer satisfying $2^w \le u < 2^{w+1}$. We then have $2^{-(w+1)}u \le d < 2^{-(w+1)}(u+1)$. Similarly, let $v = (1\,x^{+}_{-1}x^{+}_{-2}\cdots x^{+}_{-w})_{two}$, so that $x^{(0+)} = 2^{-w}v$. Multiplying the objective inequality by $2^w$ rewrites it as $2^w - 1 \le dv \le 2^w + 1$.

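27/30 Analysis of Lookup Table Size (3/4): Because $dv \ge 2^{-(w+1)}uv$ and $dv < 2^{-(w+1)}(u+1)v$, we derive the following sufficient conditions: $2^w - 1 \le 2^{-(w+1)}uv$ and $2^{-(w+1)}(u+1)v \le 2^w + 1$. These conditions lead to the following restrictions on v: $\left\lceil \frac{2^{w+1}(2^w - 1)}{u} \right\rceil \le v \le \left\lfloor \frac{2^{w+1}(2^w + 1)}{u+1} \right\rfloor$.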

28/30 Analysis of Lookup Table Size (4/4): Hence a suitable integer v exists whenever the ceiling of the lower bound does not exceed the floor of the upper bound. Showing that this last inequality always holds is left as an exercise. That result completes the “sufficiency” part of the proof. For necessity, at least w bits of d must be inspected, and $x^{(0+)}$ must have at least w bits after the radix point.
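The restrictions on v can be checked mechanically. The sketch below builds the whole $2^w$-entry table and raises an error if some address has no admissible v. Running it for several w is a numerical sanity check of the exercise, not a proof. The helper names `v_range` and `build_table` are hypothetical, and choosing the smallest admissible v is an arbitrary design choice.

```python
def v_range(u, w):
    """Integer bounds on v implied by the sufficient conditions (slide 27)."""
    lo = -(-(2 ** (w + 1) * (2 ** w - 1)) // u)       # ceiling division
    hi = (2 ** (w + 1) * (2 ** w + 1)) // (u + 1)      # floor division
    return lo, hi

def build_table(w):
    """2**w entries f (w-bit fractions), with x(0+) = 1 + f / 2**w.

    Picks the smallest admissible v for each address; raises if none exists.
    """
    table = []
    for u in range(2 ** w, 2 ** (w + 1)):
        lo, hi = v_range(u, w)
        if lo > hi:
            raise ValueError(f"no valid entry for w={w}, u={u}")
        table.append(lo - 2 ** w)        # drop the implicit leading 1 of v
    return table

for w in range(5, 13):
    build_table(w)                       # raises if the exercise's inequality fails
print(build_table(8)[55])                # 164
```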

29/30 Example: Table 16.2 lists sample entries in the lookup table replacing the first four multiplications in division by repeated multiplications. Its columns are the address, $d = (0.1\,xxxx\,xxxx)_{two}$, and $x^{(0+)} = (1.xxxx\,xxxx)_{two}$. As an example, the table entry at address 55 covers $311/512 \le d < 312/512$. For 8 bits of convergence, the table entry f must satisfy $(311/512)(1 + .f) \ge 1 - 2^{-8}$ and $(312/512)(1 + .f) \le 1 + 2^{-8}$. That is, $199/311 \le .f \le 101/156$, or $163.8 \le 256 \times .f \le 165.7$. There are two choices: $164 = (1010\,0100)_{two}$ or $165 = (1010\,0101)_{two}$.
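The bounds for address 55 can be recomputed exactly, following the two inequalities on the slide. The variable names are illustrative only.

```python
import math
from fractions import Fraction

w = 8
lo_d, hi_d = Fraction(311, 512), Fraction(312, 512)   # address 55
eps = Fraction(1, 2 ** w)

# (311/512)(1 + .f) >= 1 - 2**-8   and   (312/512)(1 + .f) <= 1 + 2**-8
f_min = (1 - eps) / lo_d - 1        # 199/311
f_max = (1 + eps) / hi_d - 1        # 101/156
choices = list(range(math.ceil(f_min * 2 ** w), math.floor(f_max * 2 ** w) + 1))
print(f_min, f_max, choices, [format(f, "08b") for f in choices])
# 199/311 101/156 [164, 165] ['10100100', '10100101']
```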

30/30 Reference: [1] Behrooz Parhami, "Computer Arithmetic: Algorithms and Hardware Designs," Oxford University Press.