Square Root Function- The Restoring Algorithm VLSI–Lab project Moran Amir Elior.


Goals and needs
The square root function performs the basic math operation f(A) = Q such that Q^2 = A. Square root extraction is considered difficult to implement in hardware and requires an iterative process (or the use of a lookup table). We present a method that is exact (not an approximation). The results are Q and R such that: A = Q^2 + R, where R is the remainder.

Motivation
The restoring method is based on a "binary" search over the result range, which is half as many bits wide as the input. At each step, the sign of the latest remainder is checked. If the remainder >= 0, we search in the upper domain; otherwise, in the lower domain. Since this is a square root, each result bit corresponds to two input bits, so at each step we can divide the input by 4 (shift by two bits) rather than by 2.
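The search idea above can be sketched in software. This is a minimal Python model for intuition only, not the hardware described in these slides; the function name `isqrt_bit_search` and the `n_bits` parameter are assumptions made for illustration. It tries each result bit from the most significant one down and keeps the bit only when the remainder stays non-negative.

```python
def isqrt_bit_search(a: int, n_bits: int = 8) -> int:
    """Search the result range bit by bit (illustrative model only).

    An n_bits-wide input has a root of about n_bits / 2 bits, so the
    search space is roughly half as wide as the input.
    """
    if a < 0 or a >= 1 << n_bits:
        raise ValueError("input does not fit in n_bits")
    q = 0
    for i in range((n_bits + 1) // 2 - 1, -1, -1):
        candidate = q | (1 << i)            # try the upper half of the range
        if a - candidate * candidate >= 0:  # sign check on the remainder
            q = candidate                   # keep searching the upper half
        # otherwise the bit stays 0 and the search continues in the lower half
    return q
```

Unlike the restoring algorithm, this sketch squares the candidate at every step; the restoring method avoids that multiplication by updating the remainder incrementally.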

The Restoring Algorithm
Initial conditions:
> Let R (the remainder) equal A, the input.
> Let Q equal 0, where Q = q_1 … q_n.
Iterative step (i is the index):
> if R>>2i >= {Q, 0, 1} then q_{j-1} = '1'; R = R – {Q, 0, 1}
> if R>>2i < {Q, 0, 1} then q_{j-1} = '0'; R = R (unchanged)
R and Q are best thought of as changing in width, bit-wise; in reality, they are zero-padded from the left.
We compare R, which is originally the input, to the main terms of the square of Q (as was explained for the squaring function method): 2^6 a_3, 2^4 a_2, 2^2 a_1, 2^0 a_0 (4-bit example).
If the term is bigger than R, we append a zero to the result and keep the remainder; if it is smaller or equal, we append a one to the result and subtract the term from the remainder, so that we are left with the minor terms.
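The iterative step can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the authors' implementation: the name `restoring_sqrt` and the `n_bits` parameter are hypothetical, and instead of keeping the full-width R and comparing R>>2i, the model keeps only the part of the remainder above bit 2i and brings in two new input bits per step, which should be equivalent.

```python
from typing import Tuple


def restoring_sqrt(a: int, n_bits: int = 8) -> Tuple[int, int]:
    """Return (Q, R) such that a == Q*Q + R, two input bits per step."""
    if n_bits % 2 != 0:
        raise ValueError("n_bits must be even")
    if a < 0 or a >= 1 << n_bits:
        raise ValueError("input does not fit in n_bits")
    q = 0
    r = 0
    for i in range(n_bits // 2 - 1, -1, -1):
        # Bring the next pair of input bits into the partial remainder.
        r = (r << 2) | ((a >> (2 * i)) & 0b11)
        trial = (q << 2) | 0b01   # the term {Q, 0, 1}
        if r >= trial:
            r -= trial            # subtraction succeeds: result bit is 1
            q = (q << 1) | 1
        else:
            q = q << 1            # remainder is kept: result bit is 0
    return q, r
```

For example, `restoring_sqrt(11)` returns `(3, 2)`, since 11 = 3^2 + 2.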

Example – square root of 11
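As a rough stand-in for this worked example, the short Python trace below steps through the square root of 11 as a 4-bit input (1011). The helper name `trace_sqrt` and the layout of the printed trace are assumptions made for illustration; the rule applied at each step is the comparison against {Q, 0, 1} from the algorithm slide.

```python
def trace_sqrt(a: int, n_bits: int = 4):
    """Return (steps, q, r); each step is (partial remainder, trial term, bit)."""
    q, r, steps = 0, 0, []
    for i in range(n_bits // 2 - 1, -1, -1):
        r = (r << 2) | ((a >> (2 * i)) & 0b11)  # bring down two input bits
        trial = (q << 2) | 1                    # the term {Q, 0, 1}
        bit = 1 if r >= trial else 0
        steps.append((r, trial, bit))
        if bit:
            r -= trial                          # keep the subtraction
        q = (q << 1) | bit
    return steps, q, r


steps, q, r = trace_sqrt(11)
for partial, trial, bit in steps:
    print(f"partial remainder {partial:04b}  trial {trial:04b}  ->  bit {bit}")
print(f"Q = {q}, R = {r}")
```

Both steps succeed for this input, so the result is Q = 3 with remainder R = 2 (11 = 3^2 + 2).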

Implementation issues
The operations needed are:
> Subtraction
> Shifting
We can use a simple data path for these operations. We can use multiple Conditional Subtraction (SC) units as well. For each of them, there are n/2+1 iterations.

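Behavioral VHDL design
For data path implementation:
Qj := (others => '0');                 -- Q starts at zero
R2j := D;                              -- the remainder starts as the input D
FOR j IN 4 DOWNTO 1 LOOP
  Shift8(Qj, j, '1', Q_t);             -- build the trial term Q_t from Qj
  Q_t(j+j-2) := '1';                   -- set the trial bit at position 2j-2
  Subtract(R2j, Q_t, R_t, negative);   -- R_t = R2j - Q_t; negative flags the sign
  IF (negative = '0') THEN
    Qj(j-1) := '1';                    -- subtraction fits: result bit is 1
    R2j := R_t;
  ELSE
    Qj(j-1) := '0';                    -- restore: keep the old remainder
  END IF;
END LOOP;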
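The loop above can be mirrored in Python to check its arithmetic. This is only a sketch: the function name `sqrt8_datapath_model` is hypothetical, and it assumes that `Shift8(Qj, j, ...)` shifts Qj left by j positions, which the slide does not spell out. Under that assumption the trial term equals (Q + 2^(j-1))^2 - Q^2, so each successful subtraction keeps R equal to the input minus Q^2.

```python
from typing import Tuple


def sqrt8_datapath_model(d: int) -> Tuple[int, int]:
    """Model the 8-bit behavioral loop; returns (Q, R)."""
    if not 0 <= d < 256:
        raise ValueError("input must fit in 8 bits")
    q = 0
    r = d
    for j in range(4, 0, -1):
        q_t = q << j                 # assumed effect of Shift8(Qj, j, ...)
        q_t |= 1 << (2 * j - 2)      # Q_t(j+j-2) := '1'
        r_t = r - q_t                # Subtract(R2j, Q_t, R_t, negative)
        negative = r_t < 0
        if not negative:
            q |= 1 << (j - 1)        # Qj(j-1) := '1'
            r = r_t                  # R2j := R_t
        # otherwise bit j-1 stays 0 and the remainder is left unchanged
    return q, r
```

For the input 11 the model yields Q = 3 and R = 2, matching the example used throughout the slides.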

Using a Data path
[Figure labels: ALU, Q, R, 0/1, sign, load]

Using SC units
Again, the square root of 11 example.

Considerations
Design reuse: ALU already exists.
Simplicity: SC units are easy to implement:
procedure SC (signal CO, S : out Std_Logic;
              signal R, D, CI, Q : in Std_Logic) is
begin
  CO <= (R and D) or (R and CI) or (D and CI);  -- carry out: majority of R, D, CI
  S <= R xor ((D xor CI) and Q);                -- sum bit when Q = '1', else R passes through
end SC;
Area: ~ same as ALU.
Speed: ALU demands 4-5 cycles. The SC units can produce output much faster.
Power: Lower than ALU.
ALU iteration number: q iterations
SC unit count: 0.5*q*q - 1
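To see how the SC equations yield a conditional subtraction, the cell can be modelled bit by bit in Python. The `sc` function transcribes the two equations from the slide; the row wiring in `sc_row` is an assumption, not taken from the slides: D is fed the inverted bits of the term to subtract, the carry into the least significant cell is 1, and the carry out of the row drives the shared Q control line.

```python
def sc(r: int, d: int, ci: int, q: int):
    """One SC cell, transcribed from the slide's equations; returns (co, s)."""
    co = (r & d) | (r & ci) | (d & ci)
    s = r ^ ((d ^ ci) & q)
    return co, s


def sc_row(r: int, t: int, width: int):
    """Conditionally subtract t from r with a row of SC cells.

    Assumed wiring (hypothetical): D = NOT t, carry-in of the LSB cell = 1,
    and the row's final carry-out is fed back as the control Q.
    Returns (q, result): result is r - t if r >= t, otherwise r unchanged.
    """
    mask = (1 << width) - 1
    d = ~t & mask
    carries = []
    c = 1
    for k in range(width):           # ripple the carries (independent of Q)
        carries.append(c)
        c, _ = sc((r >> k) & 1, (d >> k) & 1, c, 1)
    q = c                            # carry out 1 means no borrow: r >= t
    result = 0
    for k in range(width):           # each cell outputs the sum bit or passes R
        _, s = sc((r >> k) & 1, (d >> k) & 1, carries[k], q)
        result |= s << k
    return q, result
```

With this wiring the restore step costs nothing extra: when the subtraction would go negative, Q = 0 masks the difference and every cell simply forwards its R bit.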

Root function implementation

SC simple implementation

SC optimized implementation

Behavioral VHDL simulation

Behavioral VHDL simulation (Cont’)

Results on Schematics
[Figure labels: A, R, Q]

Results on Schematics II
[Figure labels: R, A, Q]

Simulation results -Q

Simulation results -R

The SC unit maximal delay
SC max latency: 1.62 ns
A few transients with the maximal delays.

Power
Over 25 cycles. The most power-consuming cycle is marked in red: 25 mW RMS.

Transistor count & latency
The SC unit: 34 MOS devices
SC max latency ~ 2.5 ns (includes margin)
The square root extractor: 17 SC units
17 * 34 = 578 MOS devices
Circuit max latency: 15 x SC latency ~ 40 ns
Max working frequency = 25 MHz
RMS power on the most power-consuming cycle = 25 mW
Highest power peak measured = 1 W
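The figures on this slide can be cross-checked with a few lines of arithmetic. The constant names below are hypothetical; the values are the ones quoted on the slide. Note that 15 SC latencies of 2.5 ns come to 37.5 ns, so the quoted 40 ns appears to be rounded up, and the 25 MHz figure follows from the rounded value.

```python
SC_TRANSISTORS = 34        # MOS devices per SC unit
SC_UNITS = 17              # SC units in the square root extractor
SC_LATENCY_NS = 2.5        # SC max latency, including margin
CRITICAL_PATH_SC = 15      # SC units on the longest path
QUOTED_LATENCY_NS = 40.0   # circuit latency as quoted on the slide

total_transistors = SC_UNITS * SC_TRANSISTORS
raw_latency_ns = CRITICAL_PATH_SC * SC_LATENCY_NS
max_freq_mhz = 1e3 / QUOTED_LATENCY_NS   # 1 / (40 ns), expressed in MHz

print(total_transistors, raw_latency_ns, max_freq_mhz)
```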

Performance evaluation
Using the ALU scheme would require a minimum of 4 cycles => 400 ns. The circuit improves speed by a factor of 10. The area is not much less than that of the ALU unit itself, excluding the peripherals we would have had to add.

Credits for pictures
Alain Guyot's site for TIMA Laboratory: cmp.imag.fr/~guyot/Cours/Oparithm/english/Extrac.htm