Squaring Function. Zehavit Trachtenberg, Ido Dinerman, Barak Cohen.



Squaring Function
The squaring function is used in many applications, such as the Viterbi algorithm (error-correcting codes), vector quantization (VQ: image data compression, speech and writing recognition) and the calculation of squared Euclidean distances. Real-time use requires a fast implementation. Two implementations are explored in this work:
1. Digital implementation: the compensating algorithm by Ming-Hwa Sheu and Su-Hon Lin.
2. Analog implementation.

Project Goals
- Function implementation: analog (SPICE)
- Function implementation: digital (VHDL)
- Implementation of the function for practical use (Pythagoras' theorem)

DIGITAL

Digital Implementation
- The digital implementation is based on the approximate squaring function.
- Input: n-bit binary data $A = \sum_{i=0}^{n-1} 2^i a_i$
- Output: 2n-bit binary number $R = \sum_{m=0}^{2n-1} 2^m r^n_m \approx A^2$

Algorithm
- The output expression of the exact squaring function (for a 4-bit number) is:
$$A^2 = (a_3 a_2 a_1 a_0)^2 = 2^6(a_3 + a_3 a_2) + 2^5 a_3 a_1 + 2^4(a_2 + a_3 a_0 + a_2 a_1) + 2^3 a_2 a_0 + 2^2(a_1 a_0 + a_1) + 2^0 a_0$$
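The expansion can be checked numerically. The following is a minimal Python sketch (the function names are hypothetical, not from the slides) that sums the weighted pure and composite terms and compares the result with A² for all sixteen 4-bit inputs.

```python
from itertools import product


def exact_square_from_terms(a3, a2, a1, a0):
    """Sum the weighted pure and composite terms of the 4-bit expansion."""
    return (2**6 * (a3 + a3 * a2)
            + 2**5 * (a3 * a1)
            + 2**4 * (a2 + a3 * a0 + a2 * a1)
            + 2**3 * (a2 * a0)
            + 2**2 * (a1 * a0 + a1)
            + 2**0 * a0)


def expansion_holds_for_all_inputs():
    """Return True if the expansion equals A**2 for every 4-bit input."""
    for a3, a2, a1, a0 in product((0, 1), repeat=4):
        a = 8 * a3 + 4 * a2 + 2 * a1 + a0
        if exact_square_from_terms(a3, a2, a1, a0) != a * a:
            return False
    return True


print(expansion_holds_for_all_inputs())
```

Here the `+` signs are ordinary integer additions, so carries between bit positions are still implicit; the compensation steps that follow are what turn the sum into per-bit logic.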

Algorithm cont.
Step 1: The approximate result R is set equal to the pure terms:
$$R = 2^6 a_3 + 2^4 a_2 + 2^2 a_1 + 2^0 a_0$$
Step 2: Select the closest composite terms for compensation ($2^6 a_3 a_2$, $2^4 a_2 a_1$, $2^2 a_1 a_0$):
$$R = 2^6(a_3 + a_3 a_2) + 2^4(a_2 + a_2 a_1) + 2^2(a_1 + a_1 a_0) + 2^0 a_0$$
$$= 2^7 a_3 a_2 + 2^6 a_3 \overline{a_2} + 2^5 a_2 a_1 + 2^4 a_2 \overline{a_1} + 2^3 a_1 a_0 + 2^2 a_1 \overline{a_0} + 2^0 a_0$$

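Algorithm cont.
Step 3: Choose the second closest composite terms for compensation and proceed as in step 2 (terms $2^5 a_3 a_1$, $2^3 a_2 a_0$). The result is:
$$R = 2^7 a_3 a_2 + 2^6 a_3(\overline{a_2} \cup a_1) + 2^5 (a_3 \oplus a_2) a_1 + 2^4 a_2(\overline{a_1} \cup a_0) + 2^3 (a_2 \oplus a_1) a_0 + 2^2 a_1 \overline{a_0} + 2^0 a_0$$
where $\cup$ is logical OR and $\oplus$ is exclusive OR.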

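Algorithm cont.
Step 4: The approximation. Add the remaining term ($2^4 a_3 a_0$) to the sum with the OR operator:
$$R = 2^7 a_3 a_2 + 2^6 a_3(\overline{a_2} \cup a_1) + 2^5 (a_3 \oplus a_2) a_1 + 2^4 \left[a_2(\overline{a_1} \cup a_0) \cup a_3 a_0\right] + 2^3 (a_2 \oplus a_1) a_0 + 2^2 a_1 \overline{a_0} + 2^0 a_0$$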
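The four steps can be traced as a running value of R. The sketch below is one reading of the procedure, under the assumption that steps 1 to 3 are ordinary additions of the selected terms and that step 4 ORs the last term into bit 4 without propagating a carry. The function names are hypothetical.

```python
def split_bits(a):
    """Split a 4-bit integer into (a3, a2, a1, a0)."""
    return (a // 8) % 2, (a // 4) % 2, (a // 2) % 2, a % 2


def compensation_steps(a):
    """Return the value of R after each of steps 1 to 4 for a 4-bit input."""
    a3, a2, a1, a0 = split_bits(a)
    # Step 1: pure terms only.
    r1 = 64 * a3 + 16 * a2 + 4 * a1 + a0
    # Step 2: add the closest composite terms.
    r2 = r1 + 64 * a3 * a2 + 16 * a2 * a1 + 4 * a1 * a0
    # Step 3: add the second closest composite terms.
    r3 = r2 + 32 * a3 * a1 + 8 * a2 * a0
    # Step 4: OR the remaining term into bit 4 (no carry is generated).
    r4 = r3 | (16 * (a3 & a0))
    return r1, r2, r3, r4


for a in (9, 13):
    print(a, compensation_steps(a), a * a)
```

For A = 9, bit 4 is still clear after step 3, so the OR restores the exact value 81. For A = 13, bit 4 is already set, so the OR has no effect and the result stays at 153 instead of 169.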

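Algorithm result
$r^4_7 = a_3 a_2$
$r^4_6 = a_3 \overline{a_2} \cup a_3 a_1$
$r^4_5 = \overline{a_3} a_2 a_1 \cup a_3 \overline{a_2} a_1$
$r^4_4 = a_2 \overline{a_1} \cup a_2 a_0 \cup a_3 a_0$
$r^4_3 = \overline{a_2} a_1 a_0 \cup a_2 \overline{a_1} a_0$
$r^4_2 = a_1 \overline{a_0}$
$r^4_1 = 0$
$r^4_0 = a_0$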
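A minimal Python sketch of the eight result bits follows. It assumes the literals are complemented as the compensation steps imply (for example, r6 = a3 AND (NOT a2 OR a1)); the function names are hypothetical. It evaluates the logic for every 4-bit input and lists the inputs whose result differs from the exact square.

```python
def approx_square(a):
    """Evaluate the 4-bit approximate squaring logic and return R as an integer."""
    a3, a2, a1, a0 = (a // 8) % 2, (a // 4) % 2, (a // 2) % 2, a % 2
    n2, n1, n0 = 1 - a2, 1 - a1, 1 - a0   # complemented inputs
    r7 = a3 & a2
    r6 = (a3 & n2) | (a3 & a1)
    r5 = ((1 - a3) & a2 & a1) | (a3 & n2 & a1)
    r4 = (a2 & n1) | (a2 & a0) | (a3 & a0)
    r3 = (n2 & a1 & a0) | (a2 & n1 & a0)
    r2 = a1 & n0
    r1 = 0
    r0 = a0
    bits_msb_first = [r7, r6, r5, r4, r3, r2, r1, r0]
    value = 0
    for bit in bits_msb_first:
        value = 2 * value + bit
    return value


mismatches = [a for a in range(16) if approx_square(a) != a * a]
print(mismatches)
```

Only sum-of-products terms are used here, so each output bit maps directly onto a two-level gate network.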

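Algorithm cont.
By induction (with i = 2n):
$r^n_{i-1} = a_{n-1} a_{n-2}$
$r^n_{i-2} = a_{n-1} \overline{a_{n-2}} \cup a_{n-1} a_{n-3}$
$r^n_{i-3} = \overline{a_{n-1}} (r^{n-1}_{i-3}) \cup a_{n-1} \overline{a_{n-2}} a_{n-3}$
$r^n_{i-4} = (r^{n-1}_{i-4}) \cup a_{n-1} a_{n-4}$
...
$r^n_{i-n} = (r^{n-1}_{i-n}) \cup a_{n-1} a_0$
$r^n_{i-n-1} = (r^{n-1}_{i-n-1})$
...
$r^n_0 = (r^{n-1}_0)$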
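The induction can be sketched as a recursive function. This is one possible reading, not the authors' implementation: it assumes i = 2n, complemented literals as in the 4-bit result, and a 2-bit base case (r3 = a1·a0, r2 = a1·NOT a0, r1 = 0, r0 = a0) that the slides do not state. All names are hypothetical.

```python
def approx_square_bits(a_bits):
    """Approximate squaring by induction.

    a_bits: list of 0/1 values, least significant bit first, length n >= 2.
    Returns the 2n result bits, least significant bit first.
    """
    n = len(a_bits)
    if n == 2:
        a0, a1 = a_bits
        # Assumed base case: the exact square of a 2-bit number.
        return [a0, 0, a1 & (1 - a0), a1 & a0]
    prev = approx_square_bits(a_bits[:-1])    # bits of r^(n-1), length 2n-2
    i = 2 * n
    top, second, third = a_bits[n - 1], a_bits[n - 2], a_bits[n - 3]
    r = prev + [0, 0]                         # low bits are inherited unchanged
    r[i - 1] = top & second
    r[i - 2] = (top & (1 - second)) | (top & third)
    r[i - 3] = ((1 - top) & prev[i - 3]) | (top & (1 - second) & third)
    for k in range(4, n + 1):
        r[i - k] = prev[i - k] | (top & a_bits[n - k])
    return r


def approx_square_n(a, n):
    """Approximate square of integer a, treated as an n-bit number."""
    a_bits = [(a // 2**j) % 2 for j in range(n)]
    return sum(bit * 2**j for j, bit in enumerate(approx_square_bits(a_bits)))


print([approx_square_n(a, 4) for a in range(16)])
```

With n = 4 this reproduces the eight result equations, including the two inexact outputs for A = 13 and A = 15.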

Algorithm error
- Error = (A² - R) / A² × 100%
- The error increases with the length of the number: the maximum error is 9.47% for 4 bits and 18.19% for 10 bits.
- Average error: 1.04% for 4 bits and 4.21% for 10 bits.
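The 4-bit figures can be reproduced from the error formula. The sketch below takes the two inexact outputs reported in the VHDL simulation results (153 for A = 13 and 209 for A = 15) and treats every other input as exact. Averaging over all 16 inputs, with A = 0 counted as zero error, is an assumption that happens to match the quoted 1.04%.

```python
def percent_error(exact, approx):
    """Relative error of the approximation, in percent."""
    return (exact - approx) / exact * 100.0


# 4-bit results: exact everywhere except the two reported inputs.
approx_results = {a: a * a for a in range(16)}
approx_results[13] = 153
approx_results[15] = 209

errors = [percent_error(a * a, r) for a, r in approx_results.items() if a != 0]
max_error = max(errors)
avg_error = sum(errors) / 16   # assumed: averaged over all 16 inputs

print(round(max_error, 2), round(avg_error, 2))
```

The maximum comes from A = 13 (16/169) and the average is the sum of the two non-zero errors divided by 16.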

Implementation
- VHDL simulation of the function.
- Implementation at the transistor level using CMOS transistors.
- Place and route for the circuit.
- Size, power and speed analysis.

SPICE Implementation

SPICE Simulation

Simulation Results
[Figure: input and output waveforms, ordered from LSB to MSB]

Propagation Delay
- Propagation delay per gate:
  NAND2: Tpd = 5.1 ns
  NAND3: Tpd = 6.2 ns
  NOR2: Tpd = 4.4 ns
  Buffer: Tpd = 6.6 ns
- The critical path (the one for R2 or R7) consists of a NOR2 gate and a buffer, thus Tpd = 4.4 ns + 6.6 ns = 11 ns.

Layout (Logic)

Layout (chip)
[Figure: chip layout with pins Vcc, Gnd, in1 to in4 and out1 to out8]

LVS Result

VHDL Implementation
- The digital model was implemented using a VHDL structural architecture similar to the SPICE implementation.
- Propagation delay times were calculated using simulations of the scmos library logic gates.

VHDL simulation results

VHDL simulation results cont.
The errors in the algorithm occurred for a = 13 and a = 15.
Expected results:
for a = 13: r = 169, simulation result = 153
for a = 15: r = 225, simulation result = 209

Error Correction
Different methods for producing an error-free digital squaring function:
- Implementing the shown algorithm without using approximation.
- Correcting the 2 erroneous outputs using an Error Correction Unit.

Straightforward Calculation
$r^4_7 = a_3 a_2 \oplus [a_3(\overline{a_2} \cup a_1)]\{[(a_3 \oplus a_2) a_1][a_2(\overline{a_1} \cup a_0) a_3 a_0]\}$
$r^4_6 = [a_3(\overline{a_2} \cup a_1)] \oplus \{[(a_3 \oplus a_2) a_1][a_2(\overline{a_1} \cup a_0) a_3 a_0]\}$
$r^4_5 = [(a_3 \oplus a_2) a_1] \oplus [a_2(\overline{a_1} \cup a_0) a_3 a_0]$
$r^4_4 = a_2(\overline{a_1} \cup a_0) \oplus a_3 a_0$
$r^4_3 = a_1 a_0 \oplus a_2 a_0$
$r^4_2 = a_1 \overline{a_0}$
$r^4_1 = 0$
$r^4_0 = a_0$
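The exact bit equations can be checked exhaustively. The sketch below assumes one particular reading of the operators: XOR for each sum bit, AND for each carry, and complemented literals inside the OR terms. That reading is an assumption, chosen because it makes every output exact. The names are hypothetical.

```python
def exact_square_bits(a):
    """Evaluate the carry-propagating 4-bit squaring logic, returning an integer."""
    a3, a2, a1, a0 = (a // 8) % 2, (a // 4) % 2, (a // 2) % 2, a % 2
    t4 = a2 & ((1 - a1) | a0)      # a2 (NOT a1 OR a0)
    t5 = (a3 ^ a2) & a1            # (a3 XOR a2) a1
    t6 = a3 & ((1 - a2) | a1)      # a3 (NOT a2 OR a1)
    c5 = t4 & a3 & a0              # carry out of bit 4
    c6 = t5 & c5                   # carry out of bit 5
    r7 = (a3 & a2) ^ (t6 & c6)
    r6 = t6 ^ c6
    r5 = t5 ^ c5
    r4 = t4 ^ (a3 & a0)
    r3 = (a1 & a0) ^ (a2 & a0)
    r2 = a1 & (1 - a0)
    r1 = 0
    r0 = a0
    value = 0
    for bit in (r7, r6, r5, r4, r3, r2, r1, r0):
        value = 2 * value + bit
    return value


all_exact = all(exact_square_bits(a) == a * a for a in range(16))
print(all_exact)
```

Compared with the approximate version, the OR into bit 4 is replaced by an XOR plus a carry chain into bits 5 to 7, which is where the extra gate levels come from.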

Straightforward Algorithm cont.
Estimated number of transistors for each output bit:
$r^4_7$ = 52, $r^4_6$ = 44, $r^4_5$ = 32, $r^4_4$ = 18, $r^4_3$ = 6, $r^4_2$ = 4, $r^4_1$ = 0, $r^4_0$ = 0
Input inverters = 8
Buffers ~36
Total number of transistors ~200
Important: the per-bit counts do not include the transistors in buffers.

Straightforward Algorithm cont.
Calculating the propagation delay:
Number of levels in the longest path (R7) = 5
Pd for a NAND2 gate = 5.1 ns
Pd for an input inverter = 3 ns
Total Pd (worst case) = 5 × 5.1 ns + 3 ns = 28.5 ns
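Both critical-path figures are sums of per-gate delays. The sketch below models each path as a list of gates, assuming the straightforward path is one input inverter followed by five NAND2 levels (which reproduces 28.5 ns) and the compensated path is a NOR2 followed by a buffer. The dictionary and function names are hypothetical.

```python
import math

# Per-gate propagation delays in nanoseconds, as quoted in the slides.
GATE_DELAY_NS = {
    "nand2": 5.1,
    "nand3": 6.2,
    "nor2": 4.4,
    "buffer": 6.6,
    "inverter": 3.0,
}


def path_delay_ns(gates):
    """Sum the propagation delays of the gates along one path."""
    return sum(GATE_DELAY_NS[g] for g in gates)


compensated_ns = path_delay_ns(["nor2", "buffer"])
straightforward_ns = path_delay_ns(["inverter"] + ["nand2"] * 5)

print(round(compensated_ns, 1), round(straightforward_ns, 1))
```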

Error Correction Unit
Designing an error correction unit for squaring a 4-bit number. The following implementation deals with 2 errors:
Err1: 13² = 153 → 13² = 169
Err2: 15² = 209 → 15² = 225
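Behaviourally, the unit only needs to recognise two output patterns. The sketch below is a hypothetical functional model, not the 120-transistor circuit from the slides: it passes every value through unchanged except 153 and 209. Both corrections flip the same two bits (bits 4 and 5), because each corrected value is the approximate value plus 16.

```python
# Approximate outputs that need correction, mapped to the exact squares.
CORRECTIONS = {153: 169, 209: 225}
FLIP_MASK = 0b00110000   # bits 4 and 5


def correct(r):
    """Return the corrected 8-bit result for an approximate squaring output."""
    if r in CORRECTIONS:
        return r ^ FLIP_MASK
    return r


print(correct(153), correct(209), correct(144))
```

Neither 153 nor 209 is a perfect square, so no correct output of the approximate squarer is altered by this rule.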

Error Correction Unit INPUT: approximated result OUTPUT: correct result

Simulation Configuration

Error Correction Output
[Figure: input and output waveforms, ordered from LSB to MSB, showing the correct output]

Error Correction: Pros & Cons
Pros:
- It's correct!
Cons:
- Area usage
- Resources and cost: 120 transistors in the correction unit.
- Propagation: requires synchronization in order to avoid hazards (at the Squaring Function output), and considerably increases the propagation delay.
- Not a generic solution

Comparing Error Correction Methods

| | Compensated implementation | Error Correction Unit | Straightforward implementation |
| # of transistors | … | … | ~200 |
| Pd (worst case) | 13.2 ns | Extra synch. unit required | 28.5 ns |
| Power | 3.02*…*f | 3.02*…*f + 3.98*…*f | ~6*…*f |
| Area | 1.72*10^-8 m² | …*10^-8 m² + …*10^-8 m² | ~2.45*10^-8 m² |

ANALOG

Analog implementation
- Consider the following arrangement of CMOS transistors, with M1 and M2 in saturation.
- The equation of a transistor in saturation is: $I_d = K(V_{gs} - V_t)^2$
In our circuit:
$I_1 = K(V_a - V_t)^2$
$I_2 = K(V_b - V_t)^2$
$V_b = V_2 - V_a$

Analog implementation cntd.
Combining the three equations, we obtain the following.
Difference of the output currents:
$$I_1 - I_2 = K(V_2 - 2V_t)(V_a - V_b)$$
Sum of the output currents:
$$I_1 + I_2 = \tfrac{1}{2}K(V_2 - 2V_t)^2 + \frac{(I_1 - I_2)^2}{2K(V_2 - 2V_t)^2}$$

Analog implementation cntd.
In order to provide a stable V2 voltage source we use a current-controlled circuit:
$$I_0 = \tfrac{1}{4}K(V_2 - 2V_t)^2$$

Analog implementation cntd.
By connecting the drain and the source of M1 we get our new circuit. Our previous equations still hold. Considering $I_{in}$ as an input, we get:
$$I_{in} = I_1 - I_2$$

Analog implementation cntd.
We copy $I_1$ using a current mirror, hence we get:
$$I_{out} = I_1 + I_2$$
Now we substitute $I_{in}$ and $I_{out}$ in our previous result:
$$I_1 + I_2 = \tfrac{1}{2}K(V_2 - 2V_t)^2 + \frac{(I_1 - I_2)^2}{2K(V_2 - 2V_t)^2}$$
We get:
$$I_{out} = \tfrac{1}{2}K(V_2 - 2V_t)^2 + \frac{I_{in}^2}{2K(V_2 - 2V_t)^2}$$

Analog implementation cntd.
Remember that $V_2$ is controlled by the control current $I_0 = \tfrac{1}{4}K(V_2 - 2V_t)^2$. Substituting this in the previous expression, we get:
$$I_{out} = 2I_0 + \frac{I_{in}^2}{8I_0}$$
We can eliminate the offset current $2I_0$ by subtracting it from the output. We do so by copying $I_0$ twice and subtracting the copies from $I_{out}$. We finally get:
$$I_{out} = \frac{I_{in}^2}{8I_0}$$
In order to keep all the devices in the circuit in the ON state, the following must hold:
$$|I_{in}| < 4I_0$$
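The transfer characteristic can be checked numerically against the square-law model. In the sketch below the device parameters (K, Vt, V2) are arbitrary illustrative values, not taken from the slides, and the function name is hypothetical. It computes I1 and I2 from the saturation equations and compares I1 + I2 with 2·I0 + Iin²/(8·I0).

```python
import math

K = 1e-4      # A/V^2, assumed transconductance parameter
V_T = 0.7     # V, assumed threshold voltage
V_2 = 3.0     # V, assumed bias voltage across the pair

I_0 = 0.25 * K * (V_2 - 2 * V_T) ** 2   # control current for this V_2


def pair_currents(v_a):
    """Drain currents of M1 and M2 for gate-source voltages v_a and V_2 - v_a."""
    v_b = V_2 - v_a
    i1 = K * (v_a - V_T) ** 2
    i2 = K * (v_b - V_T) ** 2
    return i1, i2


results = []
for v_a in (0.8, 1.2, 1.5, 2.0):          # both devices stay above threshold
    i1, i2 = pair_currents(v_a)
    i_in = i1 - i2
    i_sum = i1 + i2
    predicted = 2 * I_0 + i_in ** 2 / (8 * I_0)
    results.append((i_in, i_sum, predicted))
    print(f"Iin={i_in:.3e}  I1+I2={i_sum:.3e}  predicted={predicted:.3e}")
```

At v_a = V_2/2 the input current is zero and the sum reduces to the offset 2·I0, which is the term removed by the two subtracted copies of I0.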

The final squaring circuit:

Simulation results
DC analysis: sweep of the input current from -4·I0 to 4·I0. A control current of 175 uA gives the best results.
[Figure: circuit output and expected output]

Maximum absolute error: 8 uA. Approximate error in percentage: ~0.75%

Simulation results
Square of a sine function.
[Figure: circuit output and expected output]

BW of 10 MHz
[Figure panels: 10 MHz, 100 MHz, 1 GHz, 10 GHz]
For each frequency the input is sin(wt). The expected output, sin²(wt), is presented as well as the output of the circuit.

Analog summary
- Area: 8 transistors (very small)
- Bandwidth: 10 MHz
- Input current range: -700 uA to 700 uA
- Absolute error: 8 uA (the accuracy error dominates environmental errors such as noise)
- Error in percentage: ~0.75%, hence the device can handle a range of 2*(700/8) ≈ 180 values.
- Constant power dissipation (can be reduced when the device is not in use by adding more hardware)

Analog Square Root
- The equation of a transistor in saturation is: $I_d = K(V_{gs} - V_t)^2$
- By solving for $V_{gs}$ we get: $V_{gs} = V_t + \sqrt{I_d / K}$
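The square-root behaviour follows from inverting the square-law model. The sketch below uses the saturation equation I_d = K(V_gs - V_t)² from the analog implementation, with arbitrary illustrative values for K and V_t, and checks that solving for V_gs recovers the original gate-source voltage. The function names are hypothetical.

```python
import math

K = 1e-4     # A/V^2, assumed transconductance parameter
V_T = 0.7    # V, assumed threshold voltage


def drain_current(v_gs):
    """Square-law drain current of a MOS transistor in saturation."""
    return K * (v_gs - V_T) ** 2


def gate_source_voltage(i_d):
    """Invert the square law: V_gs grows with the square root of I_d."""
    return V_T + math.sqrt(i_d / K)


for v_gs in (1.0, 1.5, 2.0):
    i_d = drain_current(v_gs)
    print(v_gs, i_d, gate_source_voltage(i_d))
```

Quadrupling the drain current doubles the overdrive voltage V_gs - V_t, which is the property a square-root stage relies on.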

Analog Square Root
[Figure: circuit output and expected output]

C = sqrt(A² + B²)
We combine all the results so far in order to implement Pythagoras' theorem and find a Euclidean distance:

Results
[Figure: input, A² + B², sqrt(A² + B²) and expected output]

Pythagoras, 2nd try

Results
[Figure: input, A² + B², sqrt(A² + B²) and expected output]

Bibliography
(1) Ming-Hwa Sheu and Su-Hon Lin, "Fast Compensative Design Approach for the Approximate Squaring Function", IEEE Journal of Solid-State Circuits, Vol. 37, No. 1, Jan 2002.
(2) Klaas Bult and Hans Wallinga, "A Class of Analog CMOS Circuits Based on the Square-Law Characteristic of an MOS Transistor in Saturation", IEEE Journal of Solid-State Circuits, Vol. SC-22, No. 3, June 1987.

THE END