Bar Ilan University, Engineering Faculty

Slides:



Advertisements
Similar presentations
1 ECE 4436ECE 5367 Computer Arithmetic I-II. 2 ECE 4436ECE 5367 Addition concepts 1 bit adder –2 inputs for the operands. –Third input – carry in from.
Advertisements

Logical Design.
Introduction So far, we have studied the basic skills of designing combinational and sequential logic using schematic and Verilog-HDL Now, we are going.
CPE 626 CPU Resources: Adders & Multipliers Aleksandar Milenkovic Web:
EE141 © Digital Integrated Circuits 2nd Arithmetic Circuits 1 Digital Integrated Circuits A Design Perspective Arithmetic Circuits Jan M. Rabaey Anantha.
EE141 Adder Circuits S. Sundar Kumar Iyer.
Mohamed Younis CMCS 411, Computer Architecture 1 CMCS Computer Architecture Lecture 7 Arithmetic Logic Unit February 19,
Parallel Adder Recap To add two n-bit numbers together, n full-adders should be cascaded. Each full-adder represents a column in the long addition. The.
Design and Implementation of VLSI Systems (EN1600) Lecture 27: Datapath Subsystems 3/4 Prof. Sherief Reda Division of Engineering, Brown University Spring.
EE141 © Digital Integrated Circuits 2nd Arithmetic Circuits 1 [Adapted from Rabaey’s Digital Integrated Circuits, ©2002, J. Rabaey et al.]
1 CS 140 Lecture 14 Standard Combinational Modules Professor CK Cheng CSE Dept. UC San Diego Some slides from Harris and Harris.
S. Reda EN160 SP’07 Design and Implementation of VLSI Systems (EN0160) Lecture 28: Datapath Subsystems 2/3 Prof. Sherief Reda Division of Engineering,
Arithmetic II CPSC 321 E. J. Kim. Today’s Menu Arithmetic-Logic Units Logic Design Revisited Faster Addition Multiplication (if time permits)
VLSI Arithmetic Adders Prof. Vojin G. Oklobdzija University of California
Computer Structure - The ALU Goal: Build an ALU  The Arithmetic Logic Unit or ALU is the device that performs arithmetic and logical operations in the.
1 Design of a Parallel-Prefix Adder Architecture with Efficient Timing-Area Tradeoff Characteristic Sabyasachi Das University of Colorado, Boulder Sunil.
ECE C03 Lecture 61 Lecture 6 Arithmetic Logic Circuits Hai Zhou ECE 303 Advanced Digital Design Spring 2002.
Arithmetic II CPSC 321 Andreas Klappenecker. Any Questions?
UNIVERSITY OF MASSACHUSETTS Dept
Introduction to CMOS VLSI Design Lecture 11: Adders
Modern VLSI Design 2e: Chapter 6 Copyright  1998 Prentice Hall PTR Topics n Shifters. n Adders and ALUs.
Lecture 8 Arithmetic Logic Circuits
Arithmetic-Logic Units CPSC 321 Computer Architecture Andreas Klappenecker.
Digital Integrated Circuits© Prentice Hall 1995 Arithmetic Arithmetic Building Blocks.
Fall 2008EE VLSI Design I - © Kia Bazargan 1 EE 5323 – VLSI Design I Kia Bazargan University of Minnesota Adders.
Spring 2006EE VLSI Design II - © Kia Bazargan 68 EE 5324 – VLSI Design II Kia Bazargan University of Minnesota Part II: Adders.
Lecture 17: Adders.
Lecture 12b: Adders. CMOS VLSI DesignCMOS VLSI Design 4th Ed. 17: Adders 2 Generate / Propagate  Equations often factored into G and P  Generate and.
Spring 2002EECS150 - Lec10-cl1 Page 1 EECS150 - Digital Design Lecture 10 - Combinational Logic Circuits Part 1 Feburary 26, 2002 John Wawrzynek.
Copyright 2008 Koren ECE666/Koren Part.5a.1 Israel Koren Spring 2008 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Digital Computer.
Chapter 5 Arithmetic Logic Functions. Page 2 This Chapter..  We will be looking at multi-valued arithmetic and logic functions  Bitwise AND, OR, EXOR,
Adders. Full-Adder The Binary Adder Express Sum and Carry as a function of P, G, D Define 3 new variable which ONLY depend on A, B Generate (G) = AB.
Lec 17 : ADDERS ece407/507.
Parallel Prefix Adders A Case Study
Introduction to CMOS VLSI Design Lecture 11: Adders David Harris Harvey Mudd College Spring 2004.
VLSI Arithmetic Adders & Multipliers Prof. Vojin G. Oklobdzija University of California
1. Copyright  2005 by Oxford University Press, Inc. Computer Architecture Parhami2 Figure 10.1 Truth table and schematic diagram for a binary half-adder.
Comparison of Two RCA Implementations Abstract Two implementations of RCA (Ripple Carry Adder) static circuit are introduced—CMOS and TG logic circuit.
Abdullah Aldahami ( ) Feb26, Introduction 2. Feedback Switch Logic 3. Arithmetic Logic Unit Architecture a.Ripple-Carry Adder b.Kogge-Stone.
ECE2030 Introduction to Computer Engineering Lecture 12: Building Blocks for Combinational Logic (3) Adders/Subtractors, Parity Checkers Prof. Hsien-Hsin.
Chapter 6-1 ALU, Adder and Subtractor
Arithmetic Building Blocks
55:035 Computer Architecture and Organization Lecture 5.
Advanced VLSI Design Unit 05: Datapath Units. Slide 2 Outline  Adders  Comparators  Shifters  Multi-input Adders  Multipliers.
Chapter 14 Arithmetic Circuits (I): Adder Designs Rev /12/2003
Nov 10, 2008ECE 561 Lecture 151 Adders. Nov 10, 2008ECE 561 Lecture 152 Adders Basic Ripple Adders Faster Adders Sequential Adders.
July 2005Computer Architecture, The Arithmetic/Logic UnitSlide 1 Part III The Arithmetic/Logic Unit.
Modern VLSI Design 4e: Chapter 6 Copyright  2008 Wayne Wolf Topics n Shifters. n Adders and ALUs.
FPGA-Based System Design: Chapter 4 Copyright  2003 Prentice Hall PTR Topics n Number representation. n Shifters. n Adders and ALUs.
CDA 3101 Fall 2013 Introduction to Computer Organization The Arithmetic Logic Unit (ALU) and MIPS ALU Support 20 September 2013.
1 CS 151: Digital Design Chapter 4: Arithmetic Functions and Circuits 4-1,2: Iterative Combinational Circuits and Binary Adders.
EE141 © Digital Integrated Circuits 2nd Arithmetic Circuits 1 Digital Integrated Circuits A Design Perspective Arithmetic Circuits Jan M. Rabaey Anantha.
COMP541 Arithmetic Circuits
ECE 645 – Computer Arithmetic Lecture 6: Multi-Operand Addition ECE 645—Computer Arithmetic 3/5/08.
Topics covered: Arithmetic CSE243: Introduction to Computer Architecture and Hardware/Software Interface.
Unrolling Carry Recurrence
Digital Integrated Circuits© Prentice Hall 1995 Arithmetic Arithmetic Building Blocks.
EE466: VLSI Design Lecture 13: Adders
CMPEN 411 VLSI Digital Circuits Spring 2009 Lecture 19: Adder Design
CSE477 VLSI Digital Circuits Fall 2002 Lecture 20: Adder Design
Arithmetic-Logic Units. Logic Gates AND gate OR gate NOT gate.
1 Digital Design Debdeep Mukhopadhyay Associate Professor Dept of Computer Science and Engineering NYU Shanghai and IIT Kharagpur.
Addition and multiplication Arithmetic is the most basic thing you can do with a computer, but it’s not as easy as you might expect! These next few lectures.
1. 2 Figure 10.1 Truth table and schematic diagram for a binary half-adder Simple Adders Half-adder.
Computer Organization and Design Arithmetic & Logic Circuits Montek Singh Nov 2, 2015 Lecture 11.
EE141 Arithmetic Circuits 1 Chapter 14 Arithmetic Circuits Rev /12/2003 Rev /05/2003.
CS151 Introduction to Digital Design Chapter 4: Arithmetic Functions and HDLs 4-1: Iterative Combinational Circuits 4-2: Binary Adders 1Created by: Ms.Amany.
EE141 Arithmetic Circuits 1 Chapter 14 Arithmetic Circuits Rev /12/2003.
Full Adder Truth Table Conjugate Symmetry A B C CARRY SUM
ECE 352 Digital System Fundamentals
Presentation transcript:

Bar Ilan University, Engineering Faculty Addition Circuits Shmuel Wimer Bar Ilan University, Engineering Faculty Technion, EE Faculty Nov 2012

Full Adders Nov 2012

Design I: Mirror CMOS logic odd 1s odd 0s kill generate 1-propagate 0-propagate N and P networks are identical rather than complementary! Nov 2012

Design II: S is factored to reuse Cout kill generate 1-propagate 0-propagate odd 1s odd 0s Uses only 28 transistors. Can be reduced to 24 transistors. S has larger delay but it is not on the critical path Nov 2012

Only the transistors of the carry are optimized for speed. (why?) The transistors connected to Cin are closest to the output of the carry (and sum) circuits. (why?) Only the transistors of the carry are optimized for speed. (why?) Nov 2012

Ripple-Carry Addition Carry computation is the critical path Carry propagation delay is reduced by using inverting adders where every other stage is working on complementary data. Nov 2012

Straight-forward, 16 transistors 14 transistors XOR / XNOR Circuits Straight-forward, 16 transistors 14 transistors More efficient, less contacts, smaller layout, commonly used in STD cell Lib. Complementary CMOS, 12 transistors Nov 2012

Transmission gate design, 10 transistor Only 4 transistors, fast, but doesn’t swing rail-to-rail. 4-way Only 6 transistors, but non restoring. Nov 2012

Full-adder using XOR and MUX 24 transistors and buffered outputs. Cout and S have same delay. Nov 2012

Carry-Propagate Addition Ripple-Carry Addition Carry computation is the critical path in addition Recall: Generate and Propagate signals are a key for fast addition Nov 2012

The sum for bit i can be computed by: with the base case Define Recall: The sum for bit i can be computed by: Nov 2012

Addition is reduced into 3-step computation process bitwise propagate and generate logic group propagate and generate logic Addition acceleration is obtained by smart PG grouping sum logic Nov 2012

shared bitwise propagate-generate (PG) logic A combined pair of smaller groups is called valency-2 group PG logic. To use fewer stages for carry propagation, higher valency comprising more complex gates is possible, e.g. valency-4: Nov 2012

PG Carry-Ripple Addition Nov 2012

Adder architecture diagram Group generate PG logic actual CMOS sum XOR Nov 2012

Carry Chain Adder majority (carry) CMOS logic pass transistor kill propagate kill generate Manchester valeny-4 carry chain adder (dynamic logic) Nov 2012

How delay grows with chain length? quadratic! C3 is calculated in “one” time unit but we must wait for carry to ripple through group to be ready. How delay grows with chain length? quadratic! Chain should be broken and buffered. Common length is 3-4. Nov 2012

Manchester carry chain adder using valency-4 stages Similar to ripple carry adder but uses N/3 stages. Involves a series propagate transistor per bit. Faster than AND-OR or majority gate per bit in carry ripple. Nov 2012

Carry Skip Adder Assume that the propagate computed for a group i:j is 1. Consequently, the carry-out of group i:j is the same as the carry-in and carry computation can be skipped. bitwise propagate and generate + group propagate skip path skip MUX Nov 2012

Carry-skip adder Manchester stage (dynamic logic) For group propagate 0 the carry generated within group is taken. This is a considerable acceleration compared to carry-ripple, while hardware overhead is small. Was proposed in 19th century by Charles Babbage and used by mechanical calculators. Carry-skip adder Manchester stage (dynamic logic) skip MUX Nov 2012

4-bit carry chain if each group generates a carry Propagation delay 4-bit carry chain if each group generates a carry 4-bit ripple chain if carry-in is by passed to chain carry skip chain Nov 2012

N-bit adder with k groups of n bits each (N=kn). First chain must compute sums and carry within n-1 delay units. Carry propagates through k-2 stages. Delay of a chain is slower than skip propagation delay (AND, MUX). Last chain must compute sums within n-1 delay units. Nov 2012

Example: consider 32-bit addition. Question: Can we further accelerate carry propagation? Answer: Yes we can, block size may vary across adder. Nov 2012

Bits are distribute as follows: Consider t ripple-carry adder groups A0, A1, … , At-2, At-1. How should we distribute the N bits in those blocks? Assume a skip chain of A1, … , At-2. Since skip is far faster than ripple carry, we wish to minimize the number b of bits in A0 and At-1. Bits are distribute as follows: Summing over all blocks: Nov 2012

number of blocks is increased delay is decreased number of blocks is increased Nov 2012

Groups are of lengths [2, 3, 4, 4, 3] compared to [4, 4, 4, 4]. Saved 2 levels of logic on critical path compared to fixed. Nov 2012

Subtraction How to compute A-B? Recall that in 2’s complement We’d like to combine adder and subtracter in one circuit Nov 2012

Carry-Lookahead Adder Carry-skip adder ripples the carry through the group, requiring waiting to determine whether the first group generates a carry. Carry-lookahead (CLA) computes group generate signals as well as group propagate signals to avoid waiting for a ripple. Valency-4 PG Nov 2012

Carry-lookahead circuit with half devices compared to AOAO… Cout is complementary P and G signals connect and disconnect path to Vdd / Vss What happens when all P=1 and G=0? Both paths to Vdd and Vss are closed. Cin then takes care Nov 2012

N-bit adder with k groups of n bits each (N=kn). AND-OR group PG N-bit adder with k groups of n bits each (N=kn). Nov 2012

Propagation delay No better than variable-length carry skip, but requires more HW due to PG generation per group. Nov 2012

Commercial MSI 4-bit CLA adder bit PG 2-level logic CLA sum Nov 2012

Carry-Select Adder The critical paths in carry-skip and carry-lookahead involves carry calculation into each n-bit group and then using it for the sums within the group. It is possible to pre compute the outputs for both 0/1 carry inputs and then select accordingly. If C4=0, top adder applies for C8. If C4=1 bottom adder applies for C8. Notice that cout(cin=1) >= cout(cin=0). Nov 2012

1st group computes carry out Select accordingly Compute for both carries n-bits of first group adder Propagation delay Simultaneous PG for all bits Nov 2012

Carry-Increment Adder Carry-select adder is fast but the amount of circuits is about twice compared to others. This is both power and area penalty. The PG and XOR circuits are similar in 0 and 1 adders, independent , hence MUX can be used to select the proper input of XOR. This is called carry-increment adder. Nov 2012

Acceleration is possible by variable group size Nov 2012

Tree Adders In wide adders the delay of the carry passing through stages becomes dominant. The delay can be reduced by looking ahead across lookahead blocks. The square root delay can be improved to logarithmic delay by constructing multilevel lookahead structures. There are many ways to build lookahead trees, offering tradeoffs between number of circuits, fan-out and amount of interconnects. Those are translated into area and power. Such adders are known as lookahead adders, logarithmic adders or parallel-prefix adders. Nov 2012

Compute prefixes for 2-bit groups. Then prefixes for 4-bit groups. Brent-Kung tree Compute prefixes for 2-bit groups. Then prefixes for 4-bit groups. Then 8-bit and 16-bit groups. Prefixes fan back down to compute carry-in to each bit. 2(log2N) - 1 levels (area), fan-out 2. Nov 2012

Sklansky tree Intermediate prefixes can be computed along with those of large groups. Delay reduced to log2N. Fan out is doubled at each row. Transistor sizing and buffering is required (area, power). Nov 2012

Achieves log2N stages. Fan out is 2. Kogge-Stone tree Achieves log2N stages. Fan out is 2. Wire length grows is quadratic with N. It significantly increases area, buffers, power. Nov 2012

Use Kogge-Stone on odd bits. Han-Carlson tree Use Kogge-Stone on odd bits. Use one more stage to ripple into even bits. Nov 2012

Comparison of Adder Architectures Logic Levels Max Fan-out # Wiring Tracks # Cells Ripple-Carry 1 Carry-Skip (n=4) 2 Carry-Increment (n=4) 4 Carry-Increment (var.) Brent-Kung Sklansky Kogge-Stone Han-Carlson PG and XOR logic is not counted. Ripple-carry should be used when they meet timing constraints (small area and power). For 64 bits and up tree adders are distinctly faster. Nov 2012

Logic synthesizers automatically map the “+” operator into appropriate adder to meet timing constraints while minimizing area and power (aka design ware). Nov 2012

Carry Probabilities Nov 2012

Nov 2012

The short length of average carry propagation indicates that the average worst-case may also be short. A usual design of a k-bit adder is targeting the worst- case where the carry is propagating along the entire bits, regardless of adder architecture. Burks, Goldstine and von Neumann [1946] noticed that the average worst-case carry propagation length is log2k. Nov 2012

Nov 2012

Nov 2012

Nov 2012

Carry-Completion Detection Worst-case carry propagation of length k almost never materializes. A carry-completion detection adder performs addition in average O(log2k) time. A carry 0 is also explicitly represented and allowed to propagate between stages. The carry into stage i is represented by the two-rail code: Nov 2012

Nov 2012

Two 1s generate a carry of 1 propagating towards MSB. Initially, all carries are (0,0), namely, unknown. When every carry assumes one of the values (0,1) or (1,0) carry propagation is complete. Nov 2012

Behrooz Parhami, Computer Arithmetic, Oxford, 2010, page 100: Excluding initialization and carry-completion detection times, the latency of k-bit carry-completion adder ranges from 1 to 2k+1 gate delays, with 2log2k+1 average gate delays. Behrooz Parhami, Computer Arithmetic, Oxford, 2010, page 100: "Because the latency of the carry-completion adder is data-dependent, the design of Fig. 5.9 is suitable for use in asynchronous systems. Most modern computers, however, use synchronous logic and thus cannot take advantage of the high average speed of a carry-completion adder." Nov 2012