EE 5324 – VLSI Design II, Spring 2006. Part II: Adders. © Kia Bazargan, University of Minnesota.


Slide 68: EE 5324 – VLSI Design II, Part II: Adders

Slide 69: References and Copyright
Textbooks referenced:
- [WE92] N. H. E. Weste, K. Eshraghian, "Principles of CMOS VLSI Design: A System Perspective," Addison-Wesley, 2nd ed.
- [Rab96] J. M. Rabaey, "Digital Integrated Circuits: A Design Perspective," Prentice Hall.
- [Par00] B. Parhami, "Computer Arithmetic: Algorithms and Hardware Designs," Oxford University Press, 2000.

Slide 70: References and Copyright (cont.)
Slides used:
- [©Hauck] © Scott A. Hauck; G. Borriello, C. Ebeling, S. Burns, 1995, University of Washington
- [©Prentice Hall] © Prentice Hall 1995, © UCB 1996. Slides for [Rab96].
- [©Oxford U Press] © Oxford University Press, New York, 2000. Slides for [Par00], with permission from the author.

Slide 71: Outline
- One-bit adder, basic ripple-carry adder
- Carry-lookahead adders (CLA)
- Manchester carry chain
- Carry bypass
- Carry select adder
- Brent-Kung adder

Slide 72: Why Adders?
- Addition: a fundamental operation
  - Basic block of most arithmetic operations
  - Address calculation
- Faster, faster and faster
- How?
  - Architectural-level optimization
  - Gate-level optimization
  - Speed/area trade-off

Slide 73: Adding Two One-bit Operands
- One-bit half adder (HA): Sum = A ⊕ B, Cout = A.B
- One-bit full adder (FA): Sum = A ⊕ B ⊕ Cin, Cout = A.B + B.Cin + A.Cin
[Figures: HA and FA block symbols and gate-level schematics]
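
The two cells above can be checked against ordinary integer addition. The following is a minimal behavioral sketch; the function names `half_adder` and `full_adder` are chosen here for illustration and do not come from the slides.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, cout) for two one-bit operands."""
    return a ^ b, a & b


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, cout) for two one-bit operands plus a carry-in."""
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (a & cin)
    return s, cout


# Check both cells against integer addition for every input combination.
for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        assert 2 * c + s == a + b
        for cin in (0, 1):
            s, c = full_adder(a, b, cin)
            assert 2 * c + s == a + b + cin
```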

Slide 74: N-Bit Ripple-Carry Adder: Series of FA Cells
- To add two n-bit numbers, chain n FA cells: the carry-out of each bit feeds the carry-in of the next (C0 in, Cn out).
- Note: adder delay = Tc * n, where Tc is the Cin-to-Cout delay of one FA.
[Figure: chain of FA cells with inputs A0..An-1, B0..Bn-1 and outputs S0..Sn-1]
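
A behavioral sketch of the chain, assuming operands are passed as Python integers and bit 0 is the LSB. The name `ripple_carry_add` is illustrative only.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, cout)."""
    return a ^ b ^ cin, (a & b) | (b & cin) | (a & cin)


def ripple_carry_add(a: int, b: int, n: int, c0: int = 0) -> tuple[int, int]:
    """Add two n-bit integers one bit at a time; return (sum, carry_out)."""
    carry, total = c0, 0
    for i in range(n):
        # The carry produced by bit i is consumed by bit i+1.
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


print(ripple_carry_add(0b0011, 0b0101, 4))  # prints (8, 0)
```

The loop body runs n times in sequence, which mirrors the Tc * n delay noted on the slide.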

Slide 75: 4-bit Ripple Carry Addition: Example
A = 0011, B = 0101
[Figure: 4-bit FA chain (C0 to C4) animated over time steps T=0 to T=4; the sum starts at S=0000 at T=0 and settles to S=1000 at T=4]
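
The settling behavior in this example can be imitated with a unit-delay model. This is a sketch under assumptions that are not stated on the slide: every FA takes exactly one time unit, all carries start at 0, and each FA sees the carries of the previous time step. The intermediate values it prints come from this model, not from the original figure.

```python
def ripple_trace(a: int, b: int, n: int) -> list[str]:
    """Return the sum word after each unit time step T=0..n."""
    carries = [0] * (n + 1)          # carries[i] feeds bit i; carries[0] is C0
    trace = ["0" * n]                # T=0: outputs not yet evaluated
    for _ in range(n):
        new_carries = carries[:]
        sum_bits = []
        for i in range(n):
            ai, bi, ci = (a >> i) & 1, (b >> i) & 1, carries[i]
            sum_bits.append(ai ^ bi ^ ci)
            new_carries[i + 1] = (ai & bi) | (ci & (ai | bi))
        carries = new_carries
        # Most significant bit first, to match the S=... notation.
        trace.append("".join(str(bit) for bit in reversed(sum_bits)))
    return trace


print(ripple_trace(0b0011, 0b0101, 4))
# prints ['0000', '0110', '0100', '0000', '1000']
```

Under this model the correct sum 1000 appears only at T=4, once the carry generated at bit 0 has rippled through all four cells.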

Slide 76: One-bit Full Adder Implementation
Direct gate implementation:
- Cout = A.B + B.Cin + A.Cin = A.B + Cin.(A+B)
- Sum = A ⊕ B ⊕ Cin
- 32 transistors used
[Figure: gate-level schematic] [WE92] p516

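Slide 77: One-Bit Full Adder: Share Logic
An observation:
- Almost always, Sum = NOT Cout (the exceptions are inputs 000 and 111)
- Sum = A.B.Cin + (A+B+Cin).Cout’
  - The A.B.Cin term includes the 111 case
  - The (A+B+Cin) factor excludes the 000 case
[Figure: full-adder truth table with columns Cin, A, B, Sum, Cout]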
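
The shared-logic expression can be verified exhaustively. In this sketch `carry` and `shared_sum` are illustrative names, and Cout’ is written as `1 - cout`.

```python
from itertools import product


def carry(a: int, b: int, cin: int) -> int:
    """Cout = A.B + Cin.(A+B)."""
    return (a & b) | (cin & (a | b))


def shared_sum(a: int, b: int, cin: int) -> int:
    """Sum = A.B.Cin + (A+B+Cin).Cout', reusing the carry logic."""
    cout = carry(a, b, cin)
    return (a & b & cin) | ((a | b | cin) & (1 - cout))


for a, b, cin in product((0, 1), repeat=3):
    assert shared_sum(a, b, cin) == a ^ b ^ cin
    # Sum equals Cout only for the all-zeros and all-ones inputs.
    if (a ^ b ^ cin) == carry(a, b, cin):
        assert (a, b, cin) in {(0, 0, 0), (1, 1, 1)}
```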

Slide 78: One-Bit Full Adder: Transistor Implementation
- Sum = A.B.C + (A+B+C).Cout’
- Cout = A.B + C.(A+B)
- Use inverters to get Cout and Sum
- C transistors close to output
- Cout delay: 2 inverting stages (1-stage possible?)
- Sum delay: 3 inverting stages (not an issue, though)
- 28 transistors
[Figure: transistor-level schematic] [WE92] p517 [Rab96] p390

Slide 79: One-Bit Full Adder: Inverted Inputs
- An observation: inverting all inputs inverts all outputs
- Exploit this property: get rid of the inverter on the carry critical path
[Figure: FA with inputs Cin, A, B and outputs Sum, Cout, shown equivalent to an FA with all inputs and outputs inverted]

Slide 80: Ripple Carry Adder: Inverting Property
- FA’ is similar to FA, but with no inverters on the outputs
- Much faster (1-stage)
- Disadvantage: not a regular data path
[Figure: chain of FA’ cells for bits 0 to 3, carries C0 to C4]
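
The inverting property and its use in a chain of FA’ cells can be sketched behaviorally. Assumptions made here, not taken from the slides: FA’ is modeled as an FA whose two outputs are both inverted, even bit positions receive true-polarity inputs, odd positions receive inverted inputs, and the sum polarity is corrected off the carry path. The names `fa_prime` and `inverting_ripple_add` are hypothetical.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    return a ^ b ^ cin, (a & b) | (cin & (a | b))


def fa_prime(a: int, b: int, cin: int) -> tuple[int, int]:
    """FA without output inverters: yields inverted sum and inverted carry."""
    s, c = full_adder(a, b, cin)
    return 1 - s, 1 - c


# Inverting all inputs of an FA inverts both outputs.
for a, b, cin in product((0, 1), repeat=3):
    s, c = full_adder(a, b, cin)
    assert full_adder(1 - a, 1 - b, 1 - cin) == (1 - s, 1 - c)


def inverting_ripple_add(a: int, b: int, n: int, c0: int = 0) -> tuple[int, int]:
    total, carry = 0, c0
    inverted = False            # True while the carry wire holds NOT(C_i)
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if inverted:
            # Inverted inputs in, so FA' delivers true-polarity outputs.
            s, carry = fa_prime(1 - ai, 1 - bi, carry)
        else:
            # True inputs in, so FA' delivers inverted outputs; fix the sum only.
            s_bar, carry = fa_prime(ai, bi, carry)
            s = 1 - s_bar
        inverted = not inverted
        total |= s << i
    return total, (1 - carry if inverted else carry)
```

No inversion is ever applied to the carry wire between cells, which is the point of the technique; the cost is that odd and even cells differ.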

Slide 81: Summary: Ripple-Carry Adder
- Basic ripple carry: AND-OR gates
  - Area: 32 transistors (per bit position)
  - Delay: 2 stages of inverting logic (per bit position)
- Direct CMOS logic, share Cout’
  - Area: 28 transistors
  - Delay: 2 stages
- Use "inverting" property
  - Area: 27 (odd bits: 26, even bits: 28)
  - Delay: ~1 stage
- So far: transistor/logic manipulation. Is that all we can do?


Slide 83: Carry-Lookahead Adder: Idea
- New look: carry propagation
- Idea:
  - Try to "predict" Ck earlier than Tc * k
  - Instead of passing through k stages, compute Ck separately using 1-stage CMOS logic
- Carry propagation: an example
[Figure: example table with rows for bit position, carry, A, B and sum]

Slide 84: Carry-Lookahead Adder (CLA): One Bit
What happens to the propagating carry in bit position k?
- A = B = 0: kill (Cout = 0)
- A ≠ B: propagate (Cout = Cin)
- A = B = 1: generate (Cout = 1)
Definitions: p = A+B (or A ⊕ B), g = A.B
[Figure: truth table (A, B, Cin, Cout) and carry gate with paths labeled 0-propagate, 1-propagate, generate and kill] [Rab96] p391

Slide 85: CLA: Propagation Equations
If C4 = 1, then one of the following holds:
- g3: generated at bit position 3
- g2.p3: generated at bit position 2, propagated through 3
- g1.p2.p3: generated at bit position 1, propagated through 2, 3
- g0.p1.p2.p3: generated at bit position 0, propagated through 1, 2, 3
- Cin.p0.p1.p2.p3: input carry, propagated through 0, 1, 2, 3
C4 = g3 + g2.p3 + g1.p2.p3 + g0.p1.p2.p3 + Cin.p0.p1.p2.p3
Implement C4 as one-stage CMOS logic: delay = 1 (or is it?)
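
The flattened C4 equation can be checked against integer addition for every 4-bit operand pair. The sketch below also tries both definitions of p mentioned earlier in the deck (A+B and A ⊕ B); `c4_lookahead` and its `use_xor` flag are illustrative names.

```python
from itertools import product


def c4_lookahead(a: int, b: int, cin: int, use_xor: bool = False) -> int:
    """C4 of a 4-bit block from the flattened lookahead equation."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]
    if use_xor:
        p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]
    else:
        p = [((a >> i) | (b >> i)) & 1 for i in range(4)]
    return (g[3]
            | (g[2] & p[3])
            | (g[1] & p[2] & p[3])
            | (g[0] & p[1] & p[2] & p[3])
            | (cin & p[0] & p[1] & p[2] & p[3]))


for a, b, cin in product(range(16), range(16), (0, 1)):
    expected = (a + b + cin) >> 4
    assert c4_lookahead(a, b, cin) == expected
    assert c4_lookahead(a, b, cin, use_xor=True) == expected
```

Both definitions of p give the same carry, because the only case where they differ (A = B = 1) already sets g.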

Slide 86: CLA: Static Logic Implementation
[Figure: static CMOS gate computing C4 from p0..p3, g0..g3 and Cin, with labeled transistors; path annotations: p3.g2 -> C4, p1.g2.g3 -> C4] [©Hauck] [Rab96] p405

Slide 87: CLA: Dynamic Logic Implementation
Dynamic gate implementation:
- C4 = g3 + p3.(g2 + p2.(g1 + p1.(g0 + p0.Cin)))
- 6 transistors in series
[Figure: dynamic gate with inputs Cin, p0..p3, g0..g3 and output C4] [©Hauck] [WE92] p529
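
The nested form used by the dynamic gate is the factored version of the five-term sum of products given for C4. A small exhaustive check, treating every g, p and Cin as an independent Boolean input (`c4_flat` and `c4_nested` are illustrative names):

```python
from itertools import product


def c4_flat(g, p, cin):
    """Sum-of-products form of C4."""
    return (g[3]
            | (g[2] & p[3])
            | (g[1] & p[2] & p[3])
            | (g[0] & p[1] & p[2] & p[3])
            | (cin & p[0] & p[1] & p[2] & p[3]))


def c4_nested(g, p, cin):
    """Factored form: g3 + p3.(g2 + p2.(g1 + p1.(g0 + p0.Cin)))."""
    return g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))


for bits in product((0, 1), repeat=9):
    g, p, cin = bits[0:4], bits[4:8], bits[8]
    assert c4_flat(g, p, cin) == c4_nested(g, p, cin)
```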

Slide 88: CLA: Dynamic Logic Implementation
Can we reuse logic?
- Can we get C1, C2 and C3 from the same circuit?
- No! C1, C2 and C3 may be floating (not precharged)
- Charge sharing problem
[Figure: the C4 dynamic gate with candidate taps C1?, C2?, C3? on internal nodes] [©Hauck]

Slide 89: CLA: Dynamic Logic Implementation
[Figure: four separate dynamic gates, one each for C1 (g0, p0, Cin), C2 (adds p1, g1), C3 (adds p2, g2) and C4 (adds p3, g3)] [WE92] p529

Slide 90: CLA: Basic Block (4 Bits) Architecture
[Figure: block of 4-bit p, g and Cout logic; inputs A0..A3, B0..B3 and C0; p,g cells produce p0..p3, g0..g3; the carry logic produces C1..C4; outputs S0..S3]

Slide 91: CLA: N-Bit Architecture
Put it all together:
[Figure: 4-bit blocks in series (bits 0-3 with C0 in and C4 out, bits 4-7 with C8 out, and so on), each with p,g cells and a carry generator]
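
A behavioral sketch of this organization: inside each 4-bit block every carry is computed directly from p, g and the block's carry-in, while the block carry-out still passes from block to block. The function names are illustrative, and p is taken as A ⊕ B so that it can also be used for the sum.

```python
def cla_block(a4: int, b4: int, cin: int) -> tuple[int, int]:
    """4-bit block: all four carries come from lookahead, not from rippling."""
    g = [(a4 >> i) & (b4 >> i) & 1 for i in range(4)]
    p = [((a4 >> i) ^ (b4 >> i)) & 1 for i in range(4)]
    c = [cin]
    for k in range(1, 5):
        # C_k = g_{k-1} + p_{k-1}.g_{k-2} + ... + p_{k-1}...p_0.Cin
        ck, prod = 0, 1
        for j in range(k - 1, -1, -1):
            ck |= prod & g[j]
            prod &= p[j]
        ck |= prod & cin
        c.append(ck)
    s = sum((p[i] ^ c[i]) << i for i in range(4))
    return s, c[4]


def cla_add(a: int, b: int, n: int, c0: int = 0) -> tuple[int, int]:
    """n-bit adder from 4-bit CLA blocks (n must be a multiple of 4)."""
    assert n % 4 == 0
    total, carry = 0, c0
    for lo in range(0, n, 4):
        s, carry = cla_block((a >> lo) & 0xF, (b >> lo) & 0xF, carry)
        total |= s << lo
    return total, carry


print(cla_add(0xABC, 0x567, 12))  # prints (35, 1)
```

The outer loop still visits n/4 blocks in sequence, so the delay of this arrangement remains linear in the number of bits.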

Slide 92: CLA: 12-Bit Example
[Figure: 12-bit addition of operands A and B animated over time steps up to T=4]

Slide 93: Summary: Carry Lookahead Adder
- CLA compared to ripple-carry adder:
  - Faster ("4 times"?), but delay still linear (w.r.t. number of bits)
  - Larger area
    - P, G signal generation
    - Carry generation circuits
    - Carry generation circuit for each bit position (no re-use)
- Limitation: cannot go beyond 4 bits of look-ahead
  - Large p, g fan-out slows down carry generation
- Next: Manchester carry chains
  - Tries to reuse logic by pre-charging each carry position


Slide 95: Recap: Carry Look-Ahead
- Charge sharing problem
[Figure: the C4 dynamic gate with candidate taps C1?, C2?, C3? on internal nodes]

Slide 96: Manchester Carry Chain: First Shot
- Improvement over CLA: precharge internal nodes to avoid the charge-sharing problem
- Fastest way to do small adders
  - 6 transistors on the critical path
[Figure: precharged carry chain with outputs C1, C2, C3] [©Hauck]

Slide 97: Manchester Carry Chain: Sizing
[Figure: delay as a function of the sizing factor "k"] [© Prentice Hall]

Slide 98: Manchester Carry Chain: An Improvement
- Problem: Cin arrives late
  - Move it closer to the output
  - Use bypass logic
[Figure: Manchester chain (Cin, g0..g3, p0..p3, C4) with an added bypass path through p0, p1, p2, p3 from Cin] [©Hauck]

Slide 99: Manchester Carry Chain: the Improvement
Direct implementation
[Figure: carry chain with Cin, p0..p3, g0..g3 and outputs C1, C2, C3, C4, plus carry bypass circuitry driving C4 from Cin through p0..p3] [©Hauck]
Advantages of the carry bypass circuitry:
- Only 5 series transistors
- Less capacitance in internal nodes
- Cin close to the output

Slide 100: Manchester Carry Chain: Summary
- Compared to CLA:
  - Smaller area
    - Pre-charge internal nodes
    - Reuse logic for intermediate carry signals
  - Cin close to the output
- Carry chain can be any length
  - Series propagate is slow (O(n^2) delay): buffer every 4 bits
- Compact adder: good for up to 16 bits
- Using carries to compute sum slows down MCC
  - Use two carry chains: one for sum, one for carry propagation
[©Hauck]
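
The O(n^2) figure for a series propagate path can be illustrated with an Elmore-delay estimate of a uniform RC ladder. This is a sketch under assumptions not given on the slides: each pass transistor is modeled as a resistance `r` followed by a node capacitance `c`, both in arbitrary units, and the buffer delay `t_buf` is a hypothetical constant.

```python
def chain_delay(n: int, r: float = 1.0, c: float = 1.0) -> float:
    """Elmore delay of n identical series RC segments (arbitrary units)."""
    # Node k is charged through k series resistances: sum of k*r*c for k=1..n,
    # which equals r*c*n*(n+1)/2 and therefore grows quadratically.
    return sum(k * r * c for k in range(1, n + 1))


def buffered_delay(n: int, segment: int = 4, t_buf: float = 1.0) -> float:
    """Delay when the chain is cut into buffered segments of `segment` bits."""
    full, rest = divmod(n, segment)
    return full * (chain_delay(segment) + t_buf) + chain_delay(rest)


for n in (4, 8, 16, 32):
    print(n, chain_delay(n), buffered_delay(n))
```

With these unit values the unbuffered estimate roughly quadruples each time n doubles, while the buffered one only doubles.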


Slide 102: Carry Bypass Adder: Idea
- The "bypass" idea is general
  - Not just for Manchester carry chain
  - The local carry chain could be a ripple carry adder
- Structure
  - Could be static, dynamic, pass transistor
  - Carry and sum paths shown in different colors
  - Bypass logic determines: "pass" or "kill/generate"?
[Figure: block for bits i to i+k with setup, local carry chain, sum and bypass logic; carry-in Ci, carry-out C(i+k+1)]

Slide 103: Carry Bypass Adder: Cell Examples
- Static implementation, using a ripple carry adder: FA cells form the local carry chain, with the bypass selected by p0.p1.p2.p3
- Dynamic, Manchester (mux = wire!)
[Figures: FA-based local carry chain with bypass mux; Manchester chain (Cin, g0..g3, p0..p3, C4) with bypass path] [Rab96] p398

Slide 104: Carry Bypass Adder: Cell Examples (cont.)
Static (pass transistor logic), Manchester
- T1 = (p0.p1.p2).p3
- T2 = p3
- T3 = p0.p1.p2.p3
[Figure: pass-transistor chain from C0 to C4 controlled by p0..p2, g0..g3 and T1, T2, T3] [WE92] p531

Slide 105: Carry Bypass Adder: the Structure and Timing
[Figure: four 4-bit blocks (bits 0-3, 4-7, 8-11, 12-15), each with setup, local carry chain and sum, linked from C0 through bypass muxes] [Rab96] p.399
Timing (critical path shown in a different color):
1. Setup
2. Local carry generate/kill, mux select line ready
3. C0-C16 carry propagate (if applicable)
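
A behavioral sketch of the block structure, assuming 4-bit blocks and p = A ⊕ B (the bypass condition needs the exclusive form, since a block where some bit has A = B = 1 generates its own carry). The name `bypass_add` is illustrative.

```python
def bypass_add(a: int, b: int, n: int, block: int = 4, c0: int = 0) -> tuple[int, int]:
    """n-bit carry-bypass adder model; returns (sum, carry_out)."""
    total, carry = 0, c0
    for lo in range(0, n, block):
        width = min(block, n - lo)
        block_propagate, local = 1, carry
        for i in range(lo, lo + width):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            p, g = ai ^ bi, ai & bi
            total |= (p ^ local) << i
            local = g | (p & local)       # local ripple carry chain
            block_propagate &= p
        # Bypass mux: if every bit propagates, forward the incoming carry
        # directly instead of waiting for the local chain.
        carry = carry if block_propagate else local
    return total, carry


print(bypass_add(0x0FF0, 0x0011, 16))  # prints (4097, 0)
```

The mux never changes the result, since in pass mode the local chain would eventually produce the same value. The benefit is purely in timing, which this functional model does not capture.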

Slide 106: Carry Bypass Adder: Timing of a Sub-block
For an intermediate stage (bits 8-11 in the figure), after setup:
- If in pass mode:
  - Local carry vector computes intermediate carries (possibly incorrectly)
  - At the same time, mux selection is set to pass
  - When the input carry arrives, intermediate carries might be recomputed
  - Meanwhile, the input carry is sent to Cout
- If not in pass mode (assume bit 10 generates):
  - Local carry vector computes intermediate carries (bits 10, 11 correct)
  - At the same time, mux selection is set to local
  - Meanwhile, the output carry is sent to Cout correctly
  - When the input carry arrives, intermediate carries C8 and C9 (S8, S9, S10) will be recomputed correctly
[Figure: the bits 8-11 block (setup, local carry chain, sum) shown at successive steps]

Slide 107: Carry Bypass Adder: Timing
Delay = t_setup + max{t_select, 4 x t_FA} + 3 x t_mux_pass + (…) x t_FA + t_sum
[Figure: four 4-bit blocks (setup, local carry chain, sum) from C0, with each delay term marked on the critical path]

Slide 108: Carry Bypass Adder: Pros and Cons
- Speed:
  - Faster than ripple adder
  - Still linear!
- Area overhead:
  - Mux (setup?)
  - Not worthwhile for small adders (N < 8)
  - 10-20% for large adders
[Figure: propagation delay vs. number of bits for ripple adder and bypass adder, with the 4..8-bit region marked] [Rab96] p.399


Slide 110: Carry Select Adder: the Idea
- Similar to bypass
  - Instead of "waiting" for the input carry, "precompute" the carry output
  - Compute C(i+k) for both cases Ci = 0 and Ci = 1
  - When Ci arrives, select the appropriate result
  - Sum computed in one step after the intermediate carry signals are ready
[Figure: k-bit block with setup (p, g), 0-carry propagation, 1-carry propagation, multiplexers selected by Ci producing the carry vector and C(i+k), and sum generation] [Rab96] p.400

Slide 111: Linear Carry Select Adder: Structure
[Figure: four equal blocks, each with setup, 0-carry chain, 1-carry chain, mux and sum; block carries C0, C4, C8, C12, C16] [Rab96] p.401
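
A behavioral sketch of carry select: each block adds its operand slice twice, once assuming carry-in 0 and once assuming carry-in 1, and the real carry only drives a selection. Block widths are passed as a parameter (`sizes`), so the same model covers equal-width blocks; all names here are illustrative.

```python
def ripple(a: int, b: int, width: int, cin: int) -> tuple[int, int]:
    """Plain ripple addition of two `width`-bit slices."""
    total, carry = 0, cin
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai | bi))
    return total, carry


def carry_select_add(a: int, b: int, sizes, cin: int = 0) -> tuple[int, int]:
    """Carry-select adder model; `sizes` lists block widths from the LSB up."""
    total, carry, lo = 0, cin, 0
    for width in sizes:
        mask = (1 << width) - 1
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        # Both candidates are independent of the incoming carry, so in
        # hardware they can be computed in parallel across all blocks.
        result_if_0 = ripple(a_blk, b_blk, width, 0)
        result_if_1 = ripple(a_blk, b_blk, width, 1)
        s, carry = result_if_1 if carry else result_if_0   # the multiplexer
        total |= s << lo
        lo += width
    return total, carry


print(carry_select_add(0x1234, 0x0FCD, (4, 4, 4, 4)))  # prints (8705, 0)
```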

Slide 112: Linear Carry Select Adder: Timing
[Figure: blocks for bits 0-3, 4-7, 8-11 and 12-15, each with setup, 0-carry chain, 1-carry chain, mux and sum; carries C0, C4, C8, C12, C16, with delays annotated along the critical path]
Delay = … = 7 (16 bits) [Rab96] p.401

Slide 113: Square Root Carry Select Adder: the Idea
- Later stages have to wait for the multiplexers in the earlier stages
- Why not give them bigger chunks of data to compute?
  - Balances the delay paths
  - Sub-linear delay (we will see why)

Slide 114: Square Root Carry Select Adder: the Structure
Assuming the following delays:
- Setup = 1, carry propagate = 1/bit, mux = 1
[Figure: blocks of increasing width (bits 0-1, 2-4, 5-8, 9-13, 14-18) with carries C0, C2, C5, C9, C14, C19; mux outputs annotated with times 4, 5, 6, 7]
Delay from all paths = 8 (20 bits) [Rab96] p.402

Slide 115: Square Root Carry Select Adder: Delay
Assume:
- N-bit adder
- P stages (delay directly depends on P)
- First stage computes M bits, and each later stage is one bit wider than the previous one
Then N = M + (M+1) + ... + (M+P-1) = M.P + P(P-1)/2 = P^2/2 + P(M - 1/2)
For M << N (e.g. N = 64, M = 2):
- The first term dominates
- N ≈ P^2/2, so P ≈ sqrt(2N)
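
The relation between N, M and P can be explored numerically. This sketch assumes stage widths M, M+1, M+2, ... (as in the 20-bit structure with blocks of 2, 3, 4, 5 and 6 bits); `stages_needed` is an illustrative name.

```python
import math


def stages_needed(n_bits: int, m: int) -> int:
    """Smallest P with M + (M+1) + ... + (M+P-1) >= N."""
    p, covered = 0, 0
    while covered < n_bits:
        covered += m + p      # stage number p (0-based) is m + p bits wide
        p += 1
    return p


for n in (20, 64, 256):
    print(n, stages_needed(n, 2), round(math.sqrt(2 * n), 1))
```

For N = 20 and M = 2 this gives P = 5, matching the five-block structure; for larger N the exact stage count stays close to, and below, the sqrt(2N) approximation.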

Slide 116: Carry Select Adder: Trade-offs
- Area overhead:
  - An additional carry path and a multiplexer (not the whole adder)
  - About 30% more than a ripple-carry adder
- Delay:
  - Sub-linear (we can beat that too!)
[Figure: delay vs. number of bits for ripple adder, linear select and square root select] [© Prentice Hall]


Slide 118: Binary Carry-Lookahead or Brent-Kung Adder
- Idea: use a binary tree for carry propagation, giving logarithmic delay
[Figure: linear chain combining A0..A7 into F with t_p ~ N, versus a binary tree combining A0..A7 into F with t_p ~ log2(N)] [© Prentice Hall]

Slide 119: Brent-Kung Adder
Basic component: concatenation of a left (MSB-side) group and a right (LSB-side) group
- (g, p) = (g_left, p_left) • (g_right, p_right)
- g = g_left + p_left.g_right
- p = p_left.p_right
[©Hauck]

Slide 120: Brent-Kung Adder: Structure
Define (Gi, Pi): generate and propagate for the least significant bits up to and including bit i
- g_i = A_i.B_i, p_i = A_i ⊕ B_i
- (G0, P0) = (g0, p0)
- For i > 0: (Gi, Pi) = (g_i, p_i) • (G(i-1), P(i-1)) = (g_i, p_i) • (g(i-1), p(i-1)) • ... • (g0, p0)
Key to the Brent-Kung adder: use a tree structure to perform the concatenations
[Figure annotation: "C5?" No! Doesn’t know about C0-3 yet!] [©Hauck]

Slide 121: Brent-Kung: the Complete Tree
t_add ~ log2(N)
[Figure: 8-bit tree combining (g0, p0) through (g7, p7) and producing carries C0 to C7] [© Prentice Hall]
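
One way to organize the complete tree is an up-sweep that builds power-of-two groups followed by a down-sweep that fills in the remaining prefixes. The sketch below follows that common arrangement, which may not match the wiring of the figure node for node; it assumes n is a power of two, and the names `combine` and `brent_kung_add` are illustrative.

```python
def combine(left, right):
    """Concatenate two (g, p) groups; `left` is the more significant one."""
    return left[0] | (left[1] & right[0]), left[1] & right[1]


def brent_kung_add(a: int, b: int, n: int, cin: int = 0) -> tuple[int, int]:
    """n-bit addition with a two-sweep prefix tree (n a power of two)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    x = list(zip(g, p))               # x[i] starts as (g_i, p_i)

    # Up-sweep: after the pass with distance d, x[i] at i = 2d-1, 4d-1, ...
    # covers the 2d bits ending at i.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            x[i] = combine(x[i], x[i - d])
        d *= 2

    # Down-sweep: extend the remaining positions to full prefixes.
    d = n // 4
    while d >= 1:
        for i in range(3 * d - 1, n, 2 * d):
            x[i] = combine(x[i], x[i - d])
        d //= 2

    # x[i] is now (G_i, P_i) for bits 0..i; the carry into bit i+1 follows.
    carries = [cin] + [G | (P & cin) for G, P in x]
    total = sum((p[i] ^ carries[i]) << i for i in range(n))
    return total, carries[n]


print(brent_kung_add(0b10110101, 0b01101011, 8))  # prints (32, 1)
```

Each sweep has about log2(n) levels, and once the carries are known every sum bit needs only one more XOR.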

Slide 122: Brent-Kung: Timing
[Figure: 16-bit Brent-Kung network with inputs x0..x15, outputs s0..s15 and level markers] [©Oxford U Press] [Par00] p.102

Slide 123: Brent-Kung Adder: Summary
- Area
  - On average, twice as large as a ripple adder
  - Layout of the cells is very compact
- Delay
  - Logarithmic time
  - Once carry signals are ready, sum bits are derived in constant time
  - Good for wide adders

Slide 124: Comparing Adder Designs
[Figure: two plots vs. number of bits, propagation delay t_p (sec) and area (mm^2), with curves for static, mirror, Manchester, bypass, select and Brent-Kung adders] [© Prentice Hall]

Slide 125: Combining Different Adders
[Figure] [©Oxford U Press] [Par00] p.103

Slide 126: Combining Different Adders
Two-level carry skip adder
- Delay = 8 cycles
- Number of bits: 30
[Figure: blocks A to F between Cin (t=0) and Cout (t=8), with second-level skip logic S2; each block is annotated with a {T produce, T assimilate} pair: {8, 1}, {7, 2}, {6, 3}, {5, 4}, {4, 5}, {3, 8}] [©Oxford U Press] [Par00] p.113

Slide 127: Combining Different Adders
[Figure: 64-bit adder split into a 40-bit carry select adder on the MSB side (RA(63:24), RB(63:24), EA(63:24)) and a 24-bit differential carry lookahead adder on the LSB side (RA(23:0), RB(23:0), EA(23:0)), linked by cout23; other labels: real_add(40:0), hit/miss/data, TLB compare, data cache compare] © Dan Stasiak, IBM Rochester, 2001

Slide 128: Combining Different Adders
[Figure: the adder sections, including the 24-bit adder section; signals EA(0:23) & EA_L(0:23) and EA(24:63)] © Dan Stasiak, IBM Rochester

Slide 129: Combining Different Adders (note: should appear before slide 126)
Ripple+skip adder: delay = 8. Max adder width?
- Assume: p, g, ripple, skip signal, skipping: 1 unit delay each
- Carry signals
  - Pass mode: ready at time x through skip logic, which limits the number of blocks
  - Local generate mode: a block can process y bits and still have time to deliver a locally generated carry by time x for the next block
- Sum signals
  - If in local generation mode, y is OK
  - If in pass mode, y is not OK for the left bits (e.g., b_E receives cin at x=5 and can process at most z=3 bits to meet the delay bound of 8 on the sum bits)
[Figure: blocks A to G of widths b_A..b_G between Cin and Cout, with skip logic S over the middle blocks] [©Oxford U Press] [Par00] p.112

Slide 130: CLA Static Logic: Trimmed Down (note: should appear before slide 86)
[Figure: static gate reduced to one bit, with inputs p0, g0 and Cin, output C1, and the remaining labeled transistors] [©Hauck] [Rab96] p405