VLSI Arithmetic Lecture 4


VLSI Arithmetic Lecture 4 Prof. Vojin G. Oklobdzija University of California http://www.ece.ucdavis.edu/acsel

Review Lecture 3

Variable Block Adder (Oklobdzija, Barnes: IBM 1985) Computer Arithmetic

Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985) The idea behind the Variable Block Adder is to minimize the longest critical path in the carry chain of a Carry Skip Adder by allowing the groups to take different sizes. In general, such optimization does not increase complexity compared to the Carry Skip Adder. A carry chain of a 32-bit Variable Block Adder is shown. The first and the last blocks are smaller, and the intermediate blocks are larger. This compensates for the critical paths originating at the ends: the carry signal ripples through a shorter path in the end groups and skips over the larger groups in the middle. There are two important consequences of this optimization. First, the total delay is reduced compared to the Carry Skip Adder. Second, the delay is no longer a linear function of the adder size N, as it is in the Carry Skip Adder; it follows a square-root function of N instead. It is also possible to extend this approach to multiple levels of carry skip, which represents a linear programming problem that does not yield a closed-form solution. The speed of such a multiple-level adder surpasses that of a fixed-group Carry-Lookahead Adder, and it does so with lower area and power consumption. The Variable Block Adder has the lowest energy-delay product compared to the other adders in its class. Oklobdzija 2004 Computer Arithmetic

Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985) Block sizes in the figure: 1, 3, 4, 5, 6, 5, 4, 3, 1. Any-point-to-any-point delay = 9 D, as compared to 12 D for CSKA.
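The 9 D versus 12 D comparison can be reproduced with a short sketch. It rests on assumptions that are not stated in the slides: a unit-delay model (1 D to ripple through one bit, 1 D to skip one whole block), a fixed Carry Skip Adder built from eight 4-bit blocks, and the block sizes 1, 3, 4, 5, 6, 5, 4, 3, 1 as read from the figure labels. The function name is hypothetical.

```python
from itertools import combinations

def worst_case_carry_delay(block_sizes):
    """Longest carry path in unit delays D, under an assumed model:
    1 D to ripple through one bit, 1 D to skip over one whole block."""
    # A carry generated and consumed inside one block ripples at most size-1 bits.
    worst = max(size - 1 for size in block_sizes)
    for i, j in combinations(range(len(block_sizes)), 2):
        ripple_out = block_sizes[i] - 1   # generated at the LSB of block i, ripples to its end
        skips = j - i - 1                 # whole blocks skipped between i and j
        ripple_in = block_sizes[j] - 1    # ripples up to the MSB of block j
        worst = max(worst, ripple_out + skips + ripple_in)
    return worst

fixed_cska = [4] * 8                            # assumed fixed-group Carry Skip Adder
variable_blocks = [1, 3, 4, 5, 6, 5, 4, 3, 1]   # sizes read from the figure labels

print(worst_case_carry_delay(fixed_cska))       # 12
print(worst_case_carry_delay(variable_blocks))  # 9
```

Under this model every long path in the variable-block chain lands on the same 9 D, which is the balancing effect the block-size choice is after.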

Carry-chain block size determination for a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Delay Calculation for Variable Block Adder (Oklobdzija, Barnes: IBM 1985) Delay model:

Variable Block Adder (Oklobdzija, Barnes: IBM 1985) Variable Group Length (Oklobdzija, Barnes, Arith’85)

Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985) Variable Block Lengths. No closed form solution for delay. It is a dynamic programming problem.

Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985) Computer Arithmetic

Delay Comparison: Variable Block Adder. Curve labels in the plot: Square Root Dependency (VBA), Log Dependency (CLA), VBA Multi-Level.

Circuit Issues Adder speed cannot be estimated based on: the logic gates in the critical path; the number of transistors in the path; the logic levels in the path. Estimating adder speed is much more complex, and many of the “fast” schemes may be misleading.

Fan-Out Dependency

Fan-In Dependency This looks like “Logical Effort” (1985)

Carry-Lookahead Adder (Weinberger and Smith, 1958) ARITH-13: Presenting Achievement Award to Arnold Weinberger of IBM (who invented the CLA in 1958). Ref: A. Weinberger and J. L. Smith, “A Logic for High-Speed Addition”, National Bureau of Standards, Circ. 591, p.3-12, 1958.

CLA Definitions: One-bit adder First we should examine a realization of a one-bit adder, which is the basic building block for all the more elaborate addition schemes. The operation of a Full Adder is defined by the Boolean equations for the sum and carry signals shown in this slide: $a_i$, $b_i$, and $c_i$ are the inputs to the i-th full adder stage, and $s_i$ and $c_{i+1}$ are the sum and carry outputs from the i-th stage, respectively. From the sum equation it is clear that realizing the Sum function requires two XOR logic gates. The expression for the Carry function can be rewritten using the Carry-Propagate $p_i$ and Carry-Generate $g_i$ terms. If Carry-Propagate is 1, the carry out of the stage equals the carry into the stage, $c_{i+1} = c_i$, regardless of the carry inside the stage. If Carry-Generate is 1, the carry out of the stage is 1 regardless of the value of the incoming carry. The logical implementation of the full adder stage is shown in figure (a) of this slide; it results from a direct application of the logic equations. Implementation (b) is more clever because it uses a multiplexer in the carry path. Given that the multiplexer block is often faster than a single gate, using a multiplexer in the critical path helps to achieve better performance.
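The two realizations can be sketched in a few lines. This is an illustration only: it assumes $p_i = a_i \oplus b_i$ and $g_i = a_i b_i$, and it assumes one common multiplexer arrangement in which $p_i$ selects between the incoming carry and $a_i$; the exact wiring in figure (b) is not recoverable from the transcript. Function names are hypothetical.

```python
from itertools import product

def full_adder_gates(a, b, c):
    """Version (a): direct application of the Boolean equations."""
    p = a ^ b               # carry-propagate (first XOR)
    g = a & b               # carry-generate
    s = p ^ c               # sum (second XOR)
    c_out = g | (p & c)
    return s, c_out

def full_adder_mux(a, b, c):
    """Version (b): the carry output comes from a 2:1 multiplexer."""
    p = a ^ b
    s = p ^ c
    # p = 1: pass the incoming carry; p = 0: a == b, so the carry out equals a.
    c_out = c if p else a
    return s, c_out

for a, b, c in product((0, 1), repeat=3):
    assert full_adder_gates(a, b, c) == full_adder_mux(a, b, c)
    print(a, b, c, full_adder_gates(a, b, c))
```

Both versions produce the same truth table; the difference the slide points at is purely in what sits on the carry path.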

CLA Definitions: 4-bit Adder

Carry-Lookahead Adder: 4-bits, with group outputs Gj and Pj

Carry-Lookahead Adder One gate delay D to calculate p, g. One D to calculate P and two for G. Three gate delays to calculate C4(j+1). Compare that to 8 D in RCA!

Carry-Lookahead Adder (Weinberger and Smith) An additional two gate delays: C16 will take a total of 5D vs. 32D for RCA!
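The gate-delay counts quoted on these two slides follow a simple pattern, sketched below. The formulas are an assumption inferred from the slide numbers (2 D per bit for the RCA; 1 D for p and g plus 2 D per lookahead level for the CLA carry out), and the 64-bit row is an extrapolation that does not appear in the slides. The model counts gates only and ignores fan-in and fan-out.

```python
def rca_delay(n_bits):
    """Ripple-carry adder: assumed two gate delays per bit position."""
    return 2 * n_bits

def cla_levels(n_bits, group_size=4):
    """Number of lookahead levels needed to span n_bits."""
    levels, span = 0, 1
    while span < n_bits:
        span *= group_size
        levels += 1
    return levels

def cla_carry_out_delay(n_bits, group_size=4):
    """Assumed count: 1 D for p and g, then 2 D per lookahead level."""
    return 1 + 2 * cla_levels(n_bits, group_size)

for n in (4, 16, 64):
    print(n, cla_carry_out_delay(n), rca_delay(n))
# 4 3 8
# 16 5 32
# 64 7 128
```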

32-bit Carry Lookahead Adder A significant speed improvement in the implementation of a parallel adder was introduced by the Carry-Lookahead Adder developed by Weinberger and Smith in 1958. It is theoretically one of the fastest schemes, since the delay to add two numbers depends on the logarithm of the size of the operands. The Carry-Lookahead Adder uses modified full adders for each bit position and Lookahead modules that generate carry signals independently for a group of k bits. In the most common case the group size is 4 bits. In addition to the carry signal for the group, Lookahead modules produce group carry generate G and group carry propagate P outputs, which indicate that a carry is generated within the group or that an incoming carry would propagate across the group. The carry out of a 4-bit group, $c_{i+4}$, can be computed in four gate delays: one gate delay to compute $p_k$ and $g_k$ for k = i through i+3, a second gate delay to evaluate $P_j$, the second and third to evaluate $G_j$, and the third and fourth to calculate the carry signals $c_{i+1}$, $c_{i+2}$, $c_{i+3}$ and $c_{i+4}$. Actually, if not limited by fan-in constraints, $c_{i+4}$ could be calculated concurrently with $G_j$ and would be available after three gate delays. In a recursive fashion, we can create a "group of groups" or a "super-group". The inputs to the "super-group" are the G and P signals from the previous level. The "super-group" produces P* and G* signals indicating that the carry signal will be propagated across, or generated in, the groups within the "super-group" domain. A "super-group" produces a carry signal out of the "super-group" as well as an input carry signal for each of the groups in the level above.
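The group and super-group structure can be sketched as a bit-level model. This is a 16-bit, two-level illustration (the slide shows 32 bits), assuming 4-bit groups, $p = a \oplus b$ and $g = a \cdot b$; the names `lookahead` and `cla16` are hypothetical and the code models logic only, not timing.

```python
def lookahead(g, p, c0):
    """4-input lookahead module with flattened (non-rippling) equations.
    g, p: lists of four bits, index 0 least significant.
    Returns the carries into positions 0..3, group generate G, group propagate P."""
    c1 = g[0] | p[0] & c0
    c2 = g[1] | p[1] & g[0] | p[1] & p[0] & c0
    c3 = g[2] | p[2] & g[1] | p[2] & p[1] & g[0] | p[2] & p[1] & p[0] & c0
    G = g[3] | p[3] & g[2] | p[3] & p[2] & g[1] | p[3] & p[2] & p[1] & g[0]
    P = p[3] & p[2] & p[1] & p[0]
    return [c0, c1, c2, c3], G, P

def cla16(a, b, c_in=0):
    g = [(a >> i) & (b >> i) & 1 for i in range(16)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(16)]
    # Level 1: group G and P do not depend on the incoming carry.
    G, P = [], []
    for k in range(0, 16, 4):
        _, Gj, Pj = lookahead(g[k:k + 4], p[k:k + 4], 0)
        G.append(Gj)
        P.append(Pj)
    # Level 2 (super-group): carry into each group, plus G* and P*.
    group_carries, G_star, P_star = lookahead(G, P, c_in)
    c_out = G_star | P_star & c_in
    # Back at level 1: each group derives its internal carries from its carry-in.
    total = 0
    for j, k in enumerate(range(0, 16, 4)):
        carries, _, _ = lookahead(g[k:k + 4], p[k:k + 4], group_carries[j])
        for i in range(4):
            total |= (p[k + i] ^ carries[i]) << (k + i)
    return total, c_out

print(cla16(0x1234, 0x0FF0))   # (8740, 0), i.e. 0x2224 with no carry out
```

The same `lookahead` module serves both levels, which is the recursion the paragraph describes: the super-group simply treats each group's G and P as if they were single-bit g and p.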

Carry-Lookahead Adder (Weinberger and Smith: original derivation, 1958)

Carry-Lookahead Adder (Weinberger and Smith: original derivation)

Carry-Lookahead Adder (Weinberger and Smith) Please notice the similarity with Parallel-Prefix Adders!

Motorola: CLA Implementation Example A. Naini, D. Bearden and W. Anderson, “A 4.5nS 96b CMOS Adder Design”, Proceedings of the IEEE Custom Integrated Circuits Conference, May 3-6, 1992.

Critical path in Motorola's 64-bit CLA Timing labels along the critical path in the figure: 1.05nS, 1.7nS, 2.0nS, 2.35nS, 2.7nS, 3.75nS, 4.8nS. As opposed to Ripple or Carry-Skip Adders, the critical path in the Carry-Lookahead Adder travels in the vertical direction rather than the horizontal one, as shown in the previous slide. Therefore the delay of the Carry-Lookahead Adder is not directly proportional to the size of the adder N, but to the number of levels used. Given that the groups and super-groups in the Carry-Lookahead Adder resemble a tree structure, its delay is proportional to the logarithm of the size N. This log dependency makes the Carry-Lookahead Adder one of the theoretically fastest structures for addition. However, it can be argued that the speed efficiency of the Carry-Lookahead Adder has passed the point of diminishing returns, given the fan-in and fan-out dependencies of the logic gates and the inadequacy of a delay model based on counting the number of gates in the critical path. In reality, the Carry-Lookahead Adder achieves lower speed than expected, especially when compared to some techniques that consume less hardware. An example of a Carry-Lookahead Adder and its critical path, as implemented in a Motorola processor, is shown in this slide.

Motorola's 64-bit CLA: conventional PG Block. No better situation here! The carry ripples locally, with 5 transistors in the path. Basically, this is MCC performance with Carry-Skip. One should not expect any better results than VBA.

Motorola's 64-bit CLA: Modified PG Block. Intermediate propagate signals Pi:0 are generated to speed up C3; the critical path still resembles MCC.

Motorola's 64-bit CLA Timing labels in the figure: 1.8nS, 2.2nS, 2.9nS, 3.2nS, 3.55nS, 3.9nS.

Delay Optimized CLA B. Lee, V. G. Oklobdzija, Journal of VLSI Signal Processing, Vol.3, No.4, October 1991

Delay Optimized CLA: Lee-Oklobdzija ‘91 (a.) Fixed groups and levels (b.) variable-sized groups, fixed levels (c.) variable-sized groups and fixed levels (d.) variable-sized groups and levels

Two-Levels of Logic Implementation of the Carry Block

Two-Levels of Logic Implementation of the Carry-Lookahead Block

Three-Levels of Logic Implementation of the Carry Block (restricted fan-in)

Three-Levels of Logic Implementation of the Carry Lookahead (restricted fan-in)

Delay Optimized CLA: Lee-Oklobdzija ‘91 Delay: Two-level BCLA Delay: Three-level BCLA

Delay Optimized CLA: Lee-Oklobdzija ‘91 (a.) 2-level BCLA D=8.5nS (b.) 3-level BCLA D=8.9nS

Ling’s Adder Huey Ling, “High-Speed Binary Adder”, IBM Journal of Research and Development, Vol.25, No.3, 1981. Used in: IBM 3033, IBM 168, Amdahl V6, HP etc.

Ling’s Derivations Define $p_i$, $g_i$ and $t_i$ from the inputs $a_i$ and $b_i$ (tabulated on the slide), and define $H_{i+1}$. $g_i$ implies $c_{i+1}$, which implies $H_{i+1}$; thus $g_i = g_i H_{i+1}$.

Ling’s Derivations From: and because: (fundamental expansion) Now we need to derive the Sum equation.

Ling Adder Ling’s equations: Variation of CLA: Ling, IBM J. Res. Dev, 5/81

Ling Adder Ling’s equation: Variation of CLA: Ling uses a different transfer function. Four of those functions have the desired properties (Ling’s is one of them); see: Doran, IEEE Trans on Comp. Vol 37, No.9 Sept. 1988.

Ling Adder Conventional: fan-in of 5. Ling: fan-in of 4.

Advantages of Ling’s Adder Uniform loading in fan-in and fan-out. H16 contains 8 terms, as compared to G16, which contains 15. H16 can be implemented with one level of logic (in ECL), while G16 cannot. (Ling’s adder takes full advantage of wired-OR, which is of special importance when ECL technology is used.)
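Since the slide equations did not survive in the transcript, the sketch below uses the commonly cited form of Ling's recurrences rather than the slides' own notation: $g_i = a_i b_i$, $t_i = a_i + b_i$, pseudo-carry $H_{i+1} = c_{i+1} + c_i = g_i + t_{i-1} H_i$, true carry $c_{i+1} = t_i H_{i+1}$, and sum $s_i = (t_i \oplus H_{i+1}) + g_i t_{i-1} H_i$. The boundary convention ($t_{-1} = 1$, $H_0 = c_{in}$) and all function names are assumptions made for illustration. The loop evaluates the recurrence bit by bit to check the equations; it is not a lookahead implementation.

```python
def ling_add(a, b, n_bits=16, c_in=0):
    """Add two n-bit numbers using Ling's pseudo-carry recurrence."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n_bits)]
    t = [((a >> i) | (b >> i)) & 1 for i in range(n_bits)]
    H = c_in        # H_0 stands in for the carry-in
    t_prev = 1      # assumed boundary convention: t_{-1} = 1, so c_0 = H_0
    total = 0
    for i in range(n_bits):
        H_next = g[i] | t_prev & H                  # H_{i+1} = g_i + t_{i-1} H_i
        s = (t[i] ^ H_next) | (g[i] & t_prev & H)   # Ling's sum equation
        total |= s << i
        H, t_prev = H_next, t[i]
    c_out = t_prev & H                              # c_n = t_{n-1} H_n
    return total, c_out

def c4_conventional(g, t, c0):
    """Conventional carry out of 4 bits: largest AND term has 5 inputs."""
    return (g[3] | t[3] & g[2] | t[3] & t[2] & g[1]
            | t[3] & t[2] & t[1] & g[0] | t[3] & t[2] & t[1] & t[0] & c0)

def h4_ling(g, t, c0):
    """Ling pseudo-carry H_4: largest AND term has 4 inputs."""
    return g[3] | g[2] | t[2] & g[1] | t[2] & t[1] & g[0] | t[2] & t[1] & t[0] & c0

print(ling_add(0x1234, 0x0FF0))   # (8740, 0)
```

Expanding $H_4$ shows where the saving comes from: $t_3$ drops out of every product term and is applied once afterwards, as $c_4 = t_3 H_4$. That is one way to read the slide's fan-in of 5 versus fan-in of 4 comparison.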