VLSI Arithmetic Adders & Multipliers


VLSI Arithmetic Adders & Multipliers Prof. Vojin G. Oklobdzija University of California http://www.ece.ucdavis.edu/acsel

Introduction. Digital computer arithmetic belongs to computer architecture, but it is also an aspect of logic design. Its objective is to develop algorithms that use the available hardware as efficiently as possible. Because the hardware can perform only a relatively simple, primitive set of Boolean operations, arithmetic operations are built as a hierarchy on top of those simple ones. Speed, power, and chip area are the measures most often used to judge the efficiency of an algorithm, so there is a strong link between an algorithm and the technology used to implement it. Oklobdzija 2004 Computer Arithmetic

Basic Operations: Addition, Multiplication, Multiply-Add, Division, Evaluation of Functions, Multi-Media.

Addition of Binary Numbers

Addition of Binary Numbers: Full Adder. The full adder is the fundamental building block of most arithmetic circuits. Its inputs are ai, bi, and the carry-in Cin; its outputs are the sum si and the carry-out Cout. The sum and carry outputs are described as: $s_i = a_i \oplus b_i \oplus c_i$ and $c_{i+1} = a_i b_i + c_i (a_i + b_i)$.

Addition of Binary Numbers: full-adder truth table

Inputs        Outputs
ci  ai  bi    si  ci+1
0   0   0     0   0
0   0   1     1   0     Propagate
0   1   0     1   0     Propagate
0   1   1     0   1     Generate
1   0   0     1   0
1   0   1     0   1     Propagate
1   1   0     0   1     Propagate
1   1   1     1   1     Generate

Full-Adder Implementation. Full adder operation is defined by the equations $s_i = a_i \oplus b_i \oplus c_i$ and $c_{i+1} = g_i + p_i c_i$, with Carry-Propagate $p_i = a_i \oplus b_i$ and Carry-Generate $g_i = a_i b_i$. A one-bit adder could be implemented as shown. Notes: First we examine the realization of a one-bit adder, the basic building block of all the more elaborate addition schemes. The operation of a full adder is defined by the Boolean equations for the sum and carry signals, where ai, bi, and ci are the inputs to the i-th full-adder stage and si and ci+1 are its sum and carry outputs. The sum function requires two XOR gates. The carry function can be rewritten using the Carry-Propagate pi and Carry-Generate gi terms. If Carry-Propagate is 1, the carry out of the stage equals the carry into the stage: ci+1 = ci. If Carry-Generate is 1, the carry out of the stage is 1 regardless of the value of the incoming carry. Implementation (a) in the figure results from a direct application of the logic equations. Implementation (b) is more clever because it uses a multiplexer in the carry path. Given that a multiplexer block is often faster than a single gate, placing the multiplexer in the critical path helps to achieve better performance.
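
The two implementations described in the notes can be compared in a few lines of Python. This is a minimal sketch, not the slide's circuit: the function names are hypothetical, propagate is assumed to be the XOR of the operand bits, and the multiplexer is assumed to select between the incoming carry and the generate signal.

```python
from itertools import product

def full_adder_direct(a, b, c):
    """Implementation (a): sum from two XORs, carry from AND/OR gates."""
    s = a ^ b ^ c
    c_out = (a & b) | (c & (a | b))
    return s, c_out

def full_adder_mux(a, b, c):
    """Implementation (b): the propagate signal drives a 2:1 mux in the carry path."""
    p = a ^ b                # carry-propagate (assumed XOR definition)
    g = a & b                # carry-generate
    s = p ^ c
    c_out = c if p else g    # pass the incoming carry when p = 1, otherwise g
    return s, c_out

# Exhaustive check: both forms agree and both compute a + b + c.
for a, b, c in product((0, 1), repeat=3):
    s, c_out = full_adder_mux(a, b, c)
    assert (s, c_out) == full_adder_direct(a, b, c)
    assert 2 * c_out + s == a + b + c
```

When p = 0 the two operand bits are equal, so the generate signal alone decides the carry; that is why the mux needs no further logic on its second input.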

High-Speed Addition. The one-bit adder can be implemented more efficiently because the MUX is faster.

The Ripple-Carry Adder

The Ripple-Carry Adder (from Rabaey)

Inversion Property (from Rabaey)

Minimize Critical Path by Reducing Inverting Stages (from Rabaey)

Ripple Carry Adder: Critical Path. The carry chain of an RCA can be implemented using a multiplexer from the standard cell library (Oklobdzija, ISCAS’88). Notes: A ripple carry adder for N-bit numbers is implemented by concatenating N full adders. At the i-th bit position, the i-th bits of operands A and B and the carry from the preceding stage are used to generate the i-th sum bit, si, and a carry, ci+1, to the next stage. The scheme is called a Ripple Carry Adder because the carry signal ripples from the least significant bit position to the most significant one. When the adder is built by concatenating N full adders, its delay from Cin to Cout is 2N gate delays. The path from input to output that is likely to take the longest time is designated the "critical path". In a Ripple Carry Adder this is the path from the least significant input a0 or b0 to the last sum bit sn. Assuming a multiplexer-based XOR gate implementation, this critical path consists of N+1 pass-transistor delays. Such a long chain of transistors significantly degrades the signal, so some amplification points are necessary. In practice, the critical path can be built from a multiplexer cell of a standard cell library, as shown in this slide.
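
The concatenation of full adders described above can be sketched behaviorally in Python. The function names and the integer-based bit handling are assumptions for illustration; the loop stands in for the chain of N stages, with the carry handed from one stage to the next.

```python
def full_adder(a, b, c):
    """One stage: returns (sum bit, carry out)."""
    s = a ^ b ^ c
    c_out = (a & b) | (c & (a ^ b))
    return s, c_out

def ripple_carry_add(x, y, n, c_in=0):
    """Add two n-bit integers stage by stage; returns (n-bit sum, carry out)."""
    carry = c_in
    total = 0
    for i in range(n):                      # least significant stage first
        a = (x >> i) & 1
        b = (y >> i) & 1
        s, carry = full_adder(a, b, carry)  # carry ripples into the next stage
        total |= s << i
    return total, carry

# Worst case for the carry chain: a carry generated at bit 0 ripples through every stage.
assert ripple_carry_add(0b1111, 0b0001, 4) == (0b0000, 1)
assert ripple_carry_add(5, 9, 4) == (14, 0)
```

In the first example every stage above bit 0 is in the propagate state, so the final carry cannot settle until the carry from bit 0 has passed through all of them, which is the critical path the notes describe.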

Manchester Carry-Chain Realization of the Carry Path. A simple and very popular scheme for implementing the carry signal path. Notes: The Manchester Carry Chain is a simple addition scheme that was very popular when LSI nMOS technology was emerging. It is an alternative, switch-based technique implemented in pass-transistor logic. Its speed is impressive, owing to its simplicity and to the properties of pass-transistor logic. The Manchester Carry Chain does not require a large area and consumes substantially less power than Carry-Lookahead or other more elaborate schemes. A realization of the Manchester Carry Chain is shown in the slide. Because of the RC delay of the chain, the signal must be regenerated by inserting inverters at appropriately chosen locations in the carry chain.

Original Design: T. Kilburn, D. B. G. Edwards, D. Aspinall, "Parallel Addition in Digital Computers: A New Fast "Carry" Circuit", Proceedings of IEE, Vol. 106, pt. B, p. 464, September 1959.

Manchester Carry Chain (CMOS). Implement P with pass-transistors. Implement G with pull-up and kill (delete) with pull-down. Use dynamic logic to reduce the complexity and speed up the chain. Kilburn, et al., IEE Proc., 1959.
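
A purely logical model of the chain helps to see the three switch states per bit. This is a sketch under stated assumptions: it ignores the electrical behavior (RC delay, precharge, inverted polarity) that the slides emphasize, and the function name and LSB-first bit lists are hypothetical.

```python
def manchester_carry_chain(a_bits, b_bits, c_in=0):
    """Return the carries [c0, c1, ..., cn] for LSB-first operand bit lists."""
    carries = [c_in]
    for a, b in zip(a_bits, b_bits):
        if a & b:                  # generate: the node is pulled to 1
            carries.append(1)
        elif not (a | b):          # kill: the node is pulled to 0
            carries.append(0)
        else:                      # propagate: the pass transistor connects
            carries.append(carries[-1])  # the node to the previous carry
    return carries

# 11 + 5 = 16: bit 0 generates, bits 1 to 3 propagate the carry to the end.
assert manchester_carry_chain([1, 1, 0, 1], [1, 0, 1, 0]) == [0, 1, 1, 1, 1]
```

Exactly one of the three conditions holds at each bit, so each carry node is always driven by a single source.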

Pass-Transistor Realization in DPL. The ability of pass-transistor logic to provide an efficient multiplexer implementation has been exploited in the CPL and DPL logic families. Even an XOR gate is more efficiently implemented using a multiplexer topology. A Full-Adder cell that is entirely multiplexer based was published by Hitachi and is shown in this slide. Such a Full-Adder realization contains only two transistors in the Input-to-Sum path and only one transistor in the Cin-to-Cout path (not counting the buffer). The short critical path contributes to the remarkable speed of this implementation.

Carry-Skip Adder. MacSorley, Proc. IRE, 1/61; Lehman, Burla, IRE Trans. on Comp., 12/61.

Carry-Skip Adder: Bypass (from Rabaey)

Carry-Skip Adder: N bits, k bits/group, r = N/k groups. Notes: Since the Cin-to-Cout path is the longest path in the ripple-carry adder, an obvious step is to accelerate carry propagation through the adder. This is accomplished by using the Carry-Propagate signals pi within a group of bits. If all the pi signals within a group are 1, the condition exists for the carry to bypass the entire group: $P = p_i \cdot p_{i+1} \cdots p_{i+k-1}$. The Carry Skip Adder divides the words to be added into groups of equal size, k bits each. The basic structure of an N-bit Carry Skip Adder is shown here. Within a group, the carry propagates in ripple-carry fashion. In addition, an AND gate forms the group propagate signal. If the group propagate signal is true, the carry bypasses the group, as shown in this slide. The maximal delay of a Carry Skip Adder is encountered when a carry is generated in the least significant bit position, ripples through k-1 bit positions, skips over N/k-2 groups in the middle, ripples through the k-1 bits of the most significant group, and is assimilated in the N-th bit position to produce the sum SN: $\Delta_{CSA} = (k-1)\,\Delta_{rca} + (N/k-2)\,\Delta_{skip} + (k-1)\,\Delta_{rca}$, where $\Delta_{rca}$ is the ripple delay per bit and $\Delta_{skip}$ is the delay to skip one group. Thus the Carry Skip Adder is faster than the Ripple Carry Adder at the expense of a few relatively simple modifications. Its delay still depends linearly on the adder size N, but this linear dependence is reduced by a factor of 1/k.
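
The worst-case path described in the notes (ripple through k-1 bits, skip N/k-2 groups, ripple through k-1 bits) can be turned into a small delay model. This is a sketch that assumes unit delays for one ripple step and for one skip; the function names and default parameters are hypothetical.

```python
def carry_skip_delay(n, k, t_ripple=1, t_skip=1):
    """Worst-case carry delay of an n-bit carry-skip adder with k-bit groups."""
    groups = n // k
    return 2 * (k - 1) * t_ripple + (groups - 2) * t_skip

def best_group_size(n):
    """Group size (dividing n, at least two groups) with the smallest delay."""
    candidates = [k for k in range(1, n + 1) if n % k == 0 and n // k >= 2]
    return min(candidates, key=lambda k: carry_skip_delay(n, k))

# 32 bits in 4-bit groups: 3 + 6 + 3 = 12 unit delays.
assert carry_skip_delay(32, 4) == 12
assert best_group_size(32) == 4
```

With these unit delays, small groups pay for many skips and large groups pay for long ripples, so the minimum sits at an intermediate group size.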

Carry-Skip Adder [figure; label: k]

Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985). Notes: The idea behind the Variable Block Adder is to minimize the longest critical path in the carry chain of a Carry Skip Adder by allowing the groups to take different sizes. In general, this optimization does not increase complexity compared to the Carry Skip Adder. In the carry chain of the 32-bit Variable Block Adder shown, the first and last blocks are smaller and the intermediate blocks are larger. This compensates for the critical paths originating at the ends: the carry ripples through a shorter path in the end groups and skips over larger groups in the middle. There are two important consequences of this optimization. First, the total delay is reduced compared to the Carry Skip Adder. Second, the delay is no longer a linear function of the adder size N, as in the Carry Skip Adder; it follows a square-root function of N instead. The approach can also be extended to multiple levels of carry skip, which represents a linear programming problem that does not yield a closed-form solution. The speed of such a multiple-level adder surpasses that of a fixed-group Carry-Lookahead Adder, and it exhibits lower area and power consumption while retaining its speed. The Variable Block Adder has the lowest energy-delay product of the adders in its class.

Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985) [figure; labels: 6, 5, 5, 4, 4, 3, 3, 1, 1; D = 9]. Any-point-to-any-point delay = 9 D as compared to 12 D for CSKA.
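
The 9 D versus 12 D comparison can be reproduced with a simple search over all carry paths. This is a sketch under two assumptions that are not stated in the transcript: the figure labels are read as the symmetric block arrangement 1, 3, 4, 5, 6, 5, 4, 3, 1 (which sums to 32), and one ripple step and one skip each cost one unit delay D. The function name is hypothetical.

```python
def worst_case_carry_delay(blocks, t_ripple=1, t_skip=1):
    """Longest carry path: generated at the first bit of block i, rippling out of it,
    skipping the blocks in between, and rippling through block j."""
    worst = 0
    for i in range(len(blocks)):
        # Carry generated and absorbed inside the same block.
        worst = max(worst, (blocks[i] - 1) * t_ripple)
        for j in range(i + 1, len(blocks)):
            ripple = (blocks[i] - 1) + (blocks[j] - 1)
            skips = j - i - 1
            worst = max(worst, ripple * t_ripple + skips * t_skip)
    return worst

vba_blocks = [1, 3, 4, 5, 6, 5, 4, 3, 1]   # assumed reading of the figure labels
cska_blocks = [4] * 8                      # fixed 4-bit groups

assert sum(vba_blocks) == 32 and sum(cska_blocks) == 32
assert worst_case_carry_delay(vba_blocks) == 9
assert worst_case_carry_delay(cska_blocks) == 12
```

Under this model many different paths through the variable-size chain tie at 9 units, which is what one would expect from a layout that balances the paths starting at the ends against those starting in the middle.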

Carry-chain block size determination for a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Delay Calculation for Variable Block Adder (Oklobdzija, Barnes: IBM 1985). Delay model (shown on the slide).

Variable Block Adder (Oklobdzija, Barnes: IBM 1985). Variable group length. Oklobdzija, Barnes, Arith’85.

Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985). Variable block lengths. No closed-form solution for delay. It is a dynamic programming problem.

Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Delay Comparison: Variable Block Adder [plot: VBA, CLA, and multi-level VBA]

VLSI Arithmetic Lecture 4 Prof. Vojin G. Oklobdzija University of California http://www.ece.ucdavis.edu/acsel

Review Lecture 3


Delay Comparison: Variable Block Adder [plot: VBA shows square-root dependency; CLA shows log dependency; multi-level VBA also plotted]

Circuit Issues. Adder speed cannot be estimated from the number of logic gates in the critical path, the number of transistors in the path, or the number of logic levels in the path. Estimating adder speed is much more complex, and many of the “fast” schemes may be misleading.

Fan-Out Dependency

Fan-In Dependency. This looks like “Logical Effort” (1985).

Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Carry-Lookahead Adder (Weinberger and Smith, 1958). ARITH-13: Presenting Achievement Award to Arnold Weinberger of IBM (who invented the CLA in 1958). Ref: A. Weinberger and J. L. Smith, “A Logic for High-Speed Addition”, National Bureau of Standards, Circ. 591, p. 3-12, 1958.

CLA Definitions: One-bit adder

CLA Definitions: 4-bit Adder

Carry-Lookahead Adder: 4-bits [figure; group signals Gj and Pj]

Carry-Lookahead Adder. One gate delay D to calculate p and g. One more D to calculate P and two for G. Three gate delays to calculate C4(j+1). Compare that to 8 D in the RCA!
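
The gate-delay counts above follow from the lookahead equations for one 4-bit group. The block below is a sketch in standard textbook notation, not copied from the slide: group j is assumed to start at bit i, and p is taken as the XOR of the operand bits (the OR form yields the same carries).

```latex
\begin{align*}
g_i &= a_i b_i, \qquad p_i = a_i \oplus b_i \\
c_{i+1} &= g_i + p_i c_i \\
c_{i+2} &= g_{i+1} + p_{i+1} g_i + p_{i+1} p_i c_i \\
c_{i+3} &= g_{i+2} + p_{i+2} g_{i+1} + p_{i+2} p_{i+1} g_i + p_{i+2} p_{i+1} p_i c_i \\
P_j &= p_{i+3}\, p_{i+2}\, p_{i+1}\, p_i \\
G_j &= g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i \\
c_{i+4} &= G_j + P_j c_i
\end{align*}
```

Each carry is a two-level AND-OR function of the p and g signals, which is where the count of one D for p and g plus two more for the carry comes from, provided the gates are not limited by fan-in.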

Carry-Lookahead Adder (Weinberger and Smith). With an additional two gate delays, C16 takes a total of 5D vs. 32D for the RCA!

32-bit Carry Lookahead Adder. Notes: A significant speed improvement in the implementation of a parallel adder was introduced by the Carry-Lookahead Adder developed by Weinberger and Smith in 1958. It is theoretically one of the fastest schemes, since the delay to add two numbers depends on the logarithm of the size of the operands. The Carry-Lookahead Adder uses modified full adders for each bit position and lookahead modules that generate carry signals independently for a group of k bits. In the most common case the group size is 4 bits. In addition to the carry signals for the group, the lookahead modules produce group carry-generate G and group carry-propagate P outputs, which indicate that a carry is generated within the group or that an incoming carry would propagate across the group. The carry out of a 4-bit group, ci+4, can be computed in four gate delays: one gate delay to compute pi and gi for bit positions i through i+3, a second gate delay to evaluate Pj, the second and third to evaluate Gj, and the third and fourth to calculate the carry signals ci+1, ci+2, ci+3 and ci+4. Actually, if not limited by fan-in constraints, ci+4 can be calculated concurrently with Gj and is available after three gate delays. In a recursive fashion, we can create a "group of groups", or "super-group". The inputs to the super-group are the G and P signals from the previous level. The super-group produces P* and G* signals, indicating that a carry will be propagated across, or generated in, the groups within the super-group domain. A super-group produces a carry out of the super-group as well as an input carry for each of the groups in the level above.
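
The group and super-group structure described in the notes can be checked with a behavioral model. This is a sketch, not the slide's design: it assumes a 16-bit adder built from four 4-bit groups and one super-group, all names are hypothetical, and the Python loops stand in for logic that the hardware evaluates in parallel.

```python
import random

def group_pg(p, g):
    """Combine LSB-first propagate/generate lists into one (P, G) pair."""
    P, G = 1, 0
    for pi, gi in zip(p, g):
        G = gi | (pi & G)   # a generate survives if every higher bit propagates
        P = P & pi          # the group propagates only if every bit propagates
    return P, G

def cla_add16(x, y, c_in=0):
    """16-bit two-level carry-lookahead addition; returns (16-bit sum, carry out)."""
    n, k = 16, 4
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(n)]
    g = [((x >> i) & (y >> i)) & 1 for i in range(n)]
    # Level 1: one (P, G) pair per 4-bit group.
    groups = [group_pg(p[j:j + k], g[j:j + k]) for j in range(0, n, k)]
    # Level 2: the super-group forms P*, G* and the carry into each group.
    P_star, G_star = group_pg([P for P, _ in groups], [G for _, G in groups])
    group_cin = [c_in]
    for P, G in groups:
        group_cin.append(G | (P & group_cin[-1]))
    assert group_cin[-1] == G_star | (P_star & c_in)
    # Back in the groups: local carries and sum bits from each group carry-in.
    total = 0
    for j in range(n // k):
        c = group_cin[j]
        for i in range(j * k, (j + 1) * k):
            total |= (p[i] ^ c) << i
            c = g[i] | (p[i] & c)
    return total, group_cin[-1]

# Compare against integer addition on random operands.
rng = random.Random(0)
for _ in range(1000):
    x, y, c = rng.randrange(1 << 16), rng.randrange(1 << 16), rng.randrange(2)
    s, c_out = cla_add16(x, y, c)
    assert s + (c_out << 16) == x + y + c
```

The carry out of the whole adder is obtained from G* and P* alone, without waiting for any carry to travel through the individual bit positions, which is the point of the super-group.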

Carry-Lookahead Adder (Weinberger and Smith: original derivation, 1958)

Carry-Lookahead Adder (Weinberger and Smith: original derivation)

Carry-Lookahead Adder (Weinberger and Smith). Please notice the similarity with Parallel-Prefix Adders!

Motorola: CLA Implementation Example A. Naini, D. Bearden and W. Anderson, “A 4.5nS 96b CMOS Adder Design”, Proceedings of the IEEE Custom Integrated Circuits Conference, May 3-6, 1992.

Critical path in Motorola's 64-bit CLA [figure; timing labels: 1.05 ns, 1.7 ns, 2.0 ns, 2.35 ns, 2.7 ns, 3.75 ns, 4.8 ns]. Notes: As opposed to Ripple or Carry-Skip Adders, the critical path in the Carry-Lookahead Adder travels in the vertical direction rather than the horizontal one, as shown in the previous slide. The delay of a Carry-Lookahead Adder is therefore not directly proportional to the adder size N but to the number of levels used. Since the groups and super-groups form a tree structure, the delay is proportional to the logarithm of N. This log dependency makes the Carry-Lookahead Adder one of the theoretically fastest structures for addition. However, it can be argued that the speed efficiency of the Carry-Lookahead Adder has passed the point of diminishing returns, given the fan-in and fan-out dependencies of the logic gates and the inadequacy of a delay model based on counting the gates in the critical path. In reality, the Carry-Lookahead Adder achieves lower speed than expected, especially when compared to some techniques that consume less hardware. An example of a Carry-Lookahead Adder, with its critical path as implemented in a Motorola processor, is shown in this slide.

Motorola's 64-bit CLA: conventional PG Block. No better situation here! The carry ripples locally, with 5 transistors in the path. Basically, this is MCC performance with Carry-Skip. One should not expect any better results than VBA.

Motorola's 64-bit CLA: Modified PG Block. Intermediate propagate signals Pi:0 are generated to speed up C3; the critical path still resembles MCC.

Motorola's 64-bit CLA (figure; timing labels: 1.8nS, 2.2nS, 2.9nS, 3.2nS, 3.55nS, 3.9nS)

Delay Optimized CLA B. Lee, V. G. Oklobdzija, Journal of VLSI Signal Processing, Vol.3, No.4, October 1991

Delay Optimized CLA: Lee-Oklobdzija ‘91 (a.) Fixed groups and levels (b.) variable-sized groups, fixed levels (c.) variable-sized groups and fixed levels (d.) variable-sized groups and levels Oklobdzija 2004 Computer Arithmetic

Two-Levels of Logic Implementation of the Carry Block Oklobdzija 2004 Computer Arithmetic

Two-Levels of Logic Implementation of the Carry-Lookahead Block Oklobdzija 2004 Computer Arithmetic

Three-Levels of Logic Implementation of the Carry Block (restricted fan-in) Oklobdzija 2004 Computer Arithmetic

Three-Levels of Logic Implementation of the Carry Lookahead (restricted fan-in) Oklobdzija 2004 Computer Arithmetic

Delay Optimized CLA: Lee-Oklobdzija ‘91 Delay: Two-level BCLA Delay: Three-level BCLA Oklobdzija 2004 Computer Arithmetic

Delay Optimized CLA: Lee-Oklobdzija ‘91 (a.) 2-level BCLA D=8.5nS (b.) 3-level BCLA D=8.9nS Oklobdzija 2004 Computer Arithmetic

Ling’s Adder Huey Ling, “High-Speed Binary Adder”, IBM Journal of Research and Development, Vol.25, No.3, 1981. Used in: IBM 3033, IBM S370/168, Amdahl V6, HP etc.

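Ling’s Derivations define: $g_i = a_i b_i$, $p_i = a_i \oplus b_i$, $t_i = a_i + b_i$ define: $H_{i+1} = c_{i+1} + c_i$ $g_i$ implies $c_{i+1}$, which implies $H_{i+1}$, thus: $g_i = g_i H_{i+1}$ (figure: bit cell with inputs ai, bi, ci and outputs si, ci+1; truth table of pi, gi, ti)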

Ling’s Derivations From: Now we need to derive Sum equation and because: fundamental expansion Now we need to derive Sum equation Oklobdzija 2004 Computer Arithmetic

Ling Adder Ling’s equations: Variation of CLA: Ling, IBM J. Res. Dev, 5/81 Oklobdzija 2004 Computer Arithmetic
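The equations on this slide did not survive as text, so the following is a sketch under the usual textbook formulation of Ling's scheme, not a transcription of the slide: generate $g_i = a_i b_i$, transfer $t_i = a_i + b_i$, pseudo-carry $H_{i+1} = c_{i+1} + c_i = g_i + t_{i-1} H_i$, true carry $c_i = t_{i-1} H_i$, and sum $s_i = (t_i \oplus H_{i+1}) + g_i t_{i-1} H_i$. The function name `ling_add` and the boundary values $H_0 = c_{in}$, $t_{-1} = 1$ are assumptions made for the illustration.

```python
def ling_add(a, b, n=8, c_in=0):
    """Add two n-bit integers using Ling pseudo-carries H instead of carries c."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # generate
    t = [((a >> i) | (b >> i)) & 1 for i in range(n)]  # transfer (inclusive OR)
    total = 0
    h = c_in      # H_0, assumed equal to the carry in
    t_prev = 1    # t_{-1} = 1, so that c_0 = t_{-1} H_0 = c_in
    for i in range(n):
        c_i = t_prev & h                      # c_i = t_{i-1} H_i
        h_next = g[i] | c_i                   # H_{i+1} = g_i + t_{i-1} H_i
        s_i = (t[i] ^ h_next) | (g[i] & c_i)  # sum bit from t, H and g
        total |= s_i << i
        h, t_prev = h_next, t[i]
    c_out = t_prev & h                        # c_n = t_{n-1} H_n
    return total, c_out


print(ling_add(0b1011, 0b0110, n=4))
```

The pseudo-carry recurrence has one literal fewer per term than the conventional carry recurrence, which is where the fan-in advantage quoted on the following slides comes from.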

Ling Adder Ling’s equation: Variation of CLA: Ling uses different transfer function. Four of those functions have desired properties (Ling’s is one of them) see: Doran, IEEE Trans on Comp. Vol 37, No.9 Sept. 1988. Oklobdzija 2004 Computer Arithmetic

Ling Adder Conventional: Ling: Fan-in of 5 Fan-in of 4 Oklobdzija 2004 Computer Arithmetic

Advantages of Ling’s Adder Uniform loading in fan-in and fan-out H16 contains 8 terms as compared to G16 that contains 15. H16 can be implemented with one level of logic (in ECL), while G16 can not. (Ling’s adder takes full advantage of wired-OR, of special importance when ECL technology is used) Oklobdzija 2004 Computer Arithmetic

VLSI Arithmetic Lecture 5 Prof. Vojin G. Oklobdzija University of California http://www.ece.ucdavis.edu/acsel

Review Lecture 4

Ling’s Adder Huey Ling, “High-Speed Binary Adder”, IBM Journal of Research and Development, Vol.25, No.3, 1981. Used in: IBM 3033, IBM S370/168, Amdahl V6, HP etc.

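Ling’s Derivations define: $g_i = a_i b_i$, $p_i = a_i \oplus b_i$, $t_i = a_i + b_i$ define: $H_{i+1} = c_{i+1} + c_i$ $g_i$ implies $c_{i+1}$, which implies $H_{i+1}$, thus: $g_i = g_i H_{i+1}$ (figure: bit cell with inputs ai, bi, ci and outputs si, ci+1; truth table of pi, gi, ti)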

Ling’s Derivations From: Now we need to derive Sum equation and because: fundamental expansion Now we need to derive Sum equation Oklobdzija 2004 Computer Arithmetic

Ling Adder Ling’s equations: Variation of CLA: Ling, IBM J. Res. Dev, 5/81 Oklobdzija 2004 Computer Arithmetic

Ling Adder Ling’s equation: Variation of CLA: ai bi ci si ci+1 ai-1 bi-1 ci-1 si-1 gi, ti gi-1, ti-1 Hi+1 Hi Ling uses different transfer function. Four of those functions have desired properties (Ling’s is one of them) see: Doran, IEEE Trans on Comp. Vol 37, No.9 Sept. 1988. Oklobdzija 2004 Computer Arithmetic

Ling Adder Conventional: Ling: Fan-in of 5 Fan-in of 4 Oklobdzija 2004 Computer Arithmetic

Advantages of Ling’s Adder Uniform loading in fan-in and fan-out H16 contains 8 terms as compared to G16 that contains 15. H16 can be implemented with one level of logic (in ECL), while G16 can not (with 8-way wire-OR). (Ling’s adder takes full advantage of wired-OR, of special importance when ECL technology is used - his IBM limitation was fan-in of 4 and wire-OR of 8) Oklobdzija 2004 Computer Arithmetic

Ling: Weinberger Notes Oklobdzija 2004 Computer Arithmetic

Ling: Weinberger Notes Oklobdzija 2004 Computer Arithmetic

Ling: Weinberger Notes Oklobdzija 2004 Computer Arithmetic

Advantage of Ling’s Adder 32-bit adder used in: IBM 3033, IBM S370/ Model168, Amdahl V6. Implements 32-bit addition in 3 levels of logic Implements 32-bit AGEN: B+Index+Disp in 4 levels of logic (rather than 6) 5 levels of logic for 64-bit adder used in HP processor Oklobdzija 2004 Computer Arithmetic

Implementation of Ling’s Adder in CMOS (S. Naffziger, “A Subnanosecond 64-b Adder”, ISSCC ’96)

S. Naffziger, ISSCC’96 Oklobdzija 2004 Computer Arithmetic

Ling Adder Critical Path Oklobdzija 2004 Computer Arithmetic

Ling Adder: Circuits Oklobdzija 2004 Computer Arithmetic

LCS4 – Critical G Path Oklobdzija 2004 Computer Arithmetic

LCS4 – Logical Effort Delay Oklobdzija 2004 Computer Arithmetic

Results (see: S. Naffziger, “A Subnanosecond 64-b Adder”, ISSCC ’96): 0.5u technology. Speed: 0.930 nS. Nominal process, 80C, V=3.3V.

Prefix Adders and Parallel Prefix Adders

from: Ercegovac-Lang Oklobdzija 2004 Computer Arithmetic

Prefix Adders Following recurrence operation is defined: $(g, p) \circ (g', p') = (g + p g',\ p p')$ such that: $(G_i, P_i) = (g_0, p_0)$ for $i = 0$, and $(G_i, P_i) = (g_i, p_i) \circ (G_{i-1}, P_{i-1})$ for $1 \le i \le n$. Then $c_{i+1} = G_i$ for $i = 0, 1, \ldots, n$. With $(g_{-1}, p_{-1}) = (c_{in}, c_{in})$: $c_1 = g_0 + p_0 c_{in}$. This operation is associative, but not commutative. It can also span a range of bits (overlapping and adjacent).
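The recurrence above can be written out directly. The sketch below is a serial (non-parallel) evaluation, meant only to show what the operator computes; the names `prefix_op` and `serial_prefix_carries` are hypothetical, and the carry in is folded in through the $(g_{-1}, p_{-1}) = (c_{in}, c_{in})$ convention from the slide.

```python
def prefix_op(left, right):
    """(g, p) o (g', p') = (g + p g', p p'); left is the more significant span."""
    g, p = left
    g2, p2 = right
    return (g | (p & g2), p & p2)


def serial_prefix_carries(a, b, n, c_in=0):
    """Return [c_1, ..., c_n] by applying the operator one bit at a time."""
    pairs = [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    acc = (c_in, c_in)            # (g_{-1}, p_{-1}) = (c_in, c_in)
    carries = []
    for gp in pairs:
        acc = prefix_op(gp, acc)  # (G_i, P_i) = (g_i, p_i) o (G_{i-1}, P_{i-1})
        carries.append(acc[0])    # c_{i+1} = G_i
    return carries


print(serial_prefix_carries(0b0111, 0b0001, 4))
```

Because the operator is associative, the same carries can be obtained by grouping the applications as a tree, which is what the parallel prefix adders on the following slides do.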

from: Ercegovac-Lang Oklobdzija 2004 Computer Arithmetic

Parallel Prefix Adders: variety of possibilities from: Ercegovac-Lang Oklobdzija 2004 Computer Arithmetic

Pyramid Adder: M. Lehman, “A Comparative Study of Propagation Speed-up Circuits in Binary Arithmetic Units”, IFIP Congress, Munich, Germany, 1962. Oklobdzija 2004 Computer Arithmetic

Parallel Prefix Adders: variety of possibilities from: Ercegovac-Lang Oklobdzija 2004 Computer Arithmetic

Parallel Prefix Adders: variety of possibilities from: Ercegovac-Lang Oklobdzija 2004 Computer Arithmetic

Hybrid BK-KS Adder Oklobdzija 2004 Computer Arithmetic

Parallel Prefix Adders: S. Knowles 1999 operation is associative: h>i≥j≥k operation is idempotent: h>i≥j≥k produces carry: cin=0 Oklobdzija 2004 Computer Arithmetic
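The formulas on this slide were lost in extraction, so the exact form Knowles uses is not reproduced here. As a hedged illustration of what idempotency buys, the sketch below checks one common reading of the index condition h > i ≥ j ≥ k: combining the span h..j with the overlapping span i..k gives the same result as the full span h..k. The helper names `span_gp` and `overlap_holds` are hypothetical.

```python
from itertools import product


def prefix_op(left, right):
    """(g, p) o (g', p') = (g + p g', p p')."""
    g, p = left
    g2, p2 = right
    return (g | (p & g2), p & p2)


def span_gp(pairs, hi, lo):
    """(G, P) over bit positions hi..lo, folded from the low end upward."""
    acc = pairs[lo]
    for i in range(lo + 1, hi + 1):
        acc = prefix_op(pairs[i], acc)
    return acc


def overlap_holds(pairs):
    """True if (G,P)[h..j] o (G,P)[i..k] == (G,P)[h..k] for all h > i >= j >= k."""
    n = len(pairs)
    return all(
        prefix_op(span_gp(pairs, h, j), span_gp(pairs, i, k)) == span_gp(pairs, h, k)
        for h in range(n)
        for i in range(h)
        for j in range(i + 1)
        for k in range(j + 1)
    )


# Exhaustive check over every 4-bit vector of (g, p) pairs
values = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(all(overlap_holds(list(v)) for v in product(values, repeat=4)))
```

The property follows from associativity plus (g, p) o (g, p) = (g, p), and it is what allows a prefix network to feed overlapping spans into a node instead of strictly adjacent ones.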

Parallel Prefix Adders: Ladner-Fischer Exploits associativity, but not idempotency. Produces minimal logical depth.

Parallel Prefix Adders: Ladner-Fischer (16,8,4,2,1) Two wires at each level. Uniform fan-in of two. Large fan-out (of 16; n/2). Large capacitive loading combined with long wires (in the last stages).
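The fan-out figure quoted above can be reproduced with a small model of the minimal-depth divide-and-conquer prefix graph. The sketch assumes the structure usually drawn for this adder (often also attributed to Sklansky), a power-of-two width, and a hypothetical function name `sklansky_prefix`; it models connectivity only, not wire length or loading.

```python
def prefix_op(left, right):
    """(g, p) o (g', p') = (g + p g', p p')."""
    g, p = left
    g2, p2 = right
    return (g | (p & g2), p & p2)


def sklansky_prefix(pairs):
    """Prefix (G, P) for every bit position; len(pairs) must be a power of two.

    Returns (prefixes, levels, max_fanout), where max_fanout is the largest
    number of prefix nodes driven by a single source in any level.
    """
    n = len(pairs)
    gp = list(pairs)
    levels, max_fanout, span = 0, 0, 1
    while span < n:
        nxt = list(gp)
        readers = {}                      # source index -> number of consumers
        for i in range(n):
            if i & span:                  # upper half of each 2*span block
                src = (i // (2 * span)) * (2 * span) + span - 1
                nxt[i] = prefix_op(gp[i], gp[src])
                readers[src] = readers.get(src, 0) + 1
        max_fanout = max(max_fanout, max(readers.values()))
        gp, span, levels = nxt, span * 2, levels + 1
    return gp, levels, max_fanout


_, levels, fanout = sklansky_prefix([(0, 1)] * 32)
print(levels, fanout)
```

For 32 bits the model gives log2(32) levels and a worst-case lateral fan-out of n/2 in the last level, matching the (16,8,4,2,1) annotation.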

Parallel Prefix Adders: Kogge-Stone Exploits idempotency to limit the fan-out to 1. Dramatic increase in wires. The wire span remains the same as in Ladner-Fischer. Buffers are needed in both cases: K-S, L-F.

Kogge-Stone Adder Oklobdzija 2004 Computer Arithmetic
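A functional sketch of the Kogge-Stone carry network, assuming a carry in of zero and the standard doubling-distance wiring; the function name `kogge_stone_carries` is hypothetical. Every bit position gets its own prefix node at every level, which is the source of the wire count noted on the previous slide.

```python
def prefix_op(left, right):
    """(g, p) o (g', p') = (g + p g', p p')."""
    g, p = left
    g2, p2 = right
    return (g | (p & g2), p & p2)


def kogge_stone_carries(a, b, n):
    """Return [c_1, ..., c_n] for n-bit operands a and b, with carry in = 0."""
    gp = [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    dist = 1
    while dist < n:
        # Every position i >= dist combines with the position dist bits below it.
        gp = [prefix_op(gp[i], gp[i - dist]) if i >= dist else gp[i] for i in range(n)]
        dist *= 2
    return [g for g, _ in gp]  # c_{i+1} = G over bits i..0


print(kogge_stone_carries(0b1111, 0b0001, 4))
```

Each source in a level feeds one lateral destination, and the number of levels is the ceiling of log2(n).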

Parallel Prefix Adders: Brent-Kung Set the fan-out to one Avoids explosion of wires (as in K-S) Makes no sense in CMOS: fan-out = 1 limit is arbitrary and extreme much of the capacitive load is due to wire (anyway) It is more efficient to insert buffers in L-F than to use B-K scheme Oklobdzija 2004 Computer Arithmetic

Brent-Kung Adder Oklobdzija 2004 Computer Arithmetic
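For comparison, a sketch of the Brent-Kung network as it is commonly described: a binary tree that builds group signals on the way up, followed by a second pass that fills in the remaining prefixes. The function name `brent_kung_prefix` is hypothetical and the width is assumed to be a power of two. The operation count is returned to show how few prefix nodes the scheme uses.

```python
def prefix_op(left, right):
    """(g, p) o (g', p') = (g + p g', p p')."""
    g, p = left
    g2, p2 = right
    return (g | (p & g2), p & p2)


def brent_kung_prefix(pairs):
    """Return (prefixes, ops); len(pairs) must be a power of two."""
    n = len(pairs)
    gp = list(pairs)
    ops = 0
    d = 1
    while d < n:                                  # up-sweep: build the tree
        for i in range(2 * d - 1, n, 2 * d):
            gp[i] = prefix_op(gp[i], gp[i - d])
            ops += 1
        d *= 2
    d = n // 4
    while d >= 1:                                 # down-sweep: fill the gaps
        for i in range(3 * d - 1, n, 2 * d):
            gp[i] = prefix_op(gp[i], gp[i - d])
            ops += 1
        d //= 2
    return gp, ops


_, ops = brent_kung_prefix([(0, 1)] * 16)
print(ops)
```

The node count grows roughly as 2n, against roughly n·log2(n) for Kogge-Stone, at the price of nearly twice the logic depth.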

Parallel Prefix Adders: Han-Carlson Is a hybrid synthesis of L-F and K-S. Trades an increase in logic depth for a reduction in fan-out: effectively a higher-radix variant of K-S. Others similarly trade logical depth for a reduction of fan-out and wire, by serializing the prefix computation at the higher fan-out nodes.

Parallel Prefix Adders: variety of possibilities from: Knowles bounded by L-F and K-S at ends Oklobdzija 2004 Computer Arithmetic

Parallel Prefix Adders: variety of possibilities Knowles 1999 Following rules are used: Lateral wires at the jth level span $2^j$ bits. Lateral fan-out at the jth level is a power of 2 up to $2^j$. Lateral fan-out at the jth level cannot exceed that at the (j+1)th level.

Parallel Prefix Adders: variety of possibilities Knowles 1999 The number of minimal depth graphs of this type is given in: at 4-bits there is only K-S and L-F, afterwards there are several new possibilities. Oklobdzija 2004 Computer Arithmetic
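The three rules from the previous slide are easy to check mechanically. The sketch below assumes levels are numbered from j = 0 and that the bracket notation, as in (16,8,4,2,1) or [4,4,2,2,1], lists the lateral fan-out from the last level down to the first. The function names `is_valid_family` and `count_families` are hypothetical, and the count is of fan-out sequences that satisfy the rules, not a figure taken from the paper.

```python
from itertools import product


def is_valid_family(fanouts_high_to_low):
    """Check the lateral fan-out rules for a list given last level first."""
    f = list(reversed(fanouts_high_to_low))  # f[j] = lateral fan-out at level j
    for j, fj in enumerate(f):
        if fj < 1 or fj & (fj - 1):          # must be a power of two
            return False
        if fj > 2 ** j:                      # at most 2^j at level j
            return False
        if j + 1 < len(f) and fj > f[j + 1]: # cannot exceed the next level's
            return False
    return True


def count_families(levels):
    """Number of fan-out sequences with the given number of levels that pass."""
    choices = [[2 ** e for e in range(j + 1)] for j in range(levels)]
    return sum(is_valid_family(list(reversed(c))) for c in product(*choices))


print(is_valid_family([4, 4, 2, 2, 1]), count_families(2))
```

With two levels (4 bits) only two sequences pass, [1,1] and [2,1], consistent with the remark that at 4 bits there are only the K-S and L-F graphs.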

Parallel Prefix Adders: variety of possibilities Knowles 1999 example of a new 32-bit adder [4,4,2,2,1] Oklobdzija 2004 Computer Arithmetic

Parallel Prefix Adders: variety of possibilities Knowles 1999 Example of a new 32-bit adder [4,4,2,2,1] Oklobdzija 2004 Computer Arithmetic

Parallel Prefix Adders: variety of possibilities Knowles 1999 Delay is given in terms of FO4 inverter delay: w.c. (nominal case is 40-50% faster) K-S is the fastest K-S adders are wire limited (requiring 80% more area) The difference is less than 15% between examined schemes Oklobdzija 2004 Computer Arithmetic

Parallel Prefix Adders: variety of possibilities Knowles 1999 Conclusion: Irregular, hybrid schemes are possible. The speed-up of 15% is achieved at the cost of large wiring, hence area and power. Circuits close in speed to K-S are available at significantly lower wiring cost.

VLSI Arithmetic Lecture 6 Prof. Vojin G. Oklobdzija University of California http://www.ece.ucdavis.edu/acsel

Review Lecture 5

Prefix Adders and Parallel Prefix Adders

from: Ercegovac-Lang Oklobdzija 2004 Computer Arithmetic

Prefix Adders Following recurrence operation is defined: $(g, p) \circ (g', p') = (g + p g',\ p p')$ such that: $(G_i, P_i) = (g_0, p_0)$ for $i = 0$, and $(G_i, P_i) = (g_i, p_i) \circ (G_{i-1}, P_{i-1})$ for $1 \le i \le n$. Then $c_{i+1} = G_i$ for $i = 0, 1, \ldots, n$. With $(g_{-1}, p_{-1}) = (c_{in}, c_{in})$: $c_1 = g_0 + p_0 c_{in}$. This operation is associative, but not commutative. It can also span a range of bits (overlapping and adjacent).

Parallel Prefix Adders: S. Knowles 1999 operation is associative: h>i≥j≥k operation is idempotent: h>i≥j≥k produces carry: cin=0 Oklobdzija 2004 Computer Arithmetic

from: Ercegovac-Lang Oklobdzija 2004 Computer Arithmetic

Parallel Prefix Adders: variety of possibilities from: Ercegovac-Lang Oklobdzija 2004 Computer Arithmetic

Parallel Prefix Adders: variety of possibilities from: Ercegovac-Lang Oklobdzija 2004 Computer Arithmetic

Parallel Prefix Adders: variety of possibilities from: Ercegovac-Lang Oklobdzija 2004 Computer Arithmetic

Kogge-Stone Adder Oklobdzija 2004 Computer Arithmetic

Brent-Kung Adder Oklobdzija 2004 Computer Arithmetic

Hybrid BK-KS Adder Oklobdzija 2004 Computer Arithmetic

Pyramid Adder: M. Lehman, “A Comparative Study of Propagation Speed-up Circuits in Binary Arithmetic Units”, IFIP Congress, Munich, Germany, 1962. Oklobdzija 2004 Computer Arithmetic

Parallel Prefix Adders: Ladner-Fischer Exploits associativity, but not idempotency. Produces minimal logical depth.

Parallel Prefix Adders: Ladner-Fischer (16,8,4,2,1) Two wires at each level. Uniform fan-in of two. Large fan-out (of 16; n/2). Large capacitive loading combined with long wires (in the last stages).

Parallel Prefix Adders: Kogge-Stone Exploits idempotency to limit the fan-out to 1. Dramatic increase in wires. The wire span remains the same as in Ladner-Fischer. Buffers are needed in both cases: K-S, L-F.

Parallel Prefix Adders: Brent-Kung Set the fan-out to one Avoids explosion of wires (as in K-S) Makes no sense in CMOS: fan-out = 1 limit is arbitrary and extreme much of the capacitive load is due to wire (anyway) It is more efficient to insert buffers in L-F than to use B-K scheme Oklobdzija 2004 Computer Arithmetic

Two Parallel Prefix Adder Structures Kogge-Stone: log(bits) carry stages, extra wiring. Han-Carlson: log(bits) + 1 carry stages, reduced wiring and gates.

Parallel Prefix Adders: Han-Carlson Is a hybrid synthesis of L-F and K-S. Trades an increase in logic depth for a reduction in fan-out: effectively a higher-radix variant of K-S. Others similarly trade logical depth for a reduction of fan-out and wire, by serializing the prefix computation at the higher fan-out nodes.

Parallel Prefix Adders: variety of possibilities from: Knowles bounded by L-F and K-S at ends Oklobdzija 2004 Computer Arithmetic

Parallel Prefix Adders: variety of possibilities Knowles 1999 Following rules are used: Lateral wires at the jth level span $2^j$ bits. Lateral fan-out at the jth level is a power of 2 up to $2^j$. Lateral fan-out at the jth level cannot exceed that at the (j+1)th level.

Parallel Prefix Adders: variety of possibilities Knowles 1999 The number of minimal depth graphs of this type is given in: at 4-bits there is only K-S and L-F, afterwards there are several new possibilities. Oklobdzija 2004 Computer Arithmetic

Parallel Prefix Adders: variety of possibilities Knowles 1999 example of a new 32-bit adder [4,4,2,2,1] Oklobdzija 2004 Computer Arithmetic

Parallel Prefix Adders: variety of possibilities Knowles 1999 Example of a new 32-bit adder [4,4,2,2,1] Oklobdzija 2004 Computer Arithmetic

Parallel Prefix Adders: variety of possibilities Knowles 1999 Delay is given in terms of FO4 inverter delay: w.c. (nominal case is 40-50% faster) K-S is the fastest K-S adders are wire limited (requiring 80% more area) The difference is less than 15% between examined schemes Oklobdzija 2004 Computer Arithmetic

Parallel Prefix Adders: variety of possibilities Knowles 1999 Conclusion: Irregular, hybrid schemes are possible. The speed-up of 15% is achieved at the cost of large wiring, hence area and power. Circuits close in speed to K-S are available at significantly lower wiring cost.

Possibilities for Further Research The logical depth is important (Knowles was right). The fan-out is less important than fan-in (Knowles was wrong): it is possible to examine a variety of topologies with restricted and varied fan-in. Driving strength and Logical Effort rules were overlooked, or at least neglected: it is possible to create a number of topologies taking LE rules into account. It is further possible to combine the rules with a compound domino implementation, taking advantage of the two different rules governing “dynamic” and “static”. It is still possible to produce a better adder!

Other Types of Adders Oklobdzija 2004 Computer Arithmetic

Conditional Sum Adder J. Sklansky, “Conditional-Sum Addition Logic”, IRE Transactions on Electronic Computers, EC-9, p.226-231, 1960.

Conditional Sum Adder from: Ercegovac-Lang Oklobdzija 2004 Computer Arithmetic

Conditional Sum Adder Oklobdzija 2004 Computer Arithmetic

Conditional Sum Adder from: Ercegovac-Lang Oklobdzija 2004 Computer Arithmetic

Conditional Sum Adder from: Ercegovac-Lang Oklobdzija 2004 Computer Arithmetic

Conditional Sum Adder Oklobdzija 2004 Computer Arithmetic
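The conditional-sum idea can be sketched recursively: every block produces its result for both possible carry inputs, and the carry out of the lower half selects which upper-half result to keep. This is an illustrative functional model under those assumptions, not Sklansky's gate-level design; the function name `conditional_sum` is hypothetical.

```python
def conditional_sum(a_bits, b_bits):
    """Conditional-sum addition of two equal-length bit lists (LSB first).

    Returns ((sum0, carry0), (sum1, carry1)): the sum bits and carry out
    assuming a carry in of 0 and of 1 respectively.
    """
    n = len(a_bits)
    if n == 1:
        a, b = a_bits[0], b_bits[0]
        return ([a ^ b], a & b), ([a ^ b ^ 1], a | b)
    half = n // 2
    lo = conditional_sum(a_bits[:half], b_bits[:half])
    hi = conditional_sum(a_bits[half:], b_bits[half:])
    out = []
    for lo_sum, lo_carry in lo:            # case c_in = 0, then case c_in = 1
        hi_sum, hi_carry = hi[lo_carry]    # multiplexer: low carry selects high half
        out.append((lo_sum + hi_sum, hi_carry))
    return tuple(out)


# 0110 + 0011 (bits LSB first); index 0 is the result for carry in = 0
print(conditional_sum([0, 1, 1, 0], [1, 1, 0, 0])[0])
```

Each level of recursion corresponds to one level of multiplexers, so the selection depth grows with log2 of the operand width.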

Carry-Select Adder O. J. Bedrij, “Carry-Select Adder”, IRE Transactions on Electronic Computers, June 1962, p.340-34

Carry-Select Sum Adder from: Ercegovac-Lang Oklobdzija 2004 Computer Arithmetic

Carry-Select Adder Addition under assumption of Cin=0 and Cin=1. The theoretically fastest scheme for addition of two numbers is "Conditional-Sum Addition", proposed by Sklansky in 1960. The essence of this scheme is the realization that we can add two numbers without waiting for the carry signal to arrive. The numbers are simply added twice: once assuming Cin = 0 and once assuming Cin = 1. The conditionally produced results, Sum0, Sum1 and Carry0, Carry1, are selected by a multiplexer that uses the incoming carry signal Cin as its control. As in the Carry-Lookahead Adder, the input bits are divided into groups, which in this case are added "conditionally". When building a Conditional-Sum Adder, the hardware complexity grows rapidly starting from the Least Significant Bit position. Therefore, in practice, a full-blown implementation of the Conditional-Sum Adder is not found. However, the idea of adding the Most Significant portion of the operands conditionally, and selecting the result once the carry-in signal is computed in the Least Significant portion, is attractive. Such a scheme, which is a subset of the Conditional-Sum Adder, is known as the "Carry-Select Adder". The Carry-Select Adder divides the words to be added into blocks and forms two sums for each block in parallel: one with a carry-in of ZERO and the other with a carry-in of ONE. This slide shows an example of a 16-bit carry-select adder: the carry-out from the Least Significant 4-bit block controls a multiplexer that selects the sum from the Most Significant portion. The carry-out is computed using the equation for the carry-out of the group, since the group propagate signal Pi is the carry-out of an adder with a carry input of ONE and the group generate signal Gi is the carry-out of an adder with a carry input of ZERO. This speeds up the computation of the carry signal needed for selection in the next block.
The upper 8 bits are computed conditionally using two Carry-Select Adders similar to the one used in the Least Significant 8-bit portion. The delay of this adder is determined by the speed of the Least Significant k-bit block (a 4-bit RCA in this example) and the delay of the multiplexers in the Most Significant path. Generally, the delay of such an adder is proportional to the logarithm of the size of the adder.
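The selection step described above can be sketched as a functional model. Here the per-block adder is stood in for by integer addition (the helper `block_add` is hypothetical and replaces the 4-bit RCA), blocks are a uniform 4 bits wide, and the block carry is formed as carry0 + carry1·Cin, following the generate/propagate reading in the notes. The two-level arrangement of the upper 8 bits is not modeled.

```python
def block_add(x, y, c_in, width):
    """Stand-in for a small ripple-carry block: returns (sum, carry_out)."""
    total = x + y + c_in
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, width=16, block=4, c_in=0):
    """Add two width-bit integers block by block with carry selection."""
    mask = (1 << block) - 1
    result, carry = 0, c_in
    for shift in range(0, width, block):
        x, y = (a >> shift) & mask, (b >> shift) & mask
        sum0, carry0 = block_add(x, y, 0, block)  # computed assuming carry in = 0
        sum1, carry1 = block_add(x, y, 1, block)  # computed assuming carry in = 1
        result |= (sum1 if carry else sum0) << shift  # multiplexer on incoming carry
        carry = carry0 | (carry1 & carry)         # G + P * c_in for the block
    return result, carry


print(carry_select_add(0xFFFF, 0x0001))
```

In hardware both block results exist before the incoming carry arrives, so each block adds only a multiplexer delay to the path after the first block has settled.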

Carry Select Adder: combining two 32-b VBAs in select mode Delay =DVBA32+ DMUX Oklobdzija 2004 Computer Arithmetic

Carry-Select Adder O.J. Bedrij, IBM Poughkeepsie, 1962