VLSI Arithmetic Adders & Multipliers


VLSI Arithmetic Adders & Multipliers Prof. Vojin G. Oklobdzija University of California http://www.ece.ucdavis.edu/acsel

Introduction
Digital computer arithmetic belongs to computer architecture, but it is also an aspect of logic design. Its objective is to develop algorithms that use the available hardware as efficiently as possible. Because the hardware can perform only a relatively simple and primitive set of Boolean operations, arithmetic operations are built as a hierarchy on top of those simple operations. Speed, power, and chip area are the most often used measures of the efficiency of an algorithm, so there is a strong link between the algorithms and the technology used to implement them.
Oklobdzija 2004 Computer Arithmetic

Basic Operations: Addition, Multiplication, Multiply-Add, Division, Evaluation of Functions, Multi-Media

Addition of Binary Numbers

Addition of Binary Numbers
Full Adder. The full adder is the fundamental building block of most arithmetic circuits. Its inputs are the operand bits $a_i$ and $b_i$ and the carry-in $c_i$ (Cin); its outputs are the sum $s_i$ and the carry-out $c_{i+1}$ (Cout). The sum and carry outputs are described as:
$s_i = a_i \oplus b_i \oplus c_i$
$c_{i+1} = a_i b_i + a_i c_i + b_i c_i$

Addition of Binary Numbers
Full-adder truth table (rows with $a_i \ne b_i$ propagate the carry; rows with $a_i = b_i = 1$ generate a carry):
  Inputs        Outputs
  ci  ai  bi    si  ci+1
  0   0   0     0   0
  0   0   1     1   0      Propagate
  0   1   0     1   0      Propagate
  0   1   1     0   1      Generate
  1   0   0     1   0
  1   0   1     0   1      Propagate
  1   1   0     0   1      Propagate
  1   1   1     1   1      Generate

Full-Adder Implementation
Full Adder operation is defined by the equations:
$s_i = a_i \oplus b_i \oplus c_i = p_i \oplus c_i$
$c_{i+1} = a_i b_i + a_i c_i + b_i c_i = g_i + p_i c_i$
with Carry-Propagate $p_i = a_i \oplus b_i$ and Carry-Generate $g_i = a_i b_i$.
A one-bit adder could be implemented as shown.
First we should examine a realization of a one-bit adder, which is the basic building block for all the more elaborate addition schemes. In the equations above, $a_i$, $b_i$, and $c_i$ are the inputs to the i-th full adder stage, and $s_i$ and $c_{i+1}$ are the sum and carry outputs from the i-th stage, respectively. The equations make clear that realizing the Sum function requires two XOR logic gates. The Carry function can be rewritten using the Carry-Propagate $p_i$ and Carry-Generate $g_i$ terms. If Carry-Propagate is 1, the carry out of the stage equals the carry into the stage: $c_{i+1} = c_i$. If Carry-Generate is 1, the carry out of the stage is 1 regardless of the value of the incoming carry. The logical implementation of the full adder stage is shown in figure (a) of this slide; it results from a direct application of the logic equations. The implementation in figure (b) is more clever because it uses a multiplexer in the carry path. Given that the multiplexer block is often faster than a single gate, using a multiplexer in the critical path helps to achieve better performance.
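The propagate/generate form of the full adder can be checked with a few lines of Python. This is only a behavioral sketch of the Boolean equations; the function name `full_adder` is chosen here for illustration and does not come from the slides.

```python
from typing import Tuple


def full_adder(a: int, b: int, c_in: int) -> Tuple[int, int]:
    """Return (sum, carry_out) for one bit position, using p and g terms."""
    p = a ^ b                # carry-propagate: p = a XOR b
    g = a & b                # carry-generate:  g = a AND b
    s = p ^ c_in             # second XOR produces the sum bit
    c_out = g | (p & c_in)   # carry out = g + p * c_in
    return s, c_out


if __name__ == "__main__":
    # Print the truth table in the order c_in, a, b.
    for c_in in (0, 1):
        for a in (0, 1):
            for b in (0, 1):
                print(c_in, a, b, full_adder(a, b, c_in))
```

For every input combination the pair (carry_out, sum) is the two-bit binary value of a + b + c_in, which is what the test below relies on.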

High-Speed Addition
A one-bit adder could be implemented more efficiently with a multiplexer in the carry path, because the MUX is faster.
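One common way to place a multiplexer in the carry path is to let the propagate signal select between the incoming carry and one of the operand bits. The slide's figure is not reproduced in this transcript, so the exact wiring below is an assumption; the sketch only shows that a multiplexer-selected carry is logically equivalent to the gate-level carry equation.

```python
def mux(select: int, d0: int, d1: int) -> int:
    """Two-input multiplexer: returns d1 when select is 1, otherwise d0."""
    return d1 if select else d0


def carry_with_gates(a: int, b: int, c_in: int) -> int:
    """Carry out from the direct equation: g + p * c_in."""
    return (a & b) | ((a ^ b) & c_in)


def carry_with_mux(a: int, b: int, c_in: int) -> int:
    """Carry out selected by the propagate signal (assumed wiring).

    When p = 1 the incoming carry is passed through.
    When p = 0 the operands are equal, so the carry out equals a (or b).
    """
    p = a ^ b
    return mux(p, a, c_in)
```

In this arrangement the incoming carry passes through a single multiplexer per stage, which is the property the slide attributes to the faster implementation.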

The Ripple-Carry Adder

The Ripple-Carry Adder From Rabaey

Inversion Property From Rabaey

Minimize Critical Path by Reducing Inverting Stages From Rabaey

Ripple Carry Adder Critical Path
Carry chain of an RCA implemented using a multiplexer from the standard cell library (Oklobdzija, ISCAS’88).
A ripple carry adder for N-bit numbers is implemented by concatenating N full adders, as shown in this slide. At the i-th bit position, the i-th bits of operands A and B and the carry signal from the preceding adder stage are used to generate the i-th bit of the sum, $s_i$, and a carry, $c_{i+1}$, to the next adder stage. The scheme is called a Ripple Carry Adder because the carry signal "ripples" from the least significant bit position to the most significant one. If the ripple carry adder is implemented by concatenating N full adders, its delay is 2N gate delays from Cin to Cout. The path from input to output that is likely to take the longest time is designated the "critical path". In a Ripple Carry Adder, this is the path from the least significant input $a_0$ or $b_0$ to the last sum bit $s_n$. Assuming a multiplexer-based XOR gate implementation, this critical path consists of N+1 pass-transistor delays. However, such a long chain of transistors significantly degrades the signal, so some amplification points are necessary. In practice, the critical path can be built from a multiplexer cell of a standard cell library, as shown in this slide.
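The chaining of N full-adder stages can be modeled behaviorally as follows. This is a sketch for checking the logic only; it says nothing about gate or transistor delays, and the helper names (`to_bits`, `from_bits`, `ripple_carry_add`) are illustrative choices, not names from the slides.

```python
from typing import List, Tuple


def to_bits(value: int, width: int) -> List[int]:
    """Unsigned integer to a list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: List[int]) -> int:
    """List of bits (least significant first) back to an unsigned integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits: List[int], b_bits: List[int],
                     c_in: int = 0) -> Tuple[List[int], int]:
    """Add two equal-length bit vectors; return (sum_bits, carry_out)."""
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    carry = c_in
    sum_bits = []
    for a, b in zip(a_bits, b_bits):      # stage i uses the carry of stage i-1
        p = a ^ b
        sum_bits.append(p ^ carry)
        carry = (a & b) | (p & carry)     # carry ripples to the next stage
    return sum_bits, carry


if __name__ == "__main__":
    # Worst case for the carry chain: a carry generated at bit 0 ripples
    # through every stage (all ones plus one).
    s, c = ripple_carry_add(to_bits(0xFF, 8), to_bits(0x01, 8))
    print(from_bits(s), c)
```

The loop-carried dependence on `carry` is the software counterpart of the critical path: no stage can finish before the stage below it has produced its carry.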

Manchester Carry-Chain Realization of the Carry Path
Simple and very popular scheme for implementation of the carry signal path.
The Manchester Carry Chain is a simple addition scheme that was very popular when LSI nMOS technology was emerging. It is an alternative, switch-based technique implemented using pass-transistor logic. The speed achieved with the Manchester Carry Chain is impressive, owing to its simplicity and the properties of pass-transistor logic. It does not require a large area and consumes substantially less power than Carry-Lookahead or other more elaborate schemes. A realization of the Manchester Carry Chain is shown in the slide. Because of the RC delay properties of the chain, the signal needs to be regenerated by inserting inverters at appropriately chosen locations in the carry chain.

Original Design: T. Kilburn, D. B. G. Edwards, D. Aspinall, "Parallel Addition in Digital Computers: A New Fast "Carry" Circuit", Proceedings of IEE, Vol. 106, pt. B, p. 464, September 1959.

Manchester Carry Chain (CMOS)
Implement P with pass-transistors.
Implement G with pull-up, kill (delete) with pull-down.
Use dynamic logic to reduce the complexity and speed up.
Kilburn, et al, IEE Proc, 1959.
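The three switch actions named on the slide (propagate through a pass transistor, generate by pull-up, kill by pull-down) can be illustrated with a purely logical model of the carry nodes. The sketch below is an assumption-laden simplification: it ignores precharge and evaluate phases, signal polarity, and RC effects, and the function name `manchester_carries` is hypothetical.

```python
from typing import List


def manchester_carries(a_bits: List[int], b_bits: List[int],
                       c_in: int = 0) -> List[int]:
    """Return the carry out of every bit position (LSB first).

    Exactly one of generate, propagate, kill is active in each stage:
      generate  (a = b = 1): the carry node is pulled up to 1
      kill      (a = b = 0): the carry node is pulled down to 0
      propagate (a != b):    the pass transistor connects the node to
                             the previous stage, so its value is kept
    """
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    node = c_in
    carries = []
    for a, b in zip(a_bits, b_bits):
        generate = a & b
        kill = (1 - a) & (1 - b)
        if generate:
            node = 1
        elif kill:
            node = 0
        # Otherwise propagate: node already holds the incoming carry.
        carries.append(node)
    return carries
```

A long run of propagate stages corresponds to a long series connection of pass transistors, which is why a real chain needs the regenerating inverters mentioned in the speaker notes.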

Pass-Transistor Realization in DPL
The ability of pass-transistor logic to provide an efficient multiplexer implementation has been exploited in CPL and DPL logic families. Even an XOR gate is more efficiently implemented using multiplexer topology. A Full-Adder cell which is entirely multiplexer based was published by Hitachi and it is shown in this slide. Such a Full-Adder realization contains only two transistors in the Input-to-Sum path and only one transistor in the Cin-to-Cout path (not counting the buffer). The short critical path is a factor that contributes to a remarkable speed of this implementation.

Carry-Skip Adder
MacSorley, Proc IRE 1/61; Lehman, Burla, IRE Trans on Comp, 12/61

Carry-Skip Adder Bypass From Rabaey

Carry-Skip Adder: N bits, k bits/group, r = N/k groups
Since the Cin-to-Cout path is the longest path in the ripple-carry adder, an obvious approach is to accelerate carry propagation through the adder. This is accomplished by using the Carry-Propagate signals $p_i$ within a group of bits. If all the $p_i$ signals within a group are 1, the condition exists for the carry to bypass the entire group:
$P = p_i \cdot p_{i+1} \cdot \ldots \cdot p_{i+k-1}$
The Carry Skip Adder divides the words to be added into groups of equal size of k bits. The basic structure of an N-bit Carry Skip Adder is shown here. Within a group, the carry propagates in ripple-carry fashion. In addition, an AND gate is used to form the group propagate signal. If the group propagate signal is "true", the condition exists for the carry to bypass the group, as shown in this slide. The maximal delay of a Carry Skip Adder is encountered when the carry signal is generated in the least significant bit position, ripples through k-1 bit positions, skips over N/k-2 groups in the middle, ripples through the k-1 bits of the most significant group, and is assimilated in the Nth bit position to produce the sum $S_N$. Thus, the Carry Skip Adder is faster than the Ripple Carry Adder at the expense of a few relatively simple modifications. The delay of the Carry Skip Adder is still linearly dependent on the size of the adder N; however, this linear dependence is reduced by a factor of 1/k.
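The group-propagate and bypass logic described above can be sketched as a behavioral model, together with a worst-case delay count that follows the path described in the notes. The delay function assumes one unit of delay per rippled bit and one unit per skipped group; that unit model and the names `carry_skip_add` and `carry_skip_delay` are assumptions made for this illustration.

```python
from typing import List, Tuple


def carry_skip_add(a_bits: List[int], b_bits: List[int], k: int,
                   c_in: int = 0) -> Tuple[List[int], int]:
    """Add two N-bit vectors (LSB first) using groups of k bits."""
    n = len(a_bits)
    if n != len(b_bits) or k <= 0 or n % k != 0:
        raise ValueError("need equal widths that are a multiple of k")
    sum_bits = []
    group_c_in = c_in
    for start in range(0, n, k):
        carry = group_c_in
        group_p = 1
        for i in range(start, start + k):   # ripple inside the group
            p = a_bits[i] ^ b_bits[i]
            g = a_bits[i] & b_bits[i]
            sum_bits.append(p ^ carry)
            carry = g | (p & carry)
            group_p &= p                    # AND of all p_i in the group
        # Bypass multiplexer: when every bit propagates, the group carry-in
        # is forwarded directly. The value is the same as the rippled carry;
        # only the delay differs in hardware.
        group_c_in = group_c_in if group_p else carry
    return sum_bits, group_c_in


def carry_skip_delay(n: int, k: int) -> int:
    """Worst-case carry path in unit delays (assumed model):
    ripple k-1 bits, skip n/k - 2 groups, ripple k-1 bits."""
    return (k - 1) + (n // k - 2) + (k - 1)
```

With N = 32 and k = 4 this unit-delay model gives 12, compared with a count that grows directly with N for a plain ripple-carry chain.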

Carry-Skip Adder [figure: groups of k bits]

Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
The idea behind the Variable Block Adder is to minimize the longest critical path in the carry chain of a Carry Skip Adder by allowing the groups to take different sizes. In general, this optimization does not increase complexity compared with the Carry Skip Adder. A carry chain of a 32-bit Variable Block Adder is shown. The first and the last blocks are smaller, and the intermediate blocks are larger. This compensates for the critical paths originating from the ends: the carry ripples through fewer bits in the end groups and skips over larger groups in the middle. There are two important consequences of this optimization. First, the total delay is reduced compared with the Carry Skip Adder. Second, the delay is no longer a linear function of the adder size N, as it is in the Carry Skip Adder; it follows a square-root function of N instead. It is also possible to extend this approach to multiple levels of carry skip, which represents a linear programming problem that does not yield a closed-form solution. The speed of such a multiple-level adder surpasses that of a fixed-group Carry-Lookahead Adder. It also exhibits lower area and power consumption while retaining its speed. The Variable Block Adder has the lowest energy-delay product compared with the other adders in its class.

Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Any-point-to-any-point delay = 9 D as compared to 12 D for CSKA (Carry Skip Adder).
[Figure labels: block sizes 6, 5, 5, 4, 4, 3, 3, 1, 1 (32 bits in total); D = 9]
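The effect of unequal block sizes can be explored with a small worst-case search over all start and end blocks. The sketch below rests on several assumptions that are not stated in this transcript: one unit of delay per rippled bit and per skipped block, a carry that is generated in the lowest bit of its starting block, and a symmetric ordering of the block sizes read from the figure labels (small blocks at both ends, as the notes describe).

```python
from typing import List


def worst_case_carry_delay(block_sizes: List[int]) -> int:
    """Longest carry path in unit delays under the assumed model.

    A carry generated in the lowest bit of block i ripples through the
    remaining size_i - 1 bits, skips the j - i - 1 blocks in between,
    and ripples through size_j - 1 bits of block j.
    """
    worst = 0
    count = len(block_sizes)
    for i in range(count):
        # Carry generated and absorbed inside the same block.
        worst = max(worst, block_sizes[i] - 1)
        for j in range(i + 1, count):
            delay = (block_sizes[i] - 1) + (j - i - 1) + (block_sizes[j] - 1)
            worst = max(worst, delay)
    return worst


# Fixed 4-bit groups versus an assumed symmetric ordering of the sizes
# labeled in the figure; both cover 32 bits.
FIXED_BLOCKS = [4] * 8
VARIABLE_BLOCKS = [1, 3, 4, 5, 6, 5, 4, 3, 1]

if __name__ == "__main__":
    print(worst_case_carry_delay(FIXED_BLOCKS))
    print(worst_case_carry_delay(VARIABLE_BLOCKS))
```

Under these assumptions the fixed grouping gives 12 and the variable grouping gives 9, which agrees with the 12 D and 9 D quoted on the slide, although the slide's own delay model is not reproduced in this transcript.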

Carry-chain block size determination for a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Delay Calculation for Variable Block Adder (Oklobdzija, Barnes: IBM 1985) Delay model:

Variable Block Adder (Oklobdzija, Barnes: IBM 1985) Variable Group Length Oklobdzija, Barnes, Arith’85

Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Variable Block Lengths
No closed form solution for delay.
It is a dynamic programming problem.

Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Delay Comparison: Variable Block Adder [figure: delay curves labeled VBA, CLA, and VBA Multi-Level]