VLSI Arithmetic Adders & Multipliers

Prof. Vojin G. Oklobdzija, University of California

Introduction
Digital computer arithmetic belongs to computer architecture; however, it is also an aspect of logic design. The objective of computer arithmetic is to develop appropriate algorithms that use the available hardware in the most efficient way. Given that the hardware can only perform a relatively simple and primitive set of Boolean operations, arithmetic operations are based on a hierarchy of operations built upon the simple ones. Since speed, power and chip area are ultimately the most often used measures of the efficiency of an algorithm, there is a strong link between the algorithms and the technology used for their implementation.
Oklobdzija 2004 Computer Arithmetic

Basic Operations
Addition, Multiplication, Multiply-Add, Division, Evaluation of Functions, Multi-Media

Addition of Binary Numbers

Full Adder. The full adder is the fundamental building block of most arithmetic circuits. The sum and carry outputs are described as:
$s_i = a_i \oplus b_i \oplus c_i$
$c_{i+1} = a_i b_i + a_i c_i + b_i c_i$
[Figure: Full Adder block with inputs ai, bi, Cin and outputs si, Cout]

Inputs        Outputs
ci  ai  bi    si  ci+1
0   0   0     0   0
0   0   1     1   0     Propagate
0   1   0     1   0     Propagate
0   1   1     0   1     Generate
1   0   0     1   0
1   0   1     0   1     Propagate
1   1   0     0   1     Propagate
1   1   1     1   1     Generate
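The truth table can be regenerated from the sum and carry equations. The following is a minimal Python sketch, not part of the original slides; the function names `full_adder`, `classify` and `truth_table` are hypothetical, and the "kill" label for the ai = bi = 0 rows is an assumption borrowed from the Manchester carry chain slide.

```python
from itertools import product


def full_adder(a, b, c_in):
    """Return (sum, carry_out) for one-bit inputs a, b and c_in."""
    s = a ^ b ^ c_in                            # sum: XOR of the three inputs
    c_out = (a & b) | (a & c_in) | (b & c_in)   # carry: majority function
    return s, c_out


def classify(a, b):
    """Label a row by how the stage treats the incoming carry."""
    if a & b:
        return "generate"    # carry out is 1 regardless of carry in
    if a ^ b:
        return "propagate"   # carry out equals carry in
    return "kill"            # carry out is 0 regardless of carry in (assumed label)


def truth_table():
    """Enumerate all rows as (c_in, a, b, s, c_out, label)."""
    rows = []
    for c_in, a, b in product((0, 1), repeat=3):
        s, c_out = full_adder(a, b, c_in)
        rows.append((c_in, a, b, s, c_out, classify(a, b)))
    return rows


if __name__ == "__main__":
    for row in truth_table():
        print(*row)
```

Each row satisfies 2·ci+1 + si = ai + bi + ci, which is a quick way to check the table against ordinary arithmetic.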

Full-Adder Implementation
The operation of a Full Adder is defined by the equations:
$s_i = a_i \oplus b_i \oplus c_i = p_i \oplus c_i$
$c_{i+1} = g_i + p_i c_i$
with Carry-Propagate $p_i = a_i \oplus b_i$ and Carry-Generate $g_i = a_i b_i$.
A one-bit adder could be implemented as shown.
First we should examine a realization of a one-bit adder, which is the basic building block for all the more elaborate addition schemes. The operation of a Full Adder is defined by the Boolean equations for the sum and carry signals shown in this slide: ai, bi, and ci are the inputs to the i-th full adder stage, and si and ci+1 are the sum and carry outputs from the i-th stage, respectively. From the sum equation it is clear that the realization of the Sum function requires two XOR logic gates. The expression for the Carry function can be rewritten using the Carry-Propagate pi and Carry-Generate gi terms. If Carry-Propagate is 1, the carry out of the stage equals the carry into the stage: ci+1 = ci. If Carry-Generate is 1, the carry out of the stage is 1 regardless of the value of the incoming carry signal. The logical implementation of the full adder stage is shown in figure (a) of this slide. This implementation results from a direct application of the logic equations. Implementation (b) is more clever because it uses a multiplexer in the carry path. Given that the multiplexer block is often faster than a single gate, using a multiplexer in the critical path helps to achieve better performance.
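The two realizations described in the notes can be compared in a short Python sketch. This is an illustration under stated assumptions, not the circuit from the figures: `full_adder_gates` applies the propagate/generate equations directly, while `full_adder_mux` selects the carry with a 2:1 multiplexer controlled by the propagate signal. Feeding input a to the multiplexer when p = 0 is an assumption (in that case a = b, so a equals the generate term); all function names are hypothetical.

```python
from itertools import product


def mux(sel, in0, in1):
    """2:1 multiplexer: returns in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0


def full_adder_gates(a, b, c_in):
    """Direct application of the logic equations (in the spirit of figure (a))."""
    p = a ^ b                 # carry-propagate
    g = a & b                 # carry-generate
    s = p ^ c_in              # second XOR produces the sum
    c_out = g | (p & c_in)
    return s, c_out


def full_adder_mux(a, b, c_in):
    """Carry path through a multiplexer (in the spirit of figure (b))."""
    p = a ^ b
    s = p ^ c_in
    # p = 1: pass the incoming carry; p = 0: a == b, so the carry out is a.
    c_out = mux(p, a, c_in)
    return s, c_out


def implementations_agree():
    """Check both versions on all eight input combinations."""
    return all(
        full_adder_gates(a, b, c) == full_adder_mux(a, b, c)
        for a, b, c in product((0, 1), repeat=3)
    )
```

In the multiplexer version the incoming carry passes through a single selection stage, which is the reason the notes give for preferring it on the critical path.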

High-Speed Addition
A one-bit adder could be implemented more efficiently with a multiplexer in the carry path, because the MUX is faster.

The Ripple-Carry Adder
From Rabaey

Inversion Property
From Rabaey

Minimize Critical Path by Reducing Inverting Stages
From Rabaey

Ripple Carry Adder Critical Path
Carry-chain of an RCA implemented using a multiplexer from the standard cell library, with the critical path marked.
A ripple carry adder for N-bit numbers is implemented by concatenating N full adders, as shown in this slide. At the i-th bit position, the i-th bits of operands A and B and a carry signal from the preceding adder stage are used to generate the i-th bit of the sum, si, and a carry, ci+1, to the next adder stage. This scheme is called a Ripple Carry Adder, since the carry signal "ripples" from the least significant bit position to the most significant one. If the ripple carry adder is implemented by concatenating N full adders, the delay of such an adder is 2N gate delays from Cin to Cout. The path from the input to the output signal that is likely to take the longest time is designated the "critical path". In the case of a Ripple Carry Adder, this is the path from the least significant input a0 or b0 to the most significant sum bit. Assuming a multiplexer-based XOR gate implementation, this critical path will consist of N+1 pass-transistor delays. However, such a long chain of transistors will significantly degrade the signal, so some amplification points are necessary. In practice, we can use a multiplexer cell from a standard cell library to build this critical path, as shown in this slide.
Oklobdzija, ISCAS’88
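The concatenation of N full adders can be modelled bit by bit. The following Python sketch is a behavioural illustration only (it says nothing about gate delays); bit lists are least significant bit first, and the helper names `to_bits`, `from_bits`, `ripple_carry_add` and `add` are hypothetical.

```python
def full_adder(a, b, c_in):
    """One-bit full adder: returns (sum, carry_out)."""
    p = a ^ b
    return p ^ c_in, (a & b) | (p & c_in)


def to_bits(x, n):
    """Split a non-negative integer into n bits, least significant first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Chain one full adder per bit; the carry ripples from LSB to MSB."""
    sum_bits = []
    carry = c_in
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # carry out feeds the next stage
        sum_bits.append(s)
    return sum_bits, carry


def add(x, y, n):
    """Add two n-bit integers with the ripple-carry model."""
    sum_bits, c_out = ripple_carry_add(to_bits(x, n), to_bits(y, n))
    return from_bits(sum_bits) + (c_out << n)
```

For example, `add(0b1111, 0b0001, 4)` exercises the worst case described above: every stage propagates, so the carry created at bit 0 has to travel through all four stages before the result settles.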

Manchester Carry-Chain Realization of the Carry Path
A simple and very popular scheme for implementing the carry signal path.
The Manchester Carry Chain is a simple addition scheme that was very popular at the time of emerging LSI nMOS technology. It is an alternative, switch-based technique implemented using pass-transistor logic. The speed realized using the Manchester Carry Chain is impressive, owing to its simplicity and the properties of pass-transistor logic. The Manchester Carry Chain does not require a large area for its implementation and consumes substantially less power than Carry-Lookahead or other more elaborate schemes. A realization of the Manchester Carry Chain is shown in the slide. Due to the RC delay properties of the Manchester Carry Chain, the signal needs to be regenerated by inserting inverters at appropriately chosen locations in the carry chain.

Original Design
T. Kilburn, D. B. G. Edwards, D. Aspinall, "Parallel Addition in Digital Computers: A New Fast 'Carry' Circuit", Proceedings of IEE, Vol. 106, pt. B, p. 464, September 1959.

Manchester Carry Chain (CMOS)
Implement P with pass-transistors
Implement G with pull-up, kill (delete) with pull-down
Use dynamic logic to reduce the complexity and speed up
Kilburn, et al, IEE Proc, 1959.
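The propagate, generate and kill roles listed above can be illustrated with a behavioural Python sketch. It models logic values only, under the assumption of positive-logic carries; precharge, signal polarity and RC effects of a real dynamic CMOS chain are deliberately left out, and the function name `manchester_carry_chain` is hypothetical.

```python
def manchester_carry_chain(a_bits, b_bits, c_in=0):
    """Return the carries [c1, ..., cN] for bit lists given LSB first.

    Each stage acts on the shared carry node in one of three ways:
      propagate (a != b): the pass transistor conducts, the node is unchanged
      generate  (a = b = 1): the node is pulled up to 1
      kill      (a = b = 0): the node is pulled down to 0
    """
    carries = []
    node = c_in
    for a, b in zip(a_bits, b_bits):
        if a ^ b:
            pass          # propagate: incoming carry passes through
        elif a & b:
            node = 1      # generate: pull-up
        else:
            node = 0      # kill: pull-down
        carries.append(node)
    return carries
```

Because exactly one of the three conditions holds at every stage, the carry node is never driven by two sources at once in this model.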

Pass-Transistor Realization in DPL
The ability of pass-transistor logic to provide an efficient multiplexer implementation has been exploited in the CPL and DPL logic families. Even an XOR gate is implemented more efficiently using a multiplexer topology. A Full-Adder cell that is entirely multiplexer based was published by Hitachi and is shown in this slide. Such a Full-Adder realization contains only two transistors in the Input-to-Sum path and only one transistor in the Cin-to-Cout path (not counting the buffer). The short critical path contributes to the remarkable speed of this implementation.

Carry-Skip Adder
MacSorley, Proc IRE 1/61
Lehman, Burla, IRE Trans on Comp, 12/61

Carry-Skip Adder Bypass
From Rabaey

Carry-Skip Adder: N-bits, k-bits/group, r=N/k groups
Since the Cin-to-Cout path is the longest path in the ripple-carry adder, an obvious approach is to accelerate carry propagation through the adder. This is accomplished by using the Carry-Propagate pi signals within a group of bits. If all the pi signals within the group are 1, the condition exists for the carry to bypass the entire group:
$P = p_i \cdot p_{i+1} \cdot \ldots \cdot p_{i+k-1}$
The Carry Skip Adder divides the words to be added into groups of equal size, k bits each. The basic structure of an N-bit Carry Skip Adder is shown here. Within a group, the carry propagates in a ripple-carry fashion. In addition, an AND gate is used to form the group propagate signal. If the group propagate signal is "true", the condition exists for the carry to bypass the group, as shown in this slide. The maximal delay of a Carry Skip Adder is encountered when a carry signal is generated in the least significant bit position, ripples through k-1 bit positions, skips over N/k-2 groups in the middle, ripples through the k-1 bits of the most significant group and is assimilated in the Nth bit position to produce the sum SN:
$\Delta_{CSA} = (k-1)\,\Delta_{rca} + (N/k - 2)\,\Delta_{skip} + (k-1)\,\Delta_{rca}$
where $\Delta_{rca}$ is the ripple delay through one bit position and $\Delta_{skip}$ is the delay of skipping one group. Thus, the Carry Skip Adder is faster than the Ripple Carry Adder at the expense of a few relatively simple modifications. The delay of the Carry Skip Adder is still linearly dependent on the size of the adder N; however, this linear dependence is reduced by a factor of 1/k.
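The group structure can be sketched in Python as follows. This is a behavioural model under stated assumptions, not the circuit from the slide: bit lists are least significant bit first, the bypass is modelled as a selection between the group carry-in and the rippled carry-out, and `fixed_group_delay_units` simply counts the ripple and skip steps named in the notes, assuming each costs one unit delay. All names are hypothetical.

```python
def carry_skip_add(a_bits, b_bits, k, c_in=0):
    """Add two bit lists (LSB first) using groups of k bits with carry bypass."""
    sum_bits = []
    group_c_in = c_in
    for start in range(0, len(a_bits), k):
        carry = group_c_in
        group_p = 1                        # AND of all propagate signals in the group
        for a, b in zip(a_bits[start:start + k], b_bits[start:start + k]):
            p = a ^ b
            g = a & b
            sum_bits.append(p ^ carry)     # ripple-carry inside the group
            carry = g | (p & carry)
            group_p &= p
        # Bypass: if every bit propagates, the group carry-in goes straight out.
        group_c_in = group_c_in if group_p else carry
    return sum_bits, group_c_in


def fixed_group_delay_units(n, k):
    """Worst-case carry path in unit delays (assumed: ripple = skip = 1 unit)."""
    if n % k != 0 or n // k < 2:
        raise ValueError("n must be a multiple of k with at least two groups")
    return (k - 1) + (n // k - 2) + (k - 1)
```

The bypass does not change the arithmetic result, since a fully propagating group would ripple the same carry value anyway; it only shortens the path the carry has to take. With N = 32 and k = 4, the unit-delay count is 3 + 6 + 3 = 12.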

Carry-Skip Adder

Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
The idea behind the Variable Block Adder is to minimize the longest critical path in the carry chain of the Carry Skip Adder by allowing the groups to take different sizes. In general, such optimization does not increase complexity as compared to the Carry Skip Adder. A carry-chain of a 32-bit Variable Block Adder is shown. The first and the last blocks are smaller, and the intermediate blocks are larger. This compensates for the critical paths originating from the ends by shortening the path over which the carry signal ripples in the end groups, while allowing the carry to skip over larger groups in the middle.
There are two important consequences of this optimization. First, the total delay is reduced as compared to the Carry Skip Adder. Second, the delay is no longer a linear function of the adder size N, as in the Carry Skip Adder; it follows a square root function of N instead. It is also possible to extend this approach to multiple levels of carry skip, which represents a linear programming problem that does not yield a closed form solution. The speed of such a multiple-level adder surpasses that of a fixed-group Carry-Lookahead Adder. It also exhibits lower area and power consumption while retaining its speed. The Variable Block Adder has the lowest energy-delay product as compared to the other adders in its class.

Any-point-to-any-point delay = 9 D as compared to 12 D for CSKA
[Figure: carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985) with blocks of 1, 1, 3, 3, 4, 4, 5, 5 and 6 bits, the smallest at the ends; D = 9]
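The 9 D versus 12 D comparison can be reproduced with a simple counting model. The Python sketch below is an illustration under assumptions that are not taken from the slides: rippling through one bit and skipping one block each cost one unit delay D, and the block order 1, 3, 4, 5, 6, 5, 4, 3, 1 is inferred from the block sizes and from the remark that the end blocks are the smallest. The function name `worst_case_carry_delay` is hypothetical.

```python
def worst_case_carry_delay(block_sizes):
    """Longest carry path in unit delays for a one-level carry-skip chain.

    Assumed model: a carry generated in block i ripples through at most
    size_i - 1 bits, skips every block strictly between i and j (one unit
    each), then ripples through at most size_j - 1 bits of block j.
    """
    ripple = [size - 1 for size in block_sizes]
    worst = max(ripple)   # carry generated and absorbed inside one block
    for i in range(len(block_sizes)):
        for j in range(i + 1, len(block_sizes)):
            skips = j - i - 1
            worst = max(worst, ripple[i] + skips + ripple[j])
    return worst


# Fixed groups of 4 bits (CSKA) versus the assumed variable block order.
FIXED_BLOCKS = [4] * 8
VARIABLE_BLOCKS = [1, 3, 4, 5, 6, 5, 4, 3, 1]

if __name__ == "__main__":
    print(worst_case_carry_delay(FIXED_BLOCKS))      # 12
    print(worst_case_carry_delay(VARIABLE_BLOCKS))   # 9
```

Under this model the variable block sizes equalize many of the competing paths at 9 units, whereas the fixed-size chain is dominated by the single end-to-end path of 12 units.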

Carry-chain block size determination for a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Delay Calculation for Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Delay model:

Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Variable Group Length
Oklobdzija, Barnes, Arith’85

Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Variable Block Lengths
No closed form solution for delay
It is a dynamic programming problem

Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Delay Comparison: Variable Block Adder
[Plot: delay comparison of VBA, CLA and multi-level VBA]
