1
Lecture 17: Adders
2
Outline
  Datapath
  Computer Arithmetic Principles
  Single-bit Addition
  Carry-Ripple Adder
  Carry-Skip Adder
  Carry-Lookahead Adder
  Carry-Select Adder
  Carry-Increment Adder
  Tree Adder
17: Adders
3
A Generic Digital Processor
4
Building Blocks for Digital Architectures
Arithmetic unit
  Bit-sliced datapath: adder, multiplier, shifter, comparator, etc.
Memory
  RAM, ROM, buffers, shift registers
Control
  Finite state machine (PLA, random logic)
  Counters
Interconnect
  Switches, arbiters, bus
5
An Intel Microprocessor
6
Bit-Sliced Design 17: Adders
7
Bit-Sliced Datapath 17: Adders
8
Itanium Integer Datapath
9
Motivation
Arithmetic units are, among other blocks, at the core of every datapath and addressing unit.
The datapath is at the core of:
  microprocessors (CPU)
  signal processors (DSP)
  data-processing application-specific ICs (ASIC) and programmable ICs (FPGA)
Standard arithmetic units are available from libraries.
Design of arithmetic units is necessary for:
  non-standard operations
  high-performance components
  library development
10
Naming Conventions
Signal buses: A (1-D), A_i (2-D), a_{i:k} (sub-bus, 1-D)
Signals: a, a_i (1-D), a_{i,k} (2-D), A_{i:k} (group signal)
Circuit complexity measures: A (area), T (cycle time, delay), AT (area-time product), L (latency, number of cycles)
Arithmetic operators: +, -, •, /, log (= log2)
Logic operators: OR, AND, XOR, NOT, …
11
Circuit Complexity Measures
Unit gate model
  Inverter, buffer: A = 0, T = 0
  Simple monotonic 2-input gates (AND, OR, NAND, NOR): A = 1, T = 1
  Simple non-monotonic 2-input gates (XOR, XNOR): A = 2, T = 2
  Simple m-input gates: A = m − 1, T = ⌈log m⌉
  Wiring not considered
  Only for estimation purposes
12
Recursive Function Evaluation
Given: inputs a_i, outputs z_i, function f (graph symbol: •)
Non-recursive functions (n.)
  Output z_i is a function of input a_i
  Parallel structure
13
Recursive Function Evaluation
Recursive functions (r.)
  Output z_i is a function of all inputs a_k, k ≤ i
With a single output z = z_{n-1} (r.s.):
  f is non-associative (r.s.n): serial structure
  f is associative (r.s.a): serial or single-tree structure
14
Recursive Function Evaluation
Output z_i is a function of all inputs a_k, k ≤ i
With multiple outputs z_i (r.m.) (=> prefix problem):
  f is non-associative (r.m.n): serial structure
  f is associative (r.m.a): serial or multi-tree structure, shared tree structure
15
Arithmetic Operations
Overview 17: Adders
16
Overview of Arithmetic Operations
Direct implementation of dedicated units
  always: 1 – 5
  in most cases: 6
  sometimes: 7, 8
Sequential implementation using simpler units and several clock cycles (decomposition)
  sometimes: 6
  in most cases: 7, 8, 9
Table look-up techniques using ROMs
  universal: simple application to all operations
  efficient only for single-operand operations of high complexity (8 – 12) and small word length
17
Overview of Arithmetic Operations
Approximation using simpler units: 7 – 12 Taylor series expansion polynomial and rational approximations convergence of recursive equation systems CORDIC (COordinate Rotation DIgital Computer) 17: Adders
18
Binary Number Systems
Radix-2, binary number system (BNS): irredundant, weighted, positional, monotonic.
An n-bit number is an ordered sequence of bits (binary digits).
Simple and efficient implementation in digital circuits.
MSB/LSB (most/least significant bit): a_{n-1} / a_0
Represents an integer or fixed-point number exactly.
Fixed-point numbers: m-bit integer part, (n − m)-bit fraction
19
Binary Number Systems
Unsigned: positive or natural numbers
  Value: A = Σ_{i=0}^{n-1} a_i 2^i
  Range: [0, 2^n − 1]
Two's (2's) complement: standard representation of signed or integer numbers
  Value: A = −a_{n-1} 2^{n-1} + Σ_{i=0}^{n-2} a_i 2^i
  Range: [−2^{n-1}, 2^{n-1} − 1]
20
Binary Number Systems
  Complement: −A = ~A + 1 (modulo 2^n)
  Sign: a_{n-1}
  Properties: asymmetric range; compatible with unsigned numbers in many arithmetic operations (same treatment of positive and negative numbers).
One's (1's) complement: similar to 2's complement
  Value: A = −a_{n-1} (2^{n-1} − 1) + Σ_{i=0}^{n-2} a_i 2^i
  Range: [−(2^{n-1} − 1), 2^{n-1} − 1]
21
Binary Number Systems
  Complement: −A = ~A
  Sign: a_{n-1}
  Properties: double representation of zero, symmetric range, modulo (2^n − 1) number system.
Sign-magnitude: alternative representation of signed numbers
  Value: A = (−1)^{a_{n-1}} Σ_{i=0}^{n-2} a_i 2^i
  Range: [−(2^{n-1} − 1), 2^{n-1} − 1]
22
Binary Number Systems
  Sign: a_{n-1}
  Properties: double representation of zero, symmetric range, different treatment of positive and negative numbers in arithmetic operations, no MSB toggles at sign changes around 0 (=> low power)
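The four representations can be compared by decoding one bit pattern under each of them. The following is a minimal Python sketch using the standard value formulas for unsigned, two's complement, one's complement, and sign-magnitude numbers. The function name `decode` and the MSB-first list convention are assumptions made for this illustration, not part of the lecture.

```python
def decode(bits, system):
    """Interpret a bit list (MSB first) under the named representation.

    `system` is one of "unsigned", "twos", "ones", "sign_magnitude"
    (names chosen for this sketch).
    """
    n = len(bits)
    sign = bits[0]  # a_{n-1}
    # Weighted value of the lower n-1 bits a_{n-2} .. a_0.
    lower = sum(b << (n - 2 - i) for i, b in enumerate(bits[1:]))
    if system == "unsigned":
        return (sign << (n - 1)) + lower
    if system == "twos":
        return -sign * (1 << (n - 1)) + lower
    if system == "ones":
        return -sign * ((1 << (n - 1)) - 1) + lower
    if system == "sign_magnitude":
        return -lower if sign else lower
    raise ValueError(f"unknown system: {system}")


pattern = [1, 0, 1, 1]
for name in ("unsigned", "twos", "ones", "sign_magnitude"):
    print(name, decode(pattern, name))
```

The pattern 1111 decodes to 0 in one's complement and 1000 decodes to 0 in sign-magnitude, which is the double representation of zero noted on the slides.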
23
Gray Numbers
Gray numbers (code): binary, irredundant, non-weighted, non-monotonic.
Property: unit-distance coding. Exactly one bit toggles between adjacent numbers.
Applications: counters with low output toggle rate (low-power buses), representation of continuous signals for low-error sampling (no false numbers due to different bits switching at different times).
Non-monotonic numbers: arithmetic operations (addition, comparison) are difficult.
24
Gray Numbers Binary - Gray conversion Gray – binary conversion
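Both conversions can be sketched in a few lines of Python, assuming the standard reflected Gray code: each Gray bit is g_i = b_i XOR b_{i+1}, and each binary bit is recovered as the XOR of all Gray bits at or above its position. The function names are illustrative.

```python
def binary_to_gray(b):
    """Binary -> Gray: g_i = b_i XOR b_{i+1}, done for all bits at once."""
    return b ^ (b >> 1)


def gray_to_binary(g):
    """Gray -> binary: b_i is the XOR of g_i and all higher Gray bits."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


for i in range(8):
    print(i, format(binary_to_gray(i), "03b"))
```

The binary-to-Gray direction is a single level of XOR gates, while the reverse direction is itself a prefix (running XOR) computation over the bits.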
25
Redundant Number Systems
Non-binary, redundant, weighted number systems.
Digit set larger than radix (typically radix 2) => multiple representations of the same number => redundancy.
No carry propagation in adders => more efficient implementation of adder-based units (multipliers, dividers, etc.).
Redundancy => no direct implementation of relational operators => conversion to irredundant numbers.
Several bits used to represent one digit => higher storage requirements.
Expensive conversion to irredundant numbers; not necessary if redundant input operands are allowed.
26
Delayed-Carry Representation
Delayed-carry or half adder representation 1 digit holds the sum of 2 bits (no carry out) Example: = (0,0) (1,0) = 2 17: Adders
27
Carry-Save Representation
One digit holds the sum of 3 bits or 1 digit and 1 bit. No carry-out digit, carry is saved. Standard redundant number system for fast addition. 17: Adders
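As a rough illustration of why carry-save addition avoids carry propagation, the Python sketch below reduces three operands to a sum word and a carry word using only bitwise operations, one full adder per bit position. The function name `carry_save_add` is an assumption for this example; operands are taken to be non-negative integers.

```python
def carry_save_add(x, y, z):
    """Reduce three operands to a (sum, carry) pair without propagating carries.

    Each bit position acts as an independent full adder: the sum word holds
    the XOR of the three bits, the carry word holds their majority, shifted
    left by one because a carry has twice the weight.
    """
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


s, c = carry_save_add(13, 7, 9)
print(s, c, s + c)  # s + c equals 13 + 7 + 9
```

Only the final conversion `s + c` to an irredundant number needs a carry-propagate adder.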
28
Signed-Digit Representation
Signed-digit (SD) or redundant digit (RD) number representation. No carry propagation in S = R + T One digit holds the sum of two digits. No carry-out. 17: Adders
29
Signed-Digit Representation
Minimal SD representation: minimal number of non-zero digits.
Applications: sequential multiplication (fewer cycles), filters with constant coefficients (less hardware).
Example: minimal
30
Signed-Digit Representation
Canonical SD representation: minimal SD with no two non-zero digits in sequence.
SD -> binary: carry propagation necessary => adder.
Other applications: high-speed multipliers.
Similar to carry-save, simple use for signed numbers.
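A canonical signed-digit recoding can be sketched with the usual non-adjacent-form rule: look at the two low-order bits, emit +1 or −1 so that the remainder becomes divisible by 4, and shift. The Python below is an illustrative sketch under that assumption; the function name `to_csd` and the LSB-first digit list are choices made here, not notation from the lecture.

```python
def to_csd(value):
    """Return the canonical signed-digit form of a non-negative integer.

    Digits are in {-1, 0, 1}, least significant digit first.
    """
    digits = []
    while value != 0:
        if value & 1:
            # value mod 4 == 1 -> digit +1, value mod 4 == 3 -> digit -1
            d = 2 - (value & 3)
            value -= d
        else:
            d = 0
        digits.append(d)
        value >>= 1
    return digits


print(to_csd(7))   # 7 = 8 - 1
print(to_csd(23))  # 23 = 32 - 8 - 1
```

Seven needs three non-zero bits in binary (111) but only two non-zero digits in canonical SD form, which is the saving exploited in sequential multipliers and constant-coefficient filters.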
31
Residue Number Systems
Non-binary, irredundant, non-weighted number system.
Carry-free and fast additions and multiplications.
Other arithmetic operations (e.g. comparison, sign and overflow detection) are complex and slow because digits are not weighted.
Conversion to weighted mixed-radix or binary system required.
Codes for error correction and detection.
Possible applications (but hardly used):
  Digital filters
  Error detection and correction
32
Residue Number Systems
The base is an n-tuple of integers (m_{n-1}, m_{n-2}, …, m_0), the moduli. The m_i are pairwise relatively prime.
Arithmetic operations: each digit (residue) is computed separately.
33
Residue Number Systems
Best moduli m_i are 2^k and 2^k − 1.
High storage efficiency with k bits.
Simple modular addition: k-bit adder without c_out
34
Residue Number Systems
Example: 17: Adders
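Residue arithmetic can be sketched in Python as follows. The moduli (16, 15, 7) are an assumption picked for this illustration because they have the form 2^k or 2^k − 1 and are pairwise relatively prime; the conversion back to a weighted number uses the Chinese remainder theorem. The three-argument `pow` with exponent −1 and `math.prod` require Python 3.8 or later.

```python
from math import prod

MODULI = (16, 15, 7)  # assumed example base; dynamic range is 16 * 15 * 7 = 1680


def to_rns(x):
    """Convert an integer to its tuple of residues."""
    return tuple(x % m for m in MODULI)


def rns_add(a, b):
    """Digit-wise modular addition; no carries cross between digits."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def rns_mul(a, b):
    """Digit-wise modular multiplication."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(residues):
    """Convert residues back to an integer via the Chinese remainder theorem."""
    big_m = prod(MODULI)
    total = 0
    for r, m in zip(residues, MODULI):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)  # modular inverse of partial mod m
    return total % big_m


print(to_rns(100), to_rns(23))
print(from_rns(rns_add(to_rns(100), to_rns(23))))  # 123
print(from_rns(rns_mul(to_rns(37), to_rns(41))))   # 1517
```

Addition and multiplication touch each digit independently, while `from_rns` shows why conversion (and anything that needs magnitude, such as comparison) is the expensive part.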
35
Floating-Point Numbers
Larger range, smaller precision than fixed-point representation, inexact, real numbers. Double-number form => discontinuous precision. S | biased exponent E | unsigned norm mantissa M Basic arithmetic operations 17: Adders
36
Floating-Point Numbers
Basic arithmetic operations based on fixed-point add, multiply, and shift operations.
Post-normalization required.
Applications:
  Processors: real floating-point formats (e.g. IEEE standard), large range due to universal use.
  ASICs: usually simplified floating-point formats with small exponents, smaller range. Used for range extension of normal fixed-point numbers.
IEEE floating-point format:
37
Logarithmic Number System
Alternative representation to floating point (mantissa + integer exponent -> fixed-point exponent only).
Single-number form => continuous precision => higher accuracy, more reliable.
Basic arithmetic operations:
  (A < B) = (E_A < E_B), additionally considering the sign
  A + B by approximation, or by addition in a conventional number system with double conversion.
38
Logarithmic Number System
Basic arithmetic operations Simpler multiplication, exponentiation. More complex addition. Expensive conversion: (anti)logarithms probably by table look-up. Applications: real-time digital filters. 17: Adders
39
Antitetrational Number System
Tetration (t.x = and antitetration (a.t.x) Larger range, but smaller precision than logarithmic representation. Otherwise, analogous. Note that all these systems can be mixed in composite arithmetic. Choice of number representation should be hidden from the user. The compiler should handle it. Rational numbers can also be represented in floating slash notation. 17: Adders
40
Round-Off Schemes Intermediate results with d additional lower bits. This results in higher accuracy. Rounding: keeping error e small during final word length reduction: Trade-off: numerical accuracy vs implementation cost. Truncation = average error e Round to nearest (normal rounding) 17: Adders
41
Round-Off Schemes
Round to nearest
  The error is nearly symmetric.
  Can often be included in a previous operation.
Round to nearest even/odd
  bias = 0 (symmetric)
  Mandatory in IEEE floating-point standard
Three guard bits for rounding after floating-point operations: guard bit G (postnormalization), round bit R (round to nearest), sticky bit S (round to nearest even)
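The three schemes can be compared on a fixed-point integer that carries d extra low-order bits. This is a minimal Python sketch assuming non-negative values and d ≥ 1; the function names are illustrative. Round to nearest is written as "add half an LSB, then truncate", which is why it can often be folded into a preceding addition.

```python
def truncate(x, d):
    """Drop the d low-order bits."""
    return x >> d


def round_nearest(x, d):
    """Round to nearest, ties upward: add half an LSB, then truncate."""
    return (x + (1 << (d - 1))) >> d


def round_nearest_even(x, d):
    """Round to nearest, ties to the even result (zero bias)."""
    quotient = x >> d
    remainder = x & ((1 << d) - 1)
    half = 1 << (d - 1)
    if remainder > half or (remainder == half and quotient & 1):
        quotient += 1
    return quotient


# With d = 2 the inputs 6, 10, 14 represent 1.5, 2.5, 3.5.
for x in (6, 10, 14):
    print(x / 4, truncate(x, 2), round_nearest(x, 2), round_nearest_even(x, 2))
```

The tie case 2.5 shows the difference: plain round to nearest gives 3, round to nearest even gives 2.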
42
Addition 17: Adders
43
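Single-Bit Addition
Half Adder

A B | Cout S
0 0 |  0   0
0 1 |  0   1
1 0 |  0   1
1 1 |  1   0

Full Adder

A B C | Cout S
0 0 0 |  0   0
0 0 1 |  0   1
0 1 0 |  0   1
0 1 1 |  1   0
1 0 0 |  0   1
1 0 1 |  1   0
1 1 0 |  1   0
1 1 1 |  1   1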
44
1-Bit Adders
Add up m bits of the same magnitude.
Output the sum as a k-bit number (k = ⌈log(m + 1)⌉).
Or count the 1's at the inputs => (m,k) counter (combinational counter).
A half adder is a (2,2) counter.
45
1-Bit Adders 17: Adders
46
1-Bit Adders A full-adder is a (3,2) counter. 17: Adders
47
PGK
For a full adder, define what happens to carries (in terms of A and B):
  Generate: Cout = 1 independent of C; G = A • B
  Propagate: Cout = C; P = A ⊕ B
  Kill: Cout = 0 independent of C; K = ~A • ~B
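The generate, propagate, and kill definitions can be checked with a short Python sketch of a full adder built from G and P. The function name and the returned tuple layout are assumptions for this example.

```python
def full_adder_pgk(a, b, c):
    """Full adder expressed through generate, propagate, and kill.

    Returns (s, cout, (g, p, k)) for single-bit inputs a, b, c.
    """
    g = a & b              # generate: carry out regardless of c
    p = a ^ b              # propagate: carry out equals c
    k = (1 - a) & (1 - b)  # kill: no carry out regardless of c
    cout = g | (p & c)
    s = p ^ c
    return s, cout, (g, p, k)


for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, full_adder_pgk(a, b, c))
```

For every input pair (A, B) exactly one of G, P, K is 1, so the three cases partition the behaviour of the carry.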
48
Full Adder Design I Brute force implementation from eqns 17: Adders
49
Full Adder Design II Factor S in terms of Cout
S = ABC + (A + B + C)(~Cout) Critical path is usually C to Cout in ripple adder 17: Adders
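The factored sum expression can be verified exhaustively. The Python sketch below computes Cout as the majority of the inputs and then S from the factored form, so that it can be compared against A ⊕ B ⊕ C; the function name is illustrative.

```python
def sum_from_cout(a, b, c):
    """Compute S = ABC + (A + B + C)(~Cout) for single-bit inputs."""
    cout = (a & b) | (a & c) | (b & c)  # majority function
    not_cout = 1 - cout
    return (a & b & c) | ((a | b | c) & not_cout)


matches = all(
    sum_from_cout(a, b, c) == a ^ b ^ c
    for a in (0, 1) for b in (0, 1) for c in (0, 1)
)
print(matches)
```

Reusing ~Cout in this way is what lets the sum stage share logic with the carry stage in the circuit.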
50
Full Adder Design II Same circuit with sized transistors 17: Adders
51
Layout Clever layout circumvents usual line of diffusion
Use wide transistors on critical path Eliminate output inverters 17: Adders
52
Full Adder Design III Complementary Pass Transistor Logic (CPL)
Slightly faster, but more area 17: Adders
53
Full Adder Design III Transmission gates 17: Adders
54
Full Adder Design IV Dual-rail domino
Very fast, but large and power hungry Used in very fast multipliers 17: Adders
55
(m,k) Counters Usually built from full-adders.
Associativity of addition allows conversion from linear to tree structure => faster at the same number of FAs. 17: Adders
56
(7,3) Counter Example 17: Adders
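One possible (7,3) counter uses four full adders arranged as a small tree. The Python sketch below is an assumed arrangement for illustration and is not necessarily the one drawn on the slide; `full_adder` and `counter_7_3` are names chosen here.

```python
def full_adder(a, b, c):
    """(3,2) counter: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def counter_7_3(x):
    """Count the ones among seven input bits; returns (z2, z1, z0), MSB first."""
    s1, c1 = full_adder(x[0], x[1], x[2])  # weight-1 inputs
    s2, c2 = full_adder(x[3], x[4], x[5])  # weight-1 inputs
    z0, c3 = full_adder(s1, s2, x[6])      # remaining weight-1 bits
    z1, z2 = full_adder(c1, c2, c3)        # three weight-2 carries
    return z2, z1, z0


print(counter_7_3([1, 1, 0, 1, 1, 0, 1]))  # five ones -> (1, 0, 1)
```

The two first-level adders work in parallel, which is the tree structure that associativity of addition permits.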
57
Carry Propagate Adders
Add two n-bit operands A and B and an optional carry in cin by performing carry propagation. Sum (cout, S) is an irredundant (n+1) bit number 17: Adders
58
Carry Propagate Adders
N-bit adder called CPA Each sum bit depends on all previous carries How do we compute all these carries quickly? 17: Adders
59
Ripple-Carry Adder(RCA)
Serial arrangement of n full adders. Simplest, smallest, and slowest CPA structure. 17: Adders
60
Carry-Ripple Adder Simplest design: cascade full adders
Critical path goes from Cin to Cout Design full adder to have fast carry delay 17: Adders
61
Carry Ripple Adder Note that worst case delay is linear with number of bits. Goal: Make the fastest possible carry path circuit. 17: Adders
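A behavioural Python sketch of the ripple-carry adder makes the linear dependency visible: each loop iteration needs the carry produced by the previous one. Bit lists are LSB first; the helper names `to_bits` and `from_bits` are assumptions for this example.

```python
def to_bits(value, n):
    """Return the n low-order bits of value, LSB first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def full_adder(a, b, c):
    """Single-bit full adder: returns (sum, carry_out)."""
    p = a ^ b
    return p ^ c, (a & b) | (p & c)


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Cascade of full adders; the carry ripples from LSB to MSB."""
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


s, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s), cout)  # 11 + 6 = 17 = 16 + 1
```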
62
A Full Adder Circuit 17: Adders
63
Inversion Property 17: Adders
64
Inversions Critical path passes through majority gate
Built from minority + inverter Eliminate inverter and use inverting full adder 17: Adders
65
Mirror Adder 17: Adders
66
Mirror Adder 17: Adders
67
Mirror Adder The NMOS and PMOS chains are completely symmetrical. A maximum of two series transistors can be observed in the carry generation circuit. When laying out the cell, the most critical issue is the minimization of the capacitance at node Co. The reduction of the diffusion capacitances is particularly important. The capacitance at node Co is composed of four diffusion capacitances, two internal gate capacitances, and six gate capacitances in the connecting adder cell. 17: Adders
68
Mirror Adder The transistors connected to Ci are placed closest to the input. Only the transistors in the carry stage have to be optimized for optimal speed. All transistors in the sum stage can be minimal size. 17: Adders
69
Transmission Gate FA 17: Adders
70
Carry Propagation Speed-up
Concatenation of partial CPA’s with fast cin -> cout. Fast carry look-ahead logic for entire range of bits. 17: Adders
71
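Generate / Propagate
Equations often factored into G and P
Generate and propagate for groups spanning i:j (i ≥ k > j):
  G_{i:j} = G_{i:k} + P_{i:k} • G_{k-1:j}
  P_{i:j} = P_{i:k} • P_{k-1:j}
Base case:
  G_{i:i} = G_i = A_i • B_i, P_{i:i} = P_i = A_i ⊕ B_i
  G_{0:0} = G_0 = C_in, P_{0:0} = P_0 = 0
Sum:
  S_i = P_i ⊕ G_{i-1:0}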
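The group generate/propagate recurrence can be sketched in Python, assuming the usual textbook convention: operand bits are numbered 1..N, position 0 holds the carry-in (G_0 = C_in, P_0 = 0), groups combine as G_{i:j} = G_{i:k} + P_{i:k}·G_{k-1:j} and P_{i:j} = P_{i:k}·P_{k-1:j}, and each sum bit is S_i = P_i ⊕ G_{i-1:0}. Here the groups are combined serially, which corresponds to a carry-ripple adder. The names `combine` and `pg_add` are illustrative.

```python
def combine(upper, lower):
    """Merge adjacent groups: (G, P) of the more significant group first."""
    g_hi, p_hi = upper
    g_lo, p_lo = lower
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def pg_add(a, b, n, cin=0):
    """Add two n-bit integers using bitwise PG signals and serial group merging."""
    # Position 0 is the carry-in; position i + 1 is operand bit i.
    pg = [(cin, 0)]
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        pg.append((ai & bi, ai ^ bi))
    group = pg[0]  # (G_{0:0}, P_{0:0})
    total = 0
    for i in range(1, n + 1):
        carry_in = group[0]                      # G_{i-1:0}
        total |= (pg[i][1] ^ carry_in) << (i - 1)
        group = combine(pg[i], group)            # extend to G_{i:0}
    return total, group[0]


print(pg_add(100, 55, 8))   # 155, no carry out
print(pg_add(200, 100, 8))  # 300 = 256 + 44
```

Because `combine` is associative, the same signals can be merged in any tree order, which is what the faster adders in the following slides exploit.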
72
PG Logic 17: Adders
73
PG Logic 17: Adders
74
Carry-Ripple Revisited
75
Carry-Ripple PG Diagram
76
PG Diagram Notation 17: Adders
77
Manchester Carry Chain
78
Manchester Carry Chain
79
Manchester Carry Chain
80
Carry-Skip Adder Carry-ripple is slow through all N stages
Carry-skip allows carry to skip over groups of n bits Decision based on n-bit propagate signal 17: Adders
81
Carry-Skip Adder 17: Adders
82
Carry-Skip Adder 17: Adders
83
Carry-Skip Adder 17: Adders
84
Carry-Skip PG Diagram For k n-bit groups (N = nk) 17: Adders
85
Variable Group Size Delay grows as O(sqrt(N)) 17: Adders
86
Carry-Skip Adder
Partial CPA with fast c_k -> c_i
If P_{i-1:k} = 0: c_k does not become c'_i, and c'_i is selected, becoming c_i.
If P_{i-1:k} = 1: c_k becomes c'_i, but c'_i is skipped and c_k is passed directly to c_i.
Path c_k -> c'_i -> c_i is never sensitized => fast c_k -> c_i
False path => inherent logic redundancy => problems in circuit optimization, timing analysis, and testing.
87
Carry-Skip Adder
Variable group sizes are faster.
Use larger groups in the middle.
Minimize delays a_0 -> c_k -> s_{i-1} and a_k -> c_i -> s_{n-1}
Partial CPA type is RCA or CSKA (multilevel CSKA).
Medium speed-up at small hardware overhead (+ AND/bit + MUX/group).
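A behavioural Python sketch of a carry-skip adder with fixed-size groups follows. Within each group the carry ripples while the group propagate signal is accumulated; a multiplexer then forwards either the group carry-in (skip) or the rippled carry. The fixed group size of 4 and the function name are assumptions for this illustration, and the model captures function only, not delay.

```python
def carry_skip_add(a, b, n, group=4, cin=0):
    """Add two n-bit integers with group-wise carry skip."""
    total = 0
    ck = cin                      # carry into the current group
    for k in range(0, n, group):
        c = ck                    # carry rippling inside the group
        group_p = 1               # AND of all bit propagates in the group
        for i in range(k, min(k + group, n)):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            p = ai ^ bi
            total |= (p ^ c) << i
            c = (ai & bi) | (p & c)
            group_p &= p
        # Skip multiplexer: if every bit propagates, pass ck straight through.
        ck = ck if group_p else c
    return total, ck


print(carry_skip_add(0b01111111, 0b00000001, 8))  # 128, no carry out
```

When the group propagate is 1 the rippled carry equals the group carry-in anyway, so the multiplexer never changes the result; it only shortens the path, which is the source of the false path mentioned on the slide.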
88
CSKA + Manchester 17: Adders
89
Carry-Select Adder Trick for critical paths dependent on late input X
Precompute two possible outputs for X = 0, 1 Select proper output when X arrives Carry-select adder precomputes n-bit sums For both possible carries into n-bit group 17: Adders
90
Carry-Select Adder
Partial CPA with fast c_k -> c_i and c_k -> s_{i-1:k}
Two CPAs compute two possible results (c_in = 0/1); group carry-in c_k selects the correct one afterwards.
Variable group sizes are faster; use larger groups at the end (MSB).
Balance delays a_0 -> c_k and a_k -> c_i^0
Partial CPA type is RCA, CSLA (multilevel CSLA) or CLA.
91
Carry-Select Adder High speed-up at high hardware overhead.
+ MUX/bit + (CPA + MUX)/group 17: Adders
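The precompute-and-select idea can be sketched behaviourally in Python. Each group runs two ripple adders, one assuming carry-in 0 and one assuming carry-in 1, and the actual group carry-in picks the result. Fixed groups of 4 bits and the helper names are assumptions for this example.

```python
def ripple(a, b, cin, width):
    """Partial CPA: ripple-carry addition of two width-bit integers."""
    s, c = 0, cin
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ c) << i
        c = (ai & bi) | ((ai ^ bi) & c)
    return s, c


def carry_select_add(a, b, n, group=4, cin=0):
    """Add two n-bit integers; each group precomputes both possible results."""
    result, ck = 0, cin
    for k in range(0, n, group):
        width = min(group, n - k)
        mask = (1 << width) - 1
        ga, gb = (a >> k) & mask, (b >> k) & mask
        s0, c0 = ripple(ga, gb, 0, width)  # result if the carry-in is 0
        s1, c1 = ripple(ga, gb, 1, width)  # result if the carry-in is 1
        s, ck = (s1, c1) if ck else (s0, c0)  # multiplexers driven by ck
        result |= s << k
    return result, ck


print(carry_select_add(1000, 2345, 12))  # 3345, no carry out
```

In hardware the two `ripple` calls of all groups run concurrently, so only the chain of select multiplexers lies on the critical path.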
92
Carry-Select Adder 17: Adders
93
Carry-Select Adder 17: Adders
94
Linear Carry-Select 17: Adders
95
Square-Root Carry-Select
96
Delay Comparison 17: Adders
97
Carry-Increment Adder
Partial CPA with fast c_k -> c_i and c_k -> s_{i-1:k}
Result is incremented after addition if c_k = 1.
Variable group sizes are faster; use larger groups at the end (MSB).
Balance delays a_0 -> c_k and a_k -> c'_i
Partial CPA could be RCA, CIA (multilevel CIA) or CLA.
High speed-up at medium hardware overhead (+ AND/bit + (incrementer + AND/OR)/group).
Logic of CPA and incrementer could be merged.
98
Carry-Increment Adder
99
Carry-Increment Adder
Example: gate-level schematic of carry-increment adder (CIA) Only two different logic cells (bit-slices): IHA and IFA 17: Adders
100
Carry-Increment Adder
Factor initial PG and final XOR out of carry-select 17: Adders
101
Variable Group Size Also buffer noncritical signals 17: Adders
102
Conditional-Sum Adder
Optimized multilevel CSLA with log n levels.
Correct sum bits are conditionally selected through log n levels of multiplexers.
Bit groups of size 2^l at level l.
Higher parallelism, more balanced signal paths.
Highest speed-up at highest hardware overhead (2 RCA + more than log n MUX/bit).
103
Conditional-Sum Adder
104
Conditional-Sum Adder
105
Conditional-Sum Adder
106
Carry-Lookahead Adder
Carries look ahead before sum bits are computed Hierarchical arrangement using levels: passed up, c’0 passed down between levels. High speed-up at medium hardware overhead. 17: Adders
107
Carry-Lookahead Adder
108
Carry-Lookahead Adder
109
Carry-Lookahead Adder
Carry-lookahead adder computes Gi:0 for many bits in parallel. Uses higher-valency cells with more than two inputs. 17: Adders
110
CLA PG Diagram 17: Adders
111
Carry-Lookahead 17: Adders
112
Lookahead Tree 17: Adders
113
Lookahead Tree 17: Adders
114
Higher-Valency Cells 17: Adders
115
Higher Valency PG Diagram
116
Tree Adder If lookahead is good, lookahead across lookahead!
Recursive lookahead gives O(log N) delay Many variations on tree adders 17: Adders
117
Parallel Prefix Adders
Universal adder architecture comprising RCA, CIA, CLA, and more (entire range of area-delay trade-offs from slowest RCA to fastest CLA).
Preprocessing, carry-lookahead, and postprocessing step.
Carries calculated using parallel-prefix algorithms.
High regularity: suitable for synthesis and layout.
High flexibility: special adders, other arithmetic operations, exchangeable prefix algorithms.
High performance: smallest and fastest adders.
118
Parallel Prefix Adders
119
Prefix Problem
Inputs (x_{n-1}, …, x_0), outputs (y_{n-1}, …, y_0), associative binary operator •
Associativity of • => tree structures for evaluation
120
Prefix Problem
Group variables: cover bits (x_k, …, x_i) at level l.
Carry propagation is a prefix problem:
Parallel-prefix algorithms:
  Multi-tree structures: T = O(n) -> O(log n)
  Sharing subtrees: A = O(n^2) -> O(n log n)
  Different algorithms trading area vs delay.
  Also consider wiring and fanout.
121
Prefix Algorithms Algorithms visualized by directed acyclic graphs (DAG) with array structure (n bits x m levels). Graph vertex symbols Performance measures: A• : graph size (number of black nodes) T• : graph depth (number of black nodes on critical path) 17: Adders
122
Prefix Algorithms Serial prefix algorithm (RCA) 17: Adders
123
Prefix Algorithms Sklansky parallel-prefix algorithm (PPA-SK)
Tree-like collection, parallel redistribution of carries 17: Adders
124
Sklansky 17: Adders
125
Prefix Algorithms Brent-Kung parallel-prefix algorithm (PPA-BK)
Traditional CLA is PPA-BK with 4-bit groups Tree-like redistribution of carries (fan-out tree) 17: Adders
126
Brent-Kung 17: Adders
127
Prefix Algorithms Kogge-Stone parallel-prefix algorithm (PPA-KS)
very high wiring requirements 17: Adders
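A Kogge-Stone prefix network can be sketched behaviourally in Python. At each level every bit combines its (G, P) pair with the pair a distance d below it, and d doubles per level, giving ⌈log2 n⌉ levels. Folding the carry-in into the generate signal of bit 0 is a simplification chosen for this sketch; the function name is illustrative.

```python
def kogge_stone_add(a, b, n, cin=0):
    """Add two n-bit integers using a Kogge-Stone parallel-prefix carry network."""
    abits = [(a >> i) & 1 for i in range(n)]
    bbits = [(b >> i) & 1 for i in range(n)]
    p = [x ^ y for x, y in zip(abits, bbits)]  # bitwise propagate
    g = [x & y for x, y in zip(abits, bbits)]  # bitwise generate
    g[0] |= p[0] & cin                         # fold carry-in into bit 0

    big_g, big_p = g[:], p[:]
    d = 1
    while d < n:
        next_g, next_p = big_g[:], big_p[:]
        for i in range(d, n):                  # all bits of a level work in parallel
            next_g[i] = big_g[i] | (big_p[i] & big_g[i - d])
            next_p[i] = big_p[i] & big_p[i - d]
        big_g, big_p = next_g, next_p
        d *= 2

    # big_g[i] is now the carry out of bit i; the carry into bit i is big_g[i-1].
    carries = [cin] + big_g[:-1]
    total = sum((p[i] ^ carries[i]) << i for i in range(n))
    return total, big_g[n - 1]


print(kogge_stone_add(0xABCD, 0x1234, 16))  # 0xBE01 = 48641, no carry out
```

Each level reads values from `d` positions away for every bit, which is where the high wiring requirement comes from.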
128
Kogge-Stone 17: Adders
129
Prefix Algorithms Carry-increment parallel-prefix algorithm 17: Adders
130
Prefix Algorithms Mixed serial/parallel-prefix algorithm (RCA+PPA)
Linear size-depth trade-off using parameter k: k = 0 : serial prefix graph : Brent-Kung parallel-prefix graph Fills the gap between RCA and PPA-BK (CLA) in steps of single •-operations. 17: Adders
131
Prefix Algorithms 17: Adders
132
Prefix Algorithms Example: 4-bit PPA-SK
Efficient AND-OR-prefix circuit for the generate signals and AND-prefix circuit for the propagate signals.
Optimization: alternate AOI/OAI and NAND/NOR gates between levels (inverting gates are smaller and faster).
Can also be realized using two MUX-prefix circuits.
133
Prefix Algorithms 17: Adders
134
Prefix Algorithms Prefix adders can be synthesized by human or computer as well. Starting from a serial structure, one can use compression rules and expansion rules to obtain new graphs. Can generate all previous graphs except PPA-KS. Universal adder synthesis approach. 17: Adders
135
Tree Adder Taxonomy
Ideal N-bit tree adder would have:
  L = log2 N logic levels
  Fanout never exceeding 2
  No more than one wiring track between levels
Describe adder with 3-D taxonomy (l, f, t):
  Logic levels: L + l
  Fanout: 2^f + 1
  Wiring tracks: 2^t
Known tree adders sit on the plane defined by l + f + t = L − 1
136
Tree Adder Taxonomy 17: Adders
137
Han-Carlson 17: Adders
138
Knowles [2, 1, 1, 1] 17: Adders
139
Ladner-Fischer 17: Adders
140
Taxonomy Revisited 17: Adders
141
More Adder Issues Multilevel adders
Multilevel versions of adders possible CSKA, CSLA, CIA Hybrid adders Arbitrary combination of speed-up techniques possible. Often used combinations: CLA – CSLA Transistor level adders Influence of logic styles (dynamic logic, pass transistor logic) Efficient transistor level implementation of ripple-carry chains (Manchester chain) Combinations of speed-up techniques make sense. Much higher design effort Many efficient implementations exist in the literature. Higher valency (radix) also possible. 17: Adders
142
More Adder Issues Higher valency is a poor choice in static CMOS logic since each stage has higher delay. However, if the stages are built using domino logic, it could prove to be an advantage. Nodes with large fanouts or long wires could use buffers. The prefix trees can also be internally pipelined. 17: Adders
143
Transistor Level 17: Adders
144
Transistor Level 17: Adders
145
Transistor Level 17: Adders
146
Higher Valency Adders 17: Adders
147
Sparse Trees Building a prefix tree to compute carries in every bit is expensive in terms of power. An alternative is to compute carries into short groups such as s = 2,3,8, or 16 bits. Meanwhile, pairs of s-bit adders precompute the sums assuming both carries-in of 0 and 1 to each group. It is a hybrid between a prefix adder and carry select adder. 17: Adders
148
Valency-3 BK Adder Sparse tree adder with s = 3 17: Adders
149
Carry-Select Implementation
150
Sparse Tree Adders Intel Valency-2 Sklansky sparse tree adder with s=4
151
Sparse Tree Adders Valency-3 Kogge-Stone sparse tree adder with s=3
152
Ling Adders
Ling discovered a technique to remove one series transistor from the critical group generate path at the expense of another XOR gate in the sum precomputation.
Define a pseudo-generate H_{i:j} = G_i + G_{i-1:j}. This is a simpler computation.
Define a pseudo-propagate signal I that is a shifted version of propagate.
153
Ling Adders Finally, the sums are computed by 17: Adders
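The Ling identities can be checked with a small Python sketch. It assumes the common textbook formulation with OR-propagate P_i = A_i + B_i, for which G_{i:0} = H_{i:0}·P_i, and a sum multiplexer of the form S_i = H_{i-1:0} ? (A_i ⊕ B_i ⊕ P_{i-1}) : (A_i ⊕ B_i). Carry-in is taken as 0, and the pseudo-generate is evaluated serially rather than with a prefix tree, so this shows the logic only, not the speed advantage. The function name is illustrative.

```python
def ling_add(a, b, n):
    """Add two n-bit integers (carry-in 0) using Ling pseudo-generate signals."""
    abits = [(a >> i) & 1 for i in range(n)]
    bbits = [(b >> i) & 1 for i in range(n)]
    g = [x & y for x, y in zip(abits, bbits)]   # generate
    p = [x | y for x, y in zip(abits, bbits)]   # OR-propagate (assumed form)
    x = [x ^ y for x, y in zip(abits, bbits)]   # half-sum A_i xor B_i

    total = 0
    carry = 0    # conventional group generate G_{i-1:0}
    h_prev = 0   # pseudo-generate H_{i-1:0}
    for i in range(n):
        p_prev = p[i - 1] if i > 0 else 0
        # Sum multiplexer: both candidates are ready before H arrives.
        s = (x[i] ^ p_prev) if h_prev else x[i]
        total |= s << i
        h = g[i] | carry       # H_{i:0} = G_i + G_{i-1:0}
        carry = h & p[i]       # G_{i:0} = H_{i:0} * P_i
        h_prev = h
    return total, carry


print(ling_add(11, 6, 4))  # 17 = 16 + 1
```

The pseudo-generate drops the AND with P_i from the critical path and moves it into the sum precomputation, which is the trade described on the slide.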
154
Ling Adders 17: Adders
155
Comparison
Standard-cell implementation, 0.8 µm technology
156
Comparison 17: Adders
157
Summary
Adder architectures offer area / power / delay tradeoffs. Choose the best one for your application.

Architecture    | Classification | Logic Levels   | Max Fanout | Tracks | Cells
Carry-Ripple    |                | N − 1          | 1          | 1      | N
Carry-Skip n=4  |                | N/4 + 5        | 2          | 1      | 1.25N
Carry-Inc. n=4  |                | N/4 + 2        | 4          | 1      | 2N
Brent-Kung      | (L−1, 0, 0)    | 2 log2 N − 1   | 2          | 1      | 2N
Sklansky        | (0, L−1, 0)    | log2 N         | N/2 + 1    | 1      | 0.5 N log2 N
Kogge-Stone     | (0, 0, L−1)    | log2 N         | 2          | N/2    | N log2 N
158
E vs Delay Trade-off 17: Adders
159
E vs Delay Tradeoff 90nm 64 bit domino KS Ling adder with various valency and s 17: Adders
160
Area vs Delay Synthesized Adders 17: Adders