1
Asynchronous Datapath Design
Adders
Comparators
Multipliers
Registers
Completion Detection
Bus
Pipeline
…
2
Asynchronous Adder Design
Motivation
Background: synchronous and asynchronous adders
Delay-insensitive carry-lookahead adders
Complexity analysis
Conclusions
3
Motivation
Integer addition is one of the most important operations in digital computer systems. Statistics show that in a prototypical RISC machine (DLX), 72% of the instructions perform additions (or subtractions) in the datapath; in ARM processors the figure reaches 80%. The performance of a processor is therefore significantly influenced by the speed of its adders.
4
Background
Adders can be synchronous or asynchronous:
Synchronous adders: worst-case performance
Asynchronous adders: average-case performance
For example:
Ripple-Carry Adders (synchronous): O(n)
Carry-Completion Sensing Adders (asynchronous): O(log n) average
5
Background: Binary Addition
Worst case:
     00000001
  +  11111111
  -----------
  S  00000000
  C  11111111
  -----------
    100000000

Best case:
     00000000
  +  00000000
  -----------
  S  00000000
  C  00000000
  -----------
    000000000

Asynchronous adders can exploit this to achieve average-case behavior: the addition completes as soon as the longest carry chain has settled.
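To make the contrast concrete, here is a minimal Python sketch (ours, not from the slides; the function name is hypothetical) that measures the longest carry chain of an addition, the quantity an asynchronous adder's completion time tracks:

# Sketch: longest run of consecutive positions whose carry-out is 1.
def longest_carry_chain(a: int, b: int, n: int = 8) -> int:
    carry, run, longest = 0, 0, 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        nxt = (ai & bi) | (ai & carry) | (bi & carry)  # majority = carry out
        run = run + 1 if nxt else 0                    # chain grows while carries keep coming
        longest = max(longest, run)
        carry = nxt
    return longest

print(longest_carry_chain(0b00000001, 0b11111111))  # worst case: 8
print(longest_carry_chain(0b00000000, 0b00000000))  # best case: 0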
6
Background
Ripple-Carry Adders (RCA):
One-stage full adder (see figure)
Logic complexity: O(n)
Time complexity: O(n)
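A minimal behavioural sketch of a full adder and its ripple-carry composition (illustrative Python, not the slides' circuit; names are ours):

# One-stage full adder and an n-bit ripple-carry adder built from n such stages.
def full_adder(a: int, b: int, cin: int):
    s = a ^ b ^ cin                              # sum bit
    cout = (a & b) | (a & cin) | (b & cin)       # carry out
    return s, cout

def ripple_carry_add(a: int, b: int, n: int = 8):
    carry, result = 0, 0
    for i in range(n):                           # the carry ripples stage by stage: O(n)
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry

print(ripple_carry_add(0b00000001, 0b11111111))  # (0, 1): sum 0, carry out 1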
7
Background
Carry-Sensing Completion Detection (CSCD) Adders: the asynchronous version of RCA
8
Background
One-stage CSCD adder (see figure)
CSCD Adders:
Logic complexity: O(n)
Time complexity: O(log n) (average case)
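A functional sketch of the completion-detection idea (illustrative Python; all names are ours and timing is not modelled): each stage drives a pair of carry rails, and the adder is done once every position has asserted exactly one of them.

def dual_rail_carries(a: int, b: int, n: int = 8):
    c0, c1 = [0] * (n + 1), [0] * (n + 1)
    c0[0] = 1                                  # carry-in known to be 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai & bi:
            c1[i + 1] = 1                      # generate: carry is 1 regardless of lower bits
        elif not (ai | bi):
            c0[i + 1] = 1                      # kill: carry is 0 regardless of lower bits
        else:
            c0[i + 1], c1[i + 1] = c0[i], c1[i]  # propagate: copy whichever rail arrived
    done = all(c0[i] | c1[i] for i in range(n + 1))  # completion detection: AND of per-stage ORs
    return done

print(dual_rail_carries(0b00000001, 0b11111111))  # True (in hardware this rises only after the chain settles)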
9
Background
Delay-Insensitive Ripple-Carry Adders (DIRCA): the DI version of RCA
10
Background
One-stage DIRCA (see figure)
DIRCA Adders:
Logic complexity: O(n)
Time complexity: O(log n) (average case)
One of the most robust adders
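One standard dual-rail carry formulation, given here only as an illustration (the slides' figure may use different gates), writing a^1/a^0 for the two rails of operand bit a_i and c^1/c^0 for the carry rails:

c^1_i = a^1_i b^1_i + (a^1_i b^0_i + a^0_i b^1_i) c^1_{i-1}
c^0_i = a^0_i b^0_i + (a^1_i b^0_i + a^0_i b^1_i) c^0_{i-1}

Generate and kill positions assert a carry rail without waiting for the previous stage, which is what yields the O(log n) average completion time.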
11
Background Completion detection for asynchronous adders:
12
Background
DI adders vs. bundling-constraint adders:
13
Carry-Lookahead Adders
An RCA requires n stage-propagation delays, which is undesirable for high-speed processors. One way to improve adder performance is to compute the carries in parallel; this is why Carry-Lookahead Adders (CLA) were introduced.
CLAs:
Logic complexity: O(n)
Time complexity: O(log n)
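The lookahead idea in a nutshell (illustrative Python sketch, not the slides' A/B modules; names are ours): per-bit generate/propagate pairs are combined pairwise in a tree, so group carries are known after O(log n) combining levels.

def gp(a_bit: int, b_bit: int):
    return a_bit & b_bit, a_bit ^ b_bit            # (generate, propagate)

def combine(hi, lo):
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo       # group (G, P)

# Example: group signals for bits 1..0 of a=0b01, b=0b11
g0p0 = gp(1, 1)      # bit 0
g1p1 = gp(0, 1)      # bit 1
print(combine(g1p1, g0p0))                         # (1, 0): the group generates a carry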
14
Carry-Lookahead Adders
15
A module: B module:
16
DI Carry-Lookahead Adders
Delay-Insensitive Carry-Lookahead Adders (DICLA) may be implemented using delay-insensitive codes:
1. dual-rail signaling: inputs, sums, and carry bits
2. one-hot code: internal signals

Dual-rail encoding of one bit:
A1=0, A0=0 : no data
A1=0, A0=1 : valid 0
A1=1, A0=0 : valid 1
A1=1, A0=1 : illegal

One-hot (1-of-3) code for the internal signals:
000 : no data
001, 010, 100 : the three valid values
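A tiny Python sketch of the dual-rail bit encoding in the table above (illustrative; function names are ours):

def dual_rail_encode(bit: int):
    return (1, 0) if bit else (0, 1)          # (A1, A0)

def dual_rail_decode(a1: int, a0: int):
    if (a1, a0) == (0, 0): return None        # no data yet (spacer)
    if (a1, a0) == (0, 1): return 0           # valid 0
    if (a1, a0) == (1, 0): return 1           # valid 1
    raise ValueError("illegal code word (1, 1)")

print(dual_rail_encode(1), dual_rail_decode(1, 0))   # (1, 0) 1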
17
QDI Carry-Lookahead Adders
DI C module (the DI counterpart of the CLA A module):
1. internal signals: one-hot code (k, g, p)
2. input and sum bits: dual-rail signals
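Behaviourally, such a module turns the dual-rail operand bits into the one-hot kill/generate/propagate code. A minimal sketch (illustrative Python, not the slides' gate-level C module):

def c_module(a1: int, a0: int, b1: int, b0: int):
    k = a0 & b0                # both bits are 0 -> carry killed
    g = a1 & b1                # both bits are 1 -> carry generated
    p = (a1 & b0) | (a0 & b1)  # bits differ     -> carry propagated
    return k, g, p             # exactly one output is high once the inputs are valid

print(c_module(0, 1, 1, 0))    # a=0 (rails 0,1), b=1 (rails 1,0) -> (0, 0, 1): propagate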
18
QDI Carry-Lookahead Adders
DI D module (the DI counterpart of the CLA B module):
1. internal signals: one-hot code (K, G, P)
2. carry bits: dual-rail signals
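The combining step can be sketched as follows (illustrative Python; the actual D module also drives the dual-rail carries, which is omitted here):

def d_module(hi, lo):
    k_hi, g_hi, p_hi = hi
    k_lo, g_lo, p_lo = lo
    K = k_hi | (p_hi & k_lo)   # group kills if the upper half kills, or propagates a lower kill
    G = g_hi | (p_hi & g_lo)   # group generates if the upper half generates, or propagates a lower generate
    P = p_hi & p_lo            # group propagates only if both halves propagate
    return K, G, P

print(d_module((0, 0, 1), (0, 1, 0)))   # propagate over generate -> (0, 1, 0): group generate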
19
DI Carry-Lookahead Adders
20
If A3 = B3 then the carry C3 is known locally: it is a carry kill (k3) when both bits are 0 and a carry generate (g3) when both are 1.
21
DI Carry-Lookahead Adders
The group signals K3,2 and G3,2 can be used to speed up the carry computation too: a group kill or group generate over bits 3..2 determines C3 without waiting for the lower bits. (Figure labels: k3, g3 and K3,2, G3,2.)
22
Speeding Up DICLA
Idea: send the carry-generates and carry-kills to every stage that needs this information, so those stages can compute their carries immediately.
D module with speed-up circuitry (see figure)
23
Speeding Up DICLA
General form: D module with speed-up circuitry, one version for carry-kill and one for carry-generate. For carry-generate:
G = g_{j-1} + g_{j-2}·p_{j-1} + … + g_0·p_1·p_2·…·p_{j-1}
(the carry-kill form is analogous, with k's in place of g's).
This is in fact the full carry-lookahead scheme.
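The formula above, evaluated directly (illustrative Python sketch; list names are ours and the example bit values are hypothetical):

def full_lookahead_generate(g, p):
    # G = g[j-1] + g[j-2]*p[j-1] + ... + g[0]*p[1]*...*p[j-1]
    j = len(g)
    G = 0
    for i in range(j):
        term = g[i]
        for m in range(i + 1, j):     # the propagate chain above position i
            term &= p[m]
        G |= term
    return G

print(full_lookahead_generate(g=[1, 0, 0], p=[0, 1, 1]))   # 1: the carry reaches the top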
24
Speeding Up DICLA
Problems of the full carry-lookahead scheme:
practical limitations on fan-in and fan-out, an irregular structure, and many long wires
logic complexity increases more than linearly
Solution: exploit the properties of the tree-like structure.
New speed-up circuitry (see figure):
25
The speed-up (SP) circuitry focuses on the root node of a subtree and the leftmost root nodes of its right subtree (see figure).
26
Power of Speed-up Circuitry
x: length of a carry chain, with x' of it in the right subtree and x - x' in the left subtree.
27
Power of Speed-up Circuitry Without Speed-up circuitry
28
Power of Speed-up Circuitry With Speed-up circuitry
29
Optimization: Simplified D module
Simplified D' module:
Better logic complexity
Delay-insensitive again
31
Complexity Analysis
DICLASP:
Logic complexity: O(n)
Time complexity: O(log log n)
Best area-time efficiency: O(n log log n)
32
Complexity Analysis
33
CMOS: C module
34
CMOS: SD module
35
CMOS: SD’ module
36
SPICE Simulation
The SPICE simulation has two parts:
Random-number inputs: 10,000 randomly generated input pairs
Statistical data: traces from running examples on a 32-bit ARM emulator
37
SPICE Simulation: Random number input distribution
38
SPICE Simulation
SPICE simulation results with random-number inputs:
Speedup, DIRCA vs. RCA: 6.39
Speedup, DICLASP vs. CLA: 2.64
39
SPICE Simulation
Breakdown of addition/subtraction operations, obtained by running three benchmark programs (Dhrystone f1, Dhrystone f2, and Espresso dc2) on a 32-bit ARM simulator.
40
SPICE Simulation: dynamic traces
41
SPICE Simulation: dynamic traces
In 83.92% of the instructions, the longest carry chain is shorter than 17 bits.
42
SPICE Simulation
SPICE simulation results on the dynamic traces:
Average computation time: DIRCA 9.61 ns, DICLASP 5.25 ns
Speedup, DIRCA vs. RCA: 4.1
Speedup, DICLASP vs. CLA: 2.2
43
Conclusion
DICLASP:
Best area-time efficiency: O(n log log n)
Correctness: no adder is more robust than DICLASP.
Cost (logic complexity): no parallel adder is cheaper than DICLASP (O(n)).
Speed (time complexity): no adder is faster than DICLASP (O(log log n)).
Suitable for VLSI implementation.