1
Asynchronous Datapath Design
Adders
Comparators
Multipliers
Registers
Completion Detection
Bus
Pipeline
…
2
Asynchronous Adder Design
Motivation
Background: synchronous and asynchronous adders
Delay-insensitive carry-lookahead adders
Complexity analysis
Conclusions
3
Motivation
Integer addition is one of the most important operations in digital computer systems. Statistics show that in a prototypical RISC machine (DLX), 72% of the instructions perform additions (or subtractions) in the datapath; in ARM processors the figure reaches 80%. The performance of a processor is therefore significantly influenced by the speed of its adders.
4
Background
Adders can be synchronous or asynchronous:
Synchronous adders: worst-case performance
Asynchronous adders: average-case performance
For example:
Ripple-Carry Adders (synchronous): O(n)
Carry-Completion Sensing Adders (asynchronous): O(log n) average
5
Background: Binary Addition
Worst case:
     00000001
  +  11111111
  -----------
  S  00000000
  C  11111111
  -----------
    100000000

Best case:
     00000000
  +  00000000
  -----------
  S  00000000
  C  00000000
  -----------
    000000000

Asynchronous adders can exploit this to achieve average-case behavior: the addition completes as soon as the longest carry chain has settled.
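To make the contrast concrete, here is a minimal Python sketch (ours, not from the slides; the function name is hypothetical) that measures the longest carry chain of an addition, the quantity an asynchronous adder's completion time tracks:

# Sketch: longest run of consecutive positions whose carry-out is 1.
def longest_carry_chain(a: int, b: int, n: int = 8) -> int:
    carry, run, longest = 0, 0, 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        nxt = (ai & bi) | (ai & carry) | (bi & carry)  # majority = carry out
        run = run + 1 if nxt else 0                    # chain grows while carries keep coming
        longest = max(longest, run)
        carry = nxt
    return longest

print(longest_carry_chain(0b00000001, 0b11111111))  # worst case: 8
print(longest_carry_chain(0b00000000, 0b00000000))  # best case: 0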
6
Background
Ripple-Carry Adders (RCA):
One-stage full adder (see figure)
Logic complexity: O(n)
Time complexity: O(n)
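A minimal behavioural sketch of a full adder and its ripple-carry composition (illustrative Python, not the slides' circuit; names are ours):

# One-stage full adder and an n-bit ripple-carry adder built from n such stages.
def full_adder(a: int, b: int, cin: int):
    s = a ^ b ^ cin                              # sum bit
    cout = (a & b) | (a & cin) | (b & cin)       # carry out
    return s, cout

def ripple_carry_add(a: int, b: int, n: int = 8):
    carry, result = 0, 0
    for i in range(n):                           # the carry ripples stage by stage: O(n)
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry

print(ripple_carry_add(0b00000001, 0b11111111))  # (0, 1): sum 0, carry out 1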
7
Background
Carry-Sensing Completion Detection (CSCD) Adders: the asynchronous version of RCA
8
Background
One-stage CSCD adder (see figure)
CSCD Adders:
Logic complexity: O(n)
Time complexity: O(log n) (average case)
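A functional sketch of the completion-detection idea (illustrative Python; all names are ours and timing is not modelled): each stage drives a pair of carry rails, and the adder is done once every position has asserted exactly one of them.

def dual_rail_carries(a: int, b: int, n: int = 8):
    c0, c1 = [0] * (n + 1), [0] * (n + 1)
    c0[0] = 1                                  # carry-in known to be 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai & bi:
            c1[i + 1] = 1                      # generate: carry is 1 regardless of lower bits
        elif not (ai | bi):
            c0[i + 1] = 1                      # kill: carry is 0 regardless of lower bits
        else:
            c0[i + 1], c1[i + 1] = c0[i], c1[i]  # propagate: copy whichever rail arrived
    done = all(c0[i] | c1[i] for i in range(n + 1))  # completion detection: AND of per-stage ORs
    return done

print(dual_rail_carries(0b00000001, 0b11111111))  # True (in hardware this rises only after the chain settles)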
9
Background
Delay-Insensitive Ripple-Carry Adders (DIRCA): the DI version of RCA
10
Background
One-stage DIRCA (see figure)
DIRCA Adders:
Logic complexity: O(n)
Time complexity: O(log n) (average case)
One of the most robust adders
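One standard dual-rail carry formulation, given here only as an illustration (the slides' figure may use different gates), writing a^1/a^0 for the two rails of operand bit a_i and c^1/c^0 for the carry rails:

c^1_i = a^1_i b^1_i + (a^1_i b^0_i + a^0_i b^1_i) c^1_{i-1}
c^0_i = a^0_i b^0_i + (a^1_i b^0_i + a^0_i b^1_i) c^0_{i-1}

Generate and kill positions assert a carry rail without waiting for the previous stage, which is what yields the O(log n) average completion time.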
11
Background Completion detection for asynchronous adders:
12
Background
DI adders vs. bundling-constraint adders:
13
Carry-Lookahead Adders
An RCA requires n stage-propagation delays, which is undesirable for high-speed processors. One way to improve adder performance is to compute the carries in parallel; this is why Carry-Lookahead Adders (CLA) were introduced.
CLAs:
Logic complexity: O(n)
Time complexity: O(log n)
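The lookahead idea in a nutshell (illustrative Python sketch, not the slides' A/B modules; names are ours): per-bit generate/propagate pairs are combined pairwise in a tree, so group carries are known after O(log n) combining levels.

def gp(a_bit: int, b_bit: int):
    return a_bit & b_bit, a_bit ^ b_bit            # (generate, propagate)

def combine(hi, lo):
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo       # group (G, P)

# Example: group signals for bits 1..0 of a=0b01, b=0b11
g0p0 = gp(1, 1)      # bit 0
g1p1 = gp(0, 1)      # bit 1
print(combine(g1p1, g0p0))                         # (1, 0): the group generates a carry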
14
Carry-Lookahead Adders
15
A module: B module:
16
DI Carry-Lookahead Adders
Delay-Insensitive Carry-Lookahead Adders (DICLA) may be implemented using delay-insensitive codes:
1. dual-rail signaling: inputs, sums, and carry bits
2. one-hot code: internal signals

Dual-rail encoding of one bit:
A1=0, A0=0 : no data
A1=0, A0=1 : valid 0
A1=1, A0=0 : valid 1
A1=1, A0=1 : illegal

One-hot (1-of-3) code for the internal signals:
000 : no data
001, 010, 100 : the three valid values
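A tiny Python sketch of the dual-rail bit encoding in the table above (illustrative; function names are ours):

def dual_rail_encode(bit: int):
    return (1, 0) if bit else (0, 1)          # (A1, A0)

def dual_rail_decode(a1: int, a0: int):
    if (a1, a0) == (0, 0): return None        # no data yet (spacer)
    if (a1, a0) == (0, 1): return 0           # valid 0
    if (a1, a0) == (1, 0): return 1           # valid 1
    raise ValueError("illegal code word (1, 1)")

print(dual_rail_encode(1), dual_rail_decode(1, 0))   # (1, 0) 1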
17
QDI Carry-Lookahead Adders
DI C module (the DI counterpart of the CLA A module):
1. internal signals: one-hot code (k, g, p)
2. input and sum bits: dual-rail signals
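Behaviourally, such a module turns the dual-rail operand bits into the one-hot kill/generate/propagate code. A minimal sketch (illustrative Python, not the slides' gate-level C module):

def c_module(a1: int, a0: int, b1: int, b0: int):
    k = a0 & b0                # both bits are 0 -> carry killed
    g = a1 & b1                # both bits are 1 -> carry generated
    p = (a1 & b0) | (a0 & b1)  # bits differ     -> carry propagated
    return k, g, p             # exactly one output is high once the inputs are valid

print(c_module(0, 1, 1, 0))    # a=0 (rails 0,1), b=1 (rails 1,0) -> (0, 0, 1): propagate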
18
QDI Carry-Lookahead Adders
DI D module (the DI counterpart of the CLA B module):
1. internal signals: one-hot code (K, G, P)
2. carry bits: dual-rail signals
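The combining step can be sketched as follows (illustrative Python; the actual D module also drives the dual-rail carries, which is omitted here):

def d_module(hi, lo):
    k_hi, g_hi, p_hi = hi
    k_lo, g_lo, p_lo = lo
    K = k_hi | (p_hi & k_lo)   # group kills if the upper half kills, or propagates a lower kill
    G = g_hi | (p_hi & g_lo)   # group generates if the upper half generates, or propagates a lower generate
    P = p_hi & p_lo            # group propagates only if both halves propagate
    return K, G, P

print(d_module((0, 0, 1), (0, 1, 0)))   # propagate over generate -> (0, 1, 0): group generate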
19
DI Carry-Lookahead Adders
20
If A3 = B3 then the carry C3 is known locally: it is a carry kill (k3) when both bits are 0 and a carry generate (g3) when both are 1.
21
DI Carry-Lookahead Adders
The group signals K3,2 and G3,2 can be used to speed up the carry computation too: a group kill or group generate over bits 3..2 determines C3 without waiting for the lower bits. (Figure labels: k3, g3 and K3,2, G3,2.)
22
Speeding Up DICLA
Idea: send the carry-generates and carry-kills to every stage that needs this information, so those stages can compute their carries immediately.
D module with speed-up circuitry (see figure)
23
Speeding Up DICLA
General form: D module with speed-up circuitry, one version for carry-kill and one for carry-generate. For carry-generate:
G = g_{j-1} + g_{j-2}·p_{j-1} + … + g_0·p_1·p_2·…·p_{j-1}
(the carry-kill form is analogous, with k's in place of g's).
This is in fact the full carry-lookahead scheme.
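The formula above, evaluated directly (illustrative Python sketch; list names are ours and the example bit values are hypothetical):

def full_lookahead_generate(g, p):
    # G = g[j-1] + g[j-2]*p[j-1] + ... + g[0]*p[1]*...*p[j-1]
    j = len(g)
    G = 0
    for i in range(j):
        term = g[i]
        for m in range(i + 1, j):     # the propagate chain above position i
            term &= p[m]
        G |= term
    return G

print(full_lookahead_generate(g=[1, 0, 0], p=[0, 1, 1]))   # 1: the carry reaches the top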
24
Speeding Up DICLA
Problems of the full carry-lookahead scheme:
practical limitations on fan-in and fan-out, an irregular structure, and many long wires
logic complexity increases more than linearly
Solution: exploit the properties of the tree-like structure.
New speed-up circuitry (see figure):
25
The speed-up (SP) circuitry focuses on the root node of a subtree and the leftmost root nodes of its right subtree (see figure).
26
Power of Speed-up Circuitry
x: length of a carry chain, with x' of it in the right subtree and x - x' in the left subtree.
27
Power of Speed-up Circuitry Without Speed-up circuitry
28
Power of Speed-up Circuitry With Speed-up circuitry
29
Optimization: Simplified D module
Simplified D' module:
Better logic complexity
Delay-insensitive again
31
Complexity Analysis
DICLASP:
Logic complexity: O(n)
Time complexity: O(log log n)
Best area-time efficiency: O(n log log n)
32
Complexity Analysis
33
CMOS: C module
34
CMOS: SD module
35
CMOS: SD’ module
36
SPICE Simulation
The SPICE simulation has two parts:
Random-number inputs: 10,000 randomly generated input pairs
Statistical data: traces from running examples on a 32-bit ARM emulator
37
SPICE Simulation: Random number input distribution
38
SPICE Simulation
SPICE simulation results with random-number inputs:
Speedup, DIRCA vs. RCA: 6.39
Speedup, DICLASP vs. CLA: 2.64
39
SPICE Simulation
Breakdown of addition/subtraction operations, obtained by running three benchmark programs (Dhrystone f1, Dhrystone f2, and Espresso dc2) on a 32-bit ARM simulator.
40
SPICE Simulation: dynamic traces
41
SPICE Simulation: dynamic traces
In 83.92% of the instructions, the longest carry chain is shorter than 17 bits.
42
SPICE Simulation
SPICE simulation results on the dynamic traces:
Average computation time: DIRCA 9.61 ns, DICLASP 5.25 ns
Speedup, DIRCA vs. RCA: 4.1
Speedup, DICLASP vs. CLA: 2.2
43
Conclusion
DICLASP:
Best area-time efficiency: O(n log log n)
Correctness: no adder is more robust than DICLASP.
Cost (logic complexity): no parallel adder is cheaper than DICLASP (O(n)).
Speed (time complexity): no adder is faster than DICLASP (O(log log n)).
Suitable for VLSI implementation.