Asynchronous Datapath Design: Adders, Comparators, Multipliers, Registers, Completion Detection, Bus, Pipeline, …

Asynchronous Adder Design. Outline: Motivation; Background: sync and async adders; Delay-insensitive carry-lookahead adders; Complexity analysis; Conclusions.

Motivation. Integer addition is one of the most important operations in digital computer systems. Statistics show that in a prototypical RISC machine (DLX), 72% of the instructions perform additions (or subtractions) in the datapath. In ARM processors the figure reaches 80%. The performance of a processor is significantly influenced by the speed of its adders.

Background. Adders are either synchronous or asynchronous. Synchronous adders: worst-case performance. Asynchronous adders: average-case performance. For example: ripple-carry adders (synchronous): O(n); carry-completion sensing adders (asynchronous): O(log n) on average.

Background: Binary Addition. [Figure: worst-case and best-case addition examples, each showing the sum S and carry C.] Asynchronous adders can exhibit average-case behavior.
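The gap between the worst case and the average case can be illustrated by measuring the longest carry chain of an addition. The following is a minimal Python sketch, not taken from the slides; the names `longest_carry_chain` and `average_longest_chain` are hypothetical, and a chain is counted here as the generating bit position plus every position the carry then propagates through.

```python
import random


def longest_carry_chain(a: int, b: int, n: int) -> int:
    """Longest run of bit positions that a single carry travels through."""
    longest = current = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai & bi:
            # Generate: a new carry starts at this position.
            current = 1
        elif ai ^ bi:
            # Propagate: an incoming carry (if there is one) moves on.
            current = current + 1 if current else 0
        else:
            # Kill: any incoming carry stops here.
            current = 0
        longest = max(longest, current)
    return longest


def average_longest_chain(n: int, samples: int = 2000, seed: int = 0) -> float:
    """Average longest carry chain over uniformly random n-bit operand pairs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        total += longest_carry_chain(rng.getrandbits(n), rng.getrandbits(n), n)
    return total / samples
```

For example, `longest_carry_chain(0b1111, 0b0001, 4)` is the worst case for 4 bits (the carry ripples through every position), while for uniformly random 32-bit operands the average is far below 32.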

Background: Ripple-Carry Adders. [Figure: one-stage full adder.] Logic complexity: O(n). Time complexity: O(n).
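As a bit-level illustration of the ripple-carry structure, here is a minimal Python sketch (the names `full_adder` and `ripple_carry_add` are hypothetical, not from the slides). It chains n one-bit full adders, so both the amount of logic and the carry path grow linearly with n.

```python
def full_adder(a: int, b: int, cin: int) -> tuple:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a: int, b: int, n: int, cin: int = 0) -> tuple:
    """Add two n-bit numbers by chaining n full adders.

    Returns (n-bit sum, carry out of the last stage).
    """
    carry = cin
    total = 0
    for i in range(n):
        # Stage i cannot finish until the carry from stage i-1 is known.
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry
```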

Background: Carry-Sensing Completion Detection (CSCD) adders (asynchronous version of RCA).

Background: One-stage CSCD adder. Carry-Sensing Completion Detection adders: logic complexity O(n); time complexity O(log n).

Background: Delay-Insensitive Ripple-Carry Adders (DIRCA), the DI version of RCA.

Background: One-stage DIRCA. DIRCA adders: logic complexity O(n); time complexity O(log n). One of the most robust adders.

Background: Completion detection for asynchronous adders.
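The circuit on this slide did not survive extraction. As a rough illustration of the idea only, the Python sketch below models completion detection over dual-rail signals: the result is complete once every bit has one of its two rails asserted. The names `bit_valid`, `is_complete`, and `is_reset` are hypothetical, the model assumes the illegal state (both rails high) never occurs, and a real detector would typically be built from an OR gate per bit feeding an AND or C-element tree.

```python
from typing import List, Tuple

DualRail = Tuple[int, int]  # (rail for "1", rail for "0")


def bit_valid(bit: DualRail) -> bool:
    """A dual-rail bit carries data once either rail is high."""
    rail1, rail0 = bit
    return (rail1 | rail0) == 1


def is_complete(word: List[DualRail]) -> bool:
    """Completion signal: every bit of the word carries valid data."""
    return all(bit_valid(bit) for bit in word)


def is_reset(word: List[DualRail]) -> bool:
    """Spacer state: every bit is back to 'no data' (both rails low)."""
    return all(bit == (0, 0) for bit in word)
```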

Background: DI adder vs. bundling-constraint adder.

Carry-Lookahead Adders. RCA requires n stage-propagation delays. For high-speed processors, this scheme is undesirable. One way to improve adder performance is to compute the carries in parallel. That is why Carry-Lookahead Adders (CLA) were introduced. CLAs: logic complexity O(n); time complexity O(log n).
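To make the logarithmic depth concrete, the following Python sketch computes all carries with a parallel-prefix recurrence over generate/propagate signals. It is an illustration under assumptions, not the tree of A and B modules from the slides: the name `cla_add` is hypothetical, and this particular recurrence uses more logic than the O(n) tree, so it only demonstrates that the number of combining levels grows as log2(n).

```python
def cla_add(a: int, b: int, n: int, cin: int = 0) -> tuple:
    """Add two n-bit numbers using parallel-prefix carry computation.

    Returns (n-bit sum, carry out, number of prefix levels used).
    """
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]  # generate per bit
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # propagate per bit

    G, P = g[:], p[:]  # group generate/propagate, widened level by level
    levels, d = 0, 1
    while d < n:
        G_next, P_next = G[:], P[:]
        for i in range(d, n):
            # Combine the group ending at i with the group ending at i - d.
            G_next[i] = G[i] | (P[i] & G[i - d])
            P_next[i] = P[i] & P[i - d]
        G, P = G_next, P_next
        d *= 2
        levels += 1

    # G[i], P[i] now cover bits 0..i, so the carry into bit i + 1 follows.
    carry_in = [cin] + [G[i] | (P[i] & cin) for i in range(n)]
    total = 0
    for i in range(n):
        total |= (p[i] ^ carry_in[i]) << i
    return total, carry_in[n], levels
```

Calling `cla_add(a, b, 32)` uses 5 combining levels, compared with the 32 sequential stages of a ripple-carry adder.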

Carry-Lookahead Adders

A module: B module:

DI Carry-Lookahead Adders. Delay-Insensitive Carry-Lookahead Adders (DICLA) may be implemented by using delay-insensitive codes:
1. Dual-rail signaling (inputs, sums, and carry bits): A1=0 A0=0: no data; A1=0 A0=1: valid 0; A1=1 A0=0: valid 1; A1=1 A0=1: illegal.
2. One-hot code (internal signals): no data: 000; valid codes: 001, 010, 100.
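The two codes can be written down as small encode/decode helpers. This is a minimal Python sketch with hypothetical function names; it follows the dual-rail table above, and for the one-hot code it only distinguishes "no data", "valid", and "illegal", because the slide text does not say which of the three code words stands for which internal signal.

```python
from typing import Optional


def dual_rail_encode(value: Optional[int]) -> tuple:
    """Encode None (no data), 0, or 1 as the rail pair (A1, A0)."""
    if value is None:
        return (0, 0)
    if value not in (0, 1):
        raise ValueError("dual-rail carries a single bit")
    return (1, 0) if value == 1 else (0, 1)


def dual_rail_decode(a1: int, a0: int) -> Optional[int]:
    """Decode (A1, A0); returns None while no data is present."""
    if (a1, a0) == (0, 0):
        return None
    if (a1, a0) == (1, 1):
        raise ValueError("illegal dual-rail state: both rails high")
    return a1


def one_hot_state(code: int) -> str:
    """Classify a 3-bit one-hot code word."""
    if code == 0b000:
        return "no data"
    if code in (0b001, 0b010, 0b100):
        return "valid"
    return "illegal"
```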

QDI Carry-Lookahead Adders. DI C module (corresponds to the CLA A module): 1. Internal signals: one-hot code (k, g, p). 2. Input and sum bits: dual-rail signals.

QDI Carry-Lookahead Adders. DI D module (corresponds to the CLA B module): 1. Internal signals: one-hot code (K, G, P). 2. Carry bits: dual-rail signals.

DI Carry-Lookahead Adders

If $A_3 = B_3$, then $C_3$ is a carry-kill or a carry-generate ($k_3$, $g_3$).
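The observation on this slide is that a stage with equal input bits knows its carry without waiting for the incoming carry. A minimal Python sketch of that classification follows; the names `classify_stage` and `carry_out` are hypothetical, and `None` is used here to model a carry that has not arrived yet.

```python
from typing import Optional


def classify_stage(a_bit: int, b_bit: int) -> str:
    """Return 'k' (kill), 'g' (generate), or 'p' (propagate) for one stage."""
    if a_bit == b_bit:
        # Equal inputs decide the carry locally: 0,0 kills; 1,1 generates.
        return "g" if a_bit else "k"
    return "p"


def carry_out(a_bit: int, b_bit: int, carry_in: Optional[int] = None) -> Optional[int]:
    """Carry produced by a stage; None means it must wait for carry_in."""
    kind = classify_stage(a_bit, b_bit)
    if kind == "k":
        return 0
    if kind == "g":
        return 1
    return carry_in  # propagate: the result is whatever arrives
```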

DI Carry-Lookahead Adders. $G_{3,2}$ and $K_{3,2}$ can be used to speed up the carry computation too. [Figure labels: $k_3$, $g_3$; $K_{3,2}$, $G_{3,2}$.]

Speeding Up DICLA. Idea: send the carry-generates and carry-kills to every stage that needs this information, so that those stages can compute their carries immediately. [Figure: D module with speed-up circuitry.]

Speeding Up DICLA. General form of the D module with speed-up circuitry, with one expression for carry-kill and one for carry-generate. For carry-generate: $g_{j-1} + g_{j-2}\,p_{j-1} + \dots + g_0\,p_1 p_2 \cdots p_{j-1}$. This is in fact the full carry-lookahead scheme.
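The carry-generate expression can be checked against a plain ripple computation. The Python sketch below is an illustration with hypothetical names (`lookahead_carry`, `ripple_carry_into`); it assumes no external carry-in and evaluates the sum-of-products form term by term.

```python
def lookahead_carry(g: list, p: list, j: int) -> int:
    """Carry into stage j from the full lookahead sum of products.

    Term i is g[i] AND p[i+1] AND ... AND p[j-1]: a carry generated at
    stage i that propagates through every stage up to j - 1.
    """
    carry = 0
    for i in range(j):
        term = g[i]
        for m in range(i + 1, j):
            term &= p[m]
        carry |= term
    return carry


def ripple_carry_into(a: int, b: int, j: int) -> int:
    """Reference value: carry into stage j computed one stage at a time."""
    carry = 0
    for i in range(j):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        carry = (ai & bi) | (carry & (ai ^ bi))
    return carry
```

The nested loop makes the practical cost visible: the term count and the fan-in of each term grow with j, which is why the full scheme does not scale.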

Speeding Up DICLA. Problems of the full carry-lookahead scheme: practical limitations on fan-in and fan-out, irregular structure, and many long wires; logic complexity increases more than linearly. Solution: use the properties of the tree-like structure. [Figure: new speed-up circuitry.]

SP focuses on the root node of a subtree. All leftmost root node of its right subtree

Power of Speed-up Circuitry. For a carry chain of length x: x' lies in the right (r) subtree and x - x' in the left (l) subtree.

Power of Speed-up Circuitry Without Speed-up circuitry

Power of Speed-up Circuitry With Speed-up circuitry

Optimization: simplified D module (D' module). Better logic complexity; delay-insensitive again.

Complexity Analysis. DICLASP: logic complexity Θ(n); time complexity Θ(log log n); best area-time efficiency Θ(n log log n).
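The area-time figure follows from the two complexities if area-time efficiency is taken to mean the product of logic complexity and time complexity, which is the assumption made in this short worked step. The numeric line uses base-2 logarithms purely as an illustration of how slowly log log n grows.

```latex
\[
AT(n) \;=\; \underbrace{\Theta(n)}_{\text{logic}} \cdot
            \underbrace{\Theta(\log\log n)}_{\text{time}}
      \;=\; \Theta(n \log\log n)
\]
\[
n = 32:\quad \log_2 n = 5, \qquad \log_2\log_2 n = \log_2 5 \approx 2.32
\]
```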

Complexity Analysis

CMOS: C module

CMOS: SD module

CMOS: SD’ module

SPICE Simulation. The SPICE simulation has two parts. Random number inputs: randomly generated input pairs. Statistical data: running examples on a 32-bit ARM emulator.

SPICE Simulation: Random number input distribution

SPICE Simulation results: random number inputs. Speedup: DIRCA vs. RCA: 6.39; DICLASP vs. CLA: 2.64.

SPICE Simulation. Breakdown of addition/subtraction operations, obtained by running three benchmark programs (Dhrystone f1, Dhrystone f2, and Espresso dc2) on a 32-bit ARM simulator.

SPICE Simulation: dynamic traces

SPICE Simulation: dynamic traces. For 83.92% of instructions, the carry chain is shorter than 17 (|carry chain| < 17).

SPICE Simulation results: dynamic traces. Average computation time: DIRCA 9.61 ns; DICLASP 5.25 ns. Speedup: DIRCA vs. RCA: 4.1; DICLASP vs. CLA: 2.2.

Conclusion. DICLASP has the best area-time efficiency: Θ(n log log n). Correctness: no adder is more robust than DICLASP. Cost (logic complexity): no parallel adder is cheaper than DICLASP (Θ(n)). Speed (time complexity): no adder is better than DICLASP (Θ(log log n)). Suitable for VLSI implementation.