Clockless Computing Lecture 3

Clockless Computing Lecture 3 Montek Singh Thu, Aug 30, 2007

Handshaking Example: Asynchronous Pipelines
- Pipelining basics
- Fine-grain pipelining
- Example approach: MOUSETRAP pipelines

Background: Pipelining
- What is pipelining? Breaking up a complex operation on a stream of data into simpler sequential operations, separated by storage elements (latches/registers)
- A "coarse-grain" pipeline (e.g. a simple processor): fetch, decode, execute
- A "fine-grain" pipeline (e.g. a pipelined adder)
- Performance impact:
  (+) Throughput: significantly increased (data items processed per second)
  (-) Latency: somewhat degraded (seconds from input to output)
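
The throughput/latency trade-off above can be sketched numerically. This is a minimal illustration, not part of the lecture: the function name, the per-stage latch overhead, and all delay values are assumptions chosen for the example.

```python
def pipeline_metrics(stage_delays_ns, latch_overhead_ns):
    """Return (latency_ns, throughput_items_per_ns) for a steady-state pipeline.

    Assumes every stage pays the same storage-element overhead and that the
    slowest stage sets the rate at which new items can be accepted.
    """
    per_stage = [d + latch_overhead_ns for d in stage_delays_ns]
    latency = sum(per_stage)   # time for one item to pass through all stages
    cycle = max(per_stage)     # slowest stage limits the issue rate
    return latency, 1.0 / cycle


# Hypothetical numbers: 9 ns of logic, as one block or split into three stages.
unpipelined = pipeline_metrics([9.0], latch_overhead_ns=1.0)
pipelined = pipeline_metrics([3.0, 3.0, 3.0], latch_overhead_ns=1.0)
print(unpipelined)  # (10.0, 0.1)
print(pipelined)    # (12.0, 0.25)
```

With these made-up delays, splitting the logic raises throughput from 0.1 to 0.25 items/ns while latency grows from 10 ns to 12 ns because of the extra storage elements.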

Focus of Asynchronous Community
- A key focus: extremely fine-grain pipelines
  - "gate-level" pipelining = use the narrowest possible stages
  - each stage consists of only a single level of logic gates
  - some of the fastest existing digital pipelines to date
- Application areas:
  - general-purpose microprocessors: instruction pipelines often have 20-40 stages
  - multimedia hardware (graphics accelerators, video DSPs, ...): naturally pipelined systems; throughput is critical; input is "bursty"
  - optical networking: serializing/deserializing FIFOs
  - string matching? KMP-style string matching: variable skip lengths

MOUSETRAP: Ultra-High-Speed Transition-Signaling Asynchronous Pipelines
Singh and Nowick, Intl. Conf. on Computer Design (ICCD), September 2001, and IEEE Trans. VLSI, June 2007

MOUSETRAP Pipelines
- Simple asynchronous implementation style; uses:
  - standard logic implementation: Boolean gates, transparent latches
  - simple control: 1 gate per pipeline stage
- MOUSETRAP uses a "capture protocol". Latches:
  - are normally transparent before new data arrives
  - become opaque after data arrives ("capture" the data)
- Control signaling: transition signaling (2-phase)
  - simple protocol: req/ack = only 2 events per handshake (not 4)
  - no "return-to-zero"
  - each transition (up or down) signals a distinct operation
- Our goal: very fast cycle time, simple inter-stage communication
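
To make the "2 events per handshake (not 4)" point concrete, here is a small sketch that lists the wire transitions for each protocol. The function name and trace format are assumptions for illustration; it models only the req/ack wires, not any data.

```python
def handshake_trace(n_items, phases):
    """List the (wire, new_level) transitions needed to transfer n_items.

    phases=2: transition signaling, one req toggle and one ack toggle per item.
    phases=4: return-to-zero signaling, req and ack each rise and then fall.
    """
    if phases not in (2, 4):
        raise ValueError("phases must be 2 or 4")
    req = ack = 0
    trace = []
    for _ in range(n_items):
        req ^= 1
        trace.append(("req", req))
        ack ^= 1
        trace.append(("ack", ack))
        if phases == 4:
            # Extra return-to-zero half of the handshake.
            req ^= 1
            trace.append(("req", req))
            ack ^= 1
            trace.append(("ack", ack))
    return trace


two_phase = handshake_trace(2, phases=2)
four_phase = handshake_trace(2, phases=4)
print(len(two_phase), len(four_phase))  # 4 8
```

In the 2-phase trace the second item is signaled by falling edges, which is why each transition, up or down, carries meaning.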

MOUSETRAP: A Basic FIFO
- Stages communicate using transition signaling: 1 transition per data item!
[Figure: three FIFO stages (Stage N-1, Stage N, Stage N+1), each with a data latch and a latch controller. Labeled signals: reqN, doneN, reqN+1, ackN-1, ackN, En, Data in, Data out. Annotations show the 1st and 2nd data items flowing through the pipeline.]

MOUSETRAP: A Basic FIFO (contd.)
- Latch controller (XNOR) acts as "protocol converter": 2 distinct transitions (up or down) → pulsed latch enable
  - Latch is disabled when the current stage is "done"
  - Latch is re-enabled when the next stage is "done"
- 2 transitions per latch cycle
[Figure: three FIFO stages (Stage N-1, Stage N, Stage N+1), each with a data latch and a latch controller. Labeled signals: reqN, doneN, reqN+1, ackN-1, ackN, En, Data in, Data out.]

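MOUSETRAP: FIFO Cycle Time
- Events making up one cycle of Stage N:
  - N computes
  - N+1 computes
  - N is re-enabled to compute
- Fast self-loop: N disables itself
- Cycle Time = [expression missing from the transcript]
[Figure: Stage N-1, Stage N, Stage N+1 with data latches and latch controllers; labeled signals reqN, doneN, reqN+1, ackN-1, ackN, En, Data in, Data out; the cycle events are marked with numbered arrows.]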
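
The cycle-time expression on this slide did not survive in the transcript. Reading the listed events (N computes, N+1 computes, N is re-enabled), one plausible reconstruction is the sum of two latch delays and the rising delay of the XNOR controller. The symbols below are assumptions introduced for this sketch, not taken from the slide, and the last step assumes identical latches in every stage.

```latex
\begin{align*}
T_{\mathrm{FIFO}}
  &\approx t_{\mathrm{latch},\,N} + t_{\mathrm{latch},\,N+1} + t_{\mathrm{XNOR}\uparrow} \\
  &= 2\,t_{\mathrm{latch}} + t_{\mathrm{XNOR}\uparrow}
\end{align*}
```

The self-loop that disables N's latch runs in parallel with N+1 computing, so it does not appear in this sum as long as it is the faster of the two.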

Detailed Controller Operation
- Stage N's latch controller: inputs are "done" from N and "ack" from N+1; output goes to the latch
- One pulse per data item flowing through:
  - down transition: caused by "done" of N
  - up transition: caused by "done" of N+1
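
The controller behavior described here can be modeled as an XNOR of two transition-signaled wires. The sketch below is an assumption-laden toy model (zero gate delay, hypothetical event names `"done_N"` and `"done_N+1"`) that only tracks the latch-enable level after each event.

```python
def xnor(a, b):
    """Two-input XNOR on 0/1 values."""
    return 1 if a == b else 0


def latch_enable_trace(events):
    """Return the sequence of En values for Stage N's latch.

    done_n toggles when Stage N captures an item; ack_n toggles when
    Stage N+1 does (its "done" is Stage N's ack). En = XNOR(done_n, ack_n),
    so En = 1 means the latch is transparent.
    """
    done_n = ack_n = 0
    trace = [xnor(done_n, ack_n)]   # initially transparent
    for ev in events:
        if ev == "done_N":
            done_n ^= 1             # Stage N done: En falls, latch opaque
        elif ev == "done_N+1":
            ack_n ^= 1              # Stage N+1 done: En rises, latch reopens
        else:
            raise ValueError(f"unknown event: {ev}")
        trace.append(xnor(done_n, ack_n))
    return trace


two_items = latch_enable_trace(["done_N", "done_N+1", "done_N", "done_N+1"])
print(two_items)  # [1, 0, 1, 0, 1]
```

Each data item produces one low pulse on En, whether the underlying done/ack transitions were rising or falling.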

MOUSETRAP: Pipeline With Logic
- Simple extension to FIFO: insert a logic block + matching delay in each stage
- Logic blocks: can use standard single-rail logic (need not be hazard-free)
- "Bundled data" requirement: each "req" must arrive after data inputs are valid and stable
[Figure: Stage N-1, Stage N, Stage N+1, each with a data latch, latch controller, logic block, and matched delay on the request line; labeled signals reqN, doneN, reqN+1, ackN-1, ackN.]
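
The bundled-data requirement amounts to a one-sided delay check per stage. The following is a minimal sketch under stated assumptions: the function name, the optional safety margin, and the picosecond values are all hypothetical.

```python
def bundled_data_ok(logic_delay_ps, matched_delay_ps, margin_ps=0.0):
    """True if req, delayed by matched_delay_ps, arrives no earlier than the
    worst-case logic output (plus an optional safety margin)."""
    return matched_delay_ps >= logic_delay_ps + margin_ps


# Hypothetical stage: worst-case logic delay 120 ps.
print(bundled_data_ok(120.0, 130.0))                  # True
print(bundled_data_ok(120.0, 130.0, margin_ps=20.0))  # False
print(bundled_data_ok(120.0, 100.0))                  # False
```

A larger matched delay is always safe for this check but adds directly to the request path, so in practice it would be sized just above the worst-case logic delay.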

Complex Pipelining: Forks & Joins
- Problems with linear pipelining: handles limited applications; real systems are more complex
- Non-linear pipelining: has forks/joins
- Contribution: introduce efficient circuit structures
  - Forks: distribute data + control to multiple destinations
  - Joins: merge data + control from multiple sources
- Enabling technology for building complex async systems
[Figure: a fork and a join.]

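Forks and Joins: Implementation
- Join: merge multiple requests (req1 and req2 pass through a "C" element to form Stage N's req)
- Fork: merge multiple acknowledges (ack1 and ack2 pass through a "C" element to form Stage N's ack)
[Figure: Stage N shown twice, once as a join and once as a fork, each with a "C" element.]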
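
Assuming the "C" in the figure denotes a Muller C-element (the slide does not spell this out), its merging behavior can be sketched as follows. The class name and interface are hypothetical; the model is purely behavioral with no delays.

```python
class CElement:
    """Behavioral model of a two-input Muller C-element.

    The output copies the inputs when they agree and holds its previous
    value when they differ.
    """

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        if a == b:
            self.out = a
        return self.out


# Join example with transition signaling: the merged req toggles only
# after BOTH incoming requests have toggled.
c = CElement()
merged = [
    c.update(1, 0),  # only req1 has toggled -> output holds 0
    c.update(1, 1),  # req2 toggles too      -> output rises to 1
    c.update(0, 1),  # next item, req1 first -> output holds 1
    c.update(0, 0),  # req2 follows          -> output falls to 0
]
print(merged)  # [0, 1, 1, 0]
```

The same model applies to the fork case, with ack1 and ack2 as the inputs.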

Performance, Timing and Optimization
MOUSETRAP with logic:
- Stage Latency = [expression missing from the transcript]
- Cycle Time = [expression missing from the transcript]
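
Both expressions on this slide are missing from the transcript. A plausible reconstruction, assuming a stage's forward path is one latch plus its logic block and that the cycle is "N computes, N+1's latch passes the data, N is re-enabled", is sketched below. The function, its parameter names, and the delay values are assumptions for illustration, not measured figures.

```python
def mousetrap_stage_timing(t_latch_ps, t_logic_ps, t_xnor_up_ps):
    """Return (stage_latency_ps, cycle_time_ps) under the assumed model.

    latency = t_latch + t_logic
    cycle   = (t_latch + t_logic)   # N computes
            + t_latch               # N+1 captures, producing its "done"
            + t_xnor_up             # N's controller re-enables N's latch
    """
    latency = t_latch_ps + t_logic_ps
    cycle = 2 * t_latch_ps + t_logic_ps + t_xnor_up_ps
    return latency, cycle


# Hypothetical delays in picoseconds.
latency_ps, cycle_ps = mousetrap_stage_timing(60, 100, 80)
print(latency_ps, cycle_ps)  # 160 300
```

Setting the logic delay to zero reduces this to the plain FIFO case.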

Timing Analysis
- Main timing constraint: avoid "data overrun" (hold time)
  - Data must be safely "captured" by Stage N before new inputs arrive from Stage N-1
  - simple 1-sided timing constraint: fast latch disable
  - Stage N's "self-loop" must be faster than the entire path through the prior stage
[Figure: Stage N-1 and Stage N with data latch, latch controller, logic, and delay; labeled signals reqN, doneN, reqN+1, ackN-1, ackN.]
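
The one-sided constraint can be expressed as a comparison of two path delays. This sketch assumes the self-loop consists of the controller's falling delay plus the latch hold time, and that the competing path is Stage N-1's controller rising, its latch, and its logic block. All names and numbers are hypothetical.

```python
def data_overrun_safe(t_xnor_down_ps, t_hold_ps,
                      t_xnor_up_prev_ps, t_latch_prev_ps, t_logic_prev_ps):
    """True if Stage N closes its latch before new data can arrive from N-1.

    self_loop:  Stage N's done -> its own controller falls -> latch opaque.
    prior_path: the same done (as ack to N-1) -> N-1's controller rises ->
                new data passes N-1's latch and logic -> reaches Stage N.
    """
    self_loop = t_xnor_down_ps + t_hold_ps
    prior_path = t_xnor_up_prev_ps + t_latch_prev_ps + t_logic_prev_ps
    return self_loop < prior_path


# Hypothetical delays in picoseconds.
print(data_overrun_safe(80, 20, 80, 60, 100))  # True: 100 < 240
print(data_overrun_safe(80, 20, 30, 30, 0))    # False: 100 >= 60
```

The second call shows why the constraint is tightest for a plain FIFO, where the prior stage has no logic delay to slow the incoming data.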

Experimental Results
- Simulations of FIFOs: ~3 GHz (in 0.13 µm IBM process)
- Recent fabricated chip: GCD, ~2 GHz simulated speed
- Chips tested to be fully functional
- Will show demo later

In-Class Exercise
- Modify MOUSETRAP to remove the "data overrun" timing constraint
- How is the performance affected?

Homework #3 (due Tue Sep 11, 2007)
- Read the MOUSETRAP paper [TVLSI Jun '07]
- Modify MOUSETRAP to reduce power consumption
  - Make the latches normally opaque
  - Latches become transparent only when new data arrives at their inputs
  - Prevents glitchy/garbage data from propagating
- How is the performance (throughput, latency) affected?

MOUSETRAP Advanced Topics

Special Case: Using "Clocked Logic"
- Clocked-CMOS = C2MOS: eliminate explicit latches
  - latch folded into the logic itself
[Figure: a general C2MOS gate (logic inputs, pull-up network, pull-down network, En, output, "keeper") and a C2MOS AND gate (inputs A and B, En, output, "keeper").]

Gate-Level MOUSETRAP: with C2MOS
- Use C2MOS: eliminate explicit latches
- New control optimization = "dual-rail XNOR"
  - eliminates 2 inverters from the critical path
[Figure: Stage N-1, Stage N, Stage N+1 built from C2MOS logic, with latch controllers and a pair of bit latches; control signals are dual-rail pairs (En, En'), (done, done'), (ack, ack'); other labels: reqN, reqN+1, ackN-1, ackN.]

Timing Optimization: Reducing Cycle Time
- Analytical Cycle Time = [expression missing from the transcript]
- Goal: shorten [term missing from the transcript] (in steady-state operation)
  - Steady state = no undue pipeline congestion
- Observation: the XNOR switches twice per data item; only the 2nd (up) transition is critical for performance
- Solution: reduce XNOR output swing
  - degrade "slew" for start of pulse
  - allows quick pulse completion: faster rise time
- Still safe when congested:
  - pulse starts on time
  - pulse maintained until congestion clears

Timing Optimization (contd.)
- "Unoptimized" XNOR output: N's latch is disabled after N "done", then re-enabled after N+1 "done"
- "Optimized" XNOR output: latch only partly disabled; recovers quicker! (no pulse width requirement)
[Figure: waveforms of the "unoptimized" and "optimized" XNOR outputs relative to the N "done" and N+1 "done" events.]

Comparison with Wave Pipelining
- Two scenarios:
  - Steady state: both MOUSETRAP and wave pipelines act like transparent "flow-through" combinational pipelines
  - Congestion:
    - right environment stalls: each MOUSETRAP stage safely captures data
    - internal stage slow: MOUSETRAP stages to its left safely capture data
    → congestion properly handled in MOUSETRAP
- Conclusion: MOUSETRAP has the potential of:
  - speed of wave pipelining
  - greater robustness and flexibility

Timing Issues: Handling Wide Datapaths
- Buffers inserted to amplify latch signals (En)
- Reducing impact of buffers:
  - control uses unbuffered signals → buffer delay off of critical path!
  - datapath skewed w.r.t. control
- Timing assumption: buffer delays roughly equal
[Figure: two versions of Stage N-1 and Stage N with buffered En; labeled signals reqN, doneN, reqN+1.]