1 Recap: Lectures 5 & 6 Classic Pipeline Styles 1. Williams and Horowitz’s PS0 pipeline 2. Sutherland’s micropipelines.

Presentation transcript:

1 Recap: Lectures 5 & 6
Classic Pipeline Styles
1. Williams and Horowitz’s PS0 pipeline
2. Sutherland’s micropipelines

2 Different Points in the Design Space
Williams/Horowitz’s PS0:
- Dual-rail
- Data-dependent completion
- Dynamic logic
- No extra latches
- “Zero-overhead” latency
- 4-phase handshakes: resetting overhead
Sutherland’s micropipelines:
- Single-rail
- Worst-case matched delay
- Static logic
- Explicit latches
- Latch latencies = overhead
- Elegant transition signaling

3 PS0 Protocol
PRECHARGE N: when N+1 completes evaluation
  → delete data: after next stage has copied it
EVALUATE N: when N+1 completes precharging
  → accept new data: after next stage is emptied
Evaluate → Precharge: 3 events
Precharge → Evaluate: another 3 events
Complete cycle: 6 events
[Figure: timing diagram of stages N, N+1, N+2; event labels: evaluates (×3), indicates “done” (×2), precharges]

4 PS0 Performance Cycle Time =
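The six-event cycle implied by the PS0 protocol can be checked with a small timing model. The sketch below is not from the slides: it assumes identical stages, an input that is always ready, and an output environment that consumes results immediately. The names `t_eval`, `t_cd` and `t_prech` are hypothetical delays for evaluation, completion detection and precharge.

```python
def ps0_eval_times(n_stages, n_tokens, t_eval, t_cd, t_prech):
    """Return eval_done[n][i]: time at which stage n finishes evaluating token i.

    Assumptions (not from the slides): identical stages, an always-ready
    input, and an ideal sink after the last stage.
    """
    if n_stages < 3:
        raise ValueError("the model needs at least 3 stages")
    eval_done = [[0.0] * n_tokens for _ in range(n_stages)]
    # prech_ack[n][i]: stage n's completion detector reports "precharged"
    # after token i has been cleared from stage n.
    prech_ack = [[0.0] * n_tokens for _ in range(n_stages)]
    for i in range(n_tokens):
        # EVALUATE N: data from N-1 has arrived and N+1 has completed precharging.
        for n in range(n_stages):
            data_ready = eval_done[n - 1][i] if n > 0 else 0.0
            has_successor = n + 1 < n_stages
            next_empty = prech_ack[n + 1][i - 1] if (i > 0 and has_successor) else 0.0
            eval_done[n][i] = max(data_ready, next_empty) + t_eval
        # PRECHARGE N: N+1's completion detector reports that evaluation is done.
        for n in range(n_stages):
            successor = n + 1 if n + 1 < n_stages else n  # last stage: ideal sink
            precharge_start = eval_done[successor][i] + t_cd
            prech_ack[n][i] = precharge_start + t_prech + t_cd
    return eval_done


def measured_cycle_time(eval_done, stage=0):
    """Spacing between the last two tokens at the given stage."""
    times = eval_done[stage]
    return times[-1] - times[-2]


if __name__ == "__main__":
    done = ps0_eval_times(n_stages=4, n_tokens=5, t_eval=3, t_cd=2, t_prech=1)
    # Under these assumptions the spacing equals 3*t_eval + 2*t_cd + t_prech.
    print(measured_cycle_time(done))
```

In this model the critical cycle is: N evaluates, N+1 evaluates, N+2 evaluates, N+2 indicates "done", N+1 precharges, N+1 indicates "done". That is three evaluations, two completion detections and one precharge.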

5 Drawbacks of PS0 Pipelining
1. Poor throughput:
   - long cycle time: 6 events per cycle
   - data “tokens” are forced far apart in time
2. Limited storage capacity:
   - max only 50% of stages can hold distinct tokens
   - data tokens must be separated by at least one spacer
Our Research Goals: address both issues
   - still maintain very low latency
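The spacer rule behind the 50% capacity limit can be illustrated with a short check. This is only a sketch: the list encoding (a token id per stage, `None` for a precharged "spacer" stage) and the function names are hypothetical.

```python
def is_legal_occupancy(stages):
    """stages[k] is the token id held by stage k, or None for a spacer.

    Two neighbouring stages may hold the same token (it is being copied
    forward), but two distinct tokens must never be adjacent.
    """
    for left, right in zip(stages, stages[1:]):
        if left is not None and right is not None and left != right:
            return False
    return True


def distinct_tokens(stages):
    """Number of distinct data tokens currently stored in the pipeline."""
    return len({s for s in stages if s is not None})


if __name__ == "__main__":
    packed = [1, None, 2, None]   # densest legal pattern for 4 stages
    illegal = [1, 2, None, None]  # tokens 1 and 2 touch: not allowed
    print(is_legal_occupancy(packed), distinct_tokens(packed))
    print(is_legal_occupancy(illegal))
```

With four stages, the densest legal pattern holds two distinct tokens, which is the 50% limit stated on the slide.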

6 Lecture 7: Recent Approaches

7 Recent Approaches
3 novel styles for high-speed async pipelining:
- “Lookahead Pipelines” (LP) [Singh/Nowick, Async-00]
- “High-Capacity Pipelines” (HC) [Singh/Nowick, WVLSI-00]
- MOUSETRAP Pipelines [Singh/Nowick, TAU-00]
Goal: significantly improve throughput of PS0
Two Distinct Strategies:
- LP: introduce protocol optimizations
  → “shave off” components from critical cycle
- HC: fundamentally new protocol
  → greater concurrency: “loosely-coupled” stages

8 Outline
→ New Asynchronous Pipelines:
  → Lookahead Pipelines (LP) (dynamic circuit style)
  - High-Capacity Pipelines (HC) (dynamic circuit style)
  - MOUSETRAP Pipelines (static circuit style)

9 Lookahead Pipelines: Strategy #1
Use non-neighbor communication:
- stage receives information from multiple later stages
- allows “early evaluation”
Benefit: stage gets head-start on next cycle

10 Lookahead Pipelines: Strategy #2
Use early completion detection:
- completion detector moved before stage (not after)
- stage indicates “early done” in parallel with computation
Benefit: again, stage gets head-start on next cycle
[Figure label: early completion detector]

11 Lookahead Pipelines: Overview
5 New Designs:
→ “Dual-Rail” Data Signaling:
  - LP3/1: “early evaluation”
  - LP2/2: “early done”
  - LP2/1: “early evaluation” + “early done”
- “Single-Rail” Bundled-Data Signaling:
  - LP SR 2/2: “early done”
  - LP SR 2/1: “early evaluation” + “early done”

12 Dual-Rail Design #1: LP3/1
Optimization = “early evaluation”
- each stage has two control inputs: from stages N+1 and N+2
Idea: shorten precharge phase
- terminate precharge early: when N+2 is done evaluating
[Figure: stages N, N+1, N+2, each with a Processing Block and a Completion Detector; Data in, Data out; control inputs PC and Eval (from N+2)]

13 LP3/1 Protocol
PRECHARGE N: when N+1 completes evaluation
EVALUATE N: when N+2 completes evaluation
  New! Enables “early evaluation!”
[Figure: stages N, N+1, N+2; event labels: N evaluates, N+1 evaluates, N+1 indicates “done”, N+2 evaluates, N+2 indicates “done”]

14 LP3/1: Comparison with PS0
PS0: 6 events in cycle
- PRECHARGE N: when N+1 completes evaluation
- EVALUATE N: when N+1 completes precharging
LP3/1: only 4 events in cycle!
- PRECHARGE N: when N+1 completes evaluation
- EVALUATE N: when N+2 completes evaluation (enables “early evaluation!”)
[Figure: side-by-side timing diagrams for PS0 and LP3/1 over stages N, N+1, N+2; event labels: evaluates, indicates “done”]

15 LP3/1 Performance
Cycle Time =
Savings over PS0: 1 Precharge + 1 Completion Detection
[Figure label: saved path]
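The cycle-time expression itself did not survive in this transcript. As a hedged reconstruction from the stated savings, assume the PS0 cycle is the sum of its six events (three evaluations, two completion detections, one precharge). The symbols $t_{\mathrm{Eval}}$, $t_{\mathrm{CD}}$ and $t_{\mathrm{Prech}}$ are notation introduced here, and the delay of the merging NAND gate is ignored:

```latex
\begin{align*}
T_{\mathrm{PS0}}   &= 3\,t_{\mathrm{Eval}} + 2\,t_{\mathrm{CD}} + t_{\mathrm{Prech}} \\
T_{\mathrm{LP3/1}} &\approx T_{\mathrm{PS0}} - t_{\mathrm{Prech}} - t_{\mathrm{CD}}
                    = 3\,t_{\mathrm{Eval}} + t_{\mathrm{CD}}
\end{align*}
```

This leaves four events on the critical cycle (three evaluations and one completion detection), consistent with the "only 4 events in cycle" comparison with PS0.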

16 LP3/1: Inside a Stage
Merging 2 Control Inputs:
- Precharge when PC=1 (and Eval=0)
- Evaluate “early” when Eval=1 (or PC=0)
A NAND gate merges 2 control inputs: PC (from Stage N+1) and Eval (from Stage N+2)
Problem: “early” Eval=1 is non-persistent!
  → may be de-asserted before stage completes evaluation!
[Figure labels: “early Eval”, “old Eval”]

17 LP3/1 Timing Constraints: Example
Problem (cont.): “early” Eval=1 is non-persistent
Observation: PC=0 soon after Eval=1, and is persistent
Solution: no change!
  → use PC as safe “takeover” for Eval!
Timing Constraint: PC=0 must arrive before Eval is de-asserted
  → simple one-sided timing requirement
  → other constraints as well… all easily satisfied in practice
[Figure: NAND gate merging PC (from Stage N+1) and Eval (from Stage N+2)]
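The merging rule and the one-sided timing constraint can be illustrated with a small truth-table sketch. The gate polarity is an assumption made here, not something the slides confirm: the NAND is taken to see PC and the inverse of Eval, and a high output is taken to put the stage in evaluate mode.

```python
def stage_control(pc, eval_early):
    """Return "precharge" or "evaluate" for one LP3/1 stage.

    Assumed polarity (hypothetical): the gate computes NAND(PC, NOT Eval)
    and a high output means "evaluate".
    """
    nand_out = not (pc and not eval_early)
    return "evaluate" if nand_out else "precharge"


def trace(sequence):
    """Apply stage_control to a sequence of (PC, Eval) pairs."""
    return [stage_control(pc, ev) for pc, ev in sequence]


# Safe ordering: PC falls to 0 before the early Eval is de-asserted,
# so PC "takes over" and the stage keeps evaluating.
SAFE = [(1, 0), (1, 1), (0, 1), (0, 0)]

# Violation: Eval is de-asserted while PC is still 1,
# so the stage falls back into precharge mid-evaluation.
UNSAFE = [(1, 0), (1, 1), (1, 0), (0, 0)]

if __name__ == "__main__":
    print(trace(SAFE))
    print(trace(UNSAFE))
```

In the safe trace the stage precharges once and then stays in evaluate. In the unsafe trace the third step returns to precharge, which is the hazard the timing constraint rules out.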

18 Dual-Rail Design #2: LP2/2
Optimization = “early done”
- Idea: move completion detector before processing block
  → stage indicates when “about to” precharge/evaluate
[Figure: Processing Block with “early” Completion Detector; Data in, Data out, “early done”]

19 LP2/2 Protocol
Completion Detection: performed in parallel with evaluation/precharge of stage
[Figure: stages N, N+1, N+2; event labels: N evaluates, N+1 evaluates, “early done” of N+1 eval, “early done” of N+2 eval, “early done” of N+1 prech]

20 LP2/2 Performance
Cycle Time =
LP2/2 savings over PS0: 1 Evaluation + 1 Precharge
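The cycle-time expression is missing from this transcript. A hedged reconstruction from the stated savings follows, assuming the PS0 cycle is the sum of its six events (three evaluations, two completion detections, one precharge). The symbols $t_{\mathrm{Eval}}$, $t_{\mathrm{CD}}$ and $t_{\mathrm{Prech}}$ are notation introduced here:

```latex
\begin{align*}
T_{\mathrm{PS0}}   &= 3\,t_{\mathrm{Eval}} + 2\,t_{\mathrm{CD}} + t_{\mathrm{Prech}} \\
T_{\mathrm{LP2/2}} &= T_{\mathrm{PS0}} - t_{\mathrm{Eval}} - t_{\mathrm{Prech}}
                    = 2\,t_{\mathrm{Eval}} + 2\,t_{\mathrm{CD}}
\end{align*}
```

Under this reading, the "2/2" in the name matches the two evaluations and two completion detections left on the critical cycle.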

21 Dual-Rail Design #3: LP2/1
Hybrid of LP3/1 and LP2/2…
Combines:
- early evaluation of LP3/1
- early done of LP2/2
Cycle time:
Best of our dual-rail lookahead pipelines…
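To compare the dual-rail styles numerically, the sketch below sums the events on each critical cycle. The event counts are assumptions. PS0 is read as three evaluations, two completion detections and one precharge. LP3/1 and LP2/2 follow from the savings stated on their performance slides. LP2/1 is assumed to combine both savings, matching the pattern of its name. The delay values are hypothetical and the NAND delay is ignored.

```python
# (evaluations, completion detections, precharges) on the critical cycle.
# These counts are assumptions reconstructed from the slide text.
CRITICAL_CYCLE_EVENTS = {
    "PS0":   (3, 2, 1),
    "LP3/1": (3, 1, 0),
    "LP2/2": (2, 2, 0),
    "LP2/1": (2, 1, 0),  # assumed: early evaluation + early done combined
}


def cycle_time(style, t_eval, t_cd, t_prech):
    """Sum the delays of the events on the critical cycle of a pipeline style."""
    n_eval, n_cd, n_prech = CRITICAL_CYCLE_EVENTS[style]
    return n_eval * t_eval + n_cd * t_cd + n_prech * t_prech


def compare(t_eval, t_cd, t_prech):
    """Return {style: (cycle time, throughput gain over PS0)}."""
    base = cycle_time("PS0", t_eval, t_cd, t_prech)
    result = {}
    for style in CRITICAL_CYCLE_EVENTS:
        t = cycle_time(style, t_eval, t_cd, t_prech)
        result[style] = (t, base / t)
    return result


if __name__ == "__main__":
    # Hypothetical delays in arbitrary time units.
    for style, (t, gain) in compare(t_eval=3, t_cd=2, t_prech=1).items():
        print(f"{style:6s} cycle = {t:3d}  throughput vs PS0 = {gain:.2f}x")
```

With any positive delays, LP2/1 has the shortest cycle of the four under these assumptions, in line with the slide's claim that it is the best of the dual-rail lookahead pipelines.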