Asynchronous Pipelines Author: Peter Yeh Advisor: Professor Beerel.



USC Asynchronous Group, Slide 2. Motivation
- Can we reduce the communication overhead of asynchronous pipelines while hiding precharge time?
- Can asynchronous pipelines achieve a cycle time as fast as, if not faster than, their best synchronous counterparts?

Slide 3. Motivation: System Performance
- Fixed stage pipeline
  – Low pipeline usage: low latency is critical
  – High pipeline usage: cycle time is the limiting factor in generating new outputs as fast as possible
- Flexible stage pipeline
  – With zero forward overhead and a short cycle time, a given throughput can be achieved with fewer stages

Slide 4. Motivation: System Performance
- Pipelines with loop dependencies
  – The optimal cycle time is the sum of the latencies around the loop
  – Pipelining is required to ensure precharge/reset is not in the critical path
  – Our scheme requires fewer pipeline stages to achieve the same performance

Slide 5. Introduction
- Asynchronous pipeline schemes using a Taken Detector (TD)
- Best used in coarse-grained pipelines
- Two schemes targeting different requirements (and possibly a third, speed-independent (SI) scheme)

Slide 6. Outline
- Background review
  – Sutherland
  – Ted Williams
  – Renaudin
  – Martin
- Taken pipeline
- Performance comparison
- Conclusion

Slide 7. Definitions
- Stage: a collection of logic that is precharged or evaluated at the same time
- Cycle: the time from the start of one evaluation of a stage to the start of its next evaluation
- Forward latency: the time from the start of evaluation in the current stage to the start of evaluation in the next stage

Slide 8. Background Outline
- Sutherland's micropipeline scheme
- Ted Williams' PS0 and PC0 pipeline schemes
- Renaudin's DCVSL pipeline scheme
- Martin's deep pipeline scheme

Slide 9. Sutherland's Micropipeline
- Father of the asynchronous pipeline; presented in his Turing Award lecture
- Delay insensitive
[Diagram: micropipeline built from C-elements (C), registers (REG, with ports C, Cd, Pd, P) and LOGIC blocks; signals R(in), A(in), D(in), R(out), A(out), D(out)]

Slide 10. Williams' PC0
- Speed independent
- Cycle time: P = 3 tF(eval) + 1 tF(prech) + 4 tC + 4 tD
- Forward latency: Lf = 1 tF(eval) + 1 tD + 1 tC
[Diagram: precharged function blocks F1–F3 with completion detectors D1–D3 and C-elements C1–C3; signals D(in), R(in), A(in), D(out), R(out), A(out)]

Slide 11. PC0 Timing Diagram
- The cycle time is shown with red arrows; the blue arrows show the precharge phase

Slide 12. Dependency Graph
[Figure: flat dependency graph over the C, F and D events of successive stages (C1–C4, F1–F4, D1–D3), and the corresponding folded dependency graph over C, F and D]

Slide 13. Williams' PC1
- Cycle time: P = 2 tF + 4 tC + 4 tD
- Forward latency: Lf = 1 tF(eval) + 2 tC + 1 tD
[Diagram: precharged function blocks F1 and F2 with completion detectors DA, DB and D2, C-elements C1 and C2, and a C latch; signals D(in), R(in), A(in), D(out), R(out), A(out)]

Slide 14. Williams' PS0
- Not speed independent
- Cycle time: P = 3 tF(eval) + 1 tF(prech) + 2 tD
- Forward latency: Lf = 1 tF(eval)
[Diagram: precharged function blocks F1–F3 with completion detectors D1–D3; signals D(in), A(in), A(out), D(out)]

Slide 15. PS0 Timing Diagram

Slide 16. PS0 Timing Assumption
- The pipeline has to meet the following timing assumption: [inequality on tF not preserved in the transcript]

Slide 17. Renaudin's DCVSL Pipeline
- Compared against Ted Williams' PC0 only
- Uses DCVSL exclusively
- Introduces latched DCVSL
- Improves cycle time but not forward latency
- Cycle time: P = 1 tF(eval) + 1 tF(prech) + 4 tC + 2 tD
- Forward latency: Lf = 1 tF(eval) + 1 tC + 1 tD

Slide 18. DCVS Logic Family
[Figure panels: DCVS Logic; Latched DCVS Logic]

Slide 19. More on DCVSL
- Advantages
  – Fast; based on dynamic domino-type logic
  – Built-in four-phase handshaking
  – Robust completion sensing
  – Storage element
- Disadvantages
  – Higher complexity: more transistors and larger area
  – Higher power dissipation

Slide 20. DCVS Pipeline
[Diagram: precharged function blocks F1–F3 with completion detectors D1–D3 and C-elements C1–C3; signals D(in), R(in), A(in), D(out), R(out), A(out)]
- Cycle time: P = 1 tF(eval) + 1 tF(prech) + 4 tC + 2 tD  (2 tF + 4 tC + 2 tD)
- Forward latency: Lf = 1 tF(eval) + 1 tC + 1 tD

Slide 21. DCVS Pipeline Timing Diagram

Slide 22. DCVS Dependency Graph
[Figure: folded dependency graph over C, F and D]
- Cycle time: P = 1 tF(eval) + 1 tF(prech) + 4 tC + 2 tD
- Forward latency: Lf = 1 tF(eval) + 1 tC + 1 tD

Slide 23. Martin's Pipeline Schemes
- Deep pipelining
- Quasi delay-insensitive (QDI): no timing assumptions
- Based on different handshaking reshufflings
- The best scheme has high concurrency, which reduces control overhead
- Control logic is more complex

Slide 24. Basic Asynchronous Handshaking
[Diagram: three cascaded stages (1, 2, 3), each with left channel L0, L1, Le and right channel R0, R1, Re; handshaking sequence over L1, Le, R1, Re]
- Reshuffling eliminates the explicit variable x
- Large control overhead

Slide 25. Handshaking Reshuffling
- A stage still waits for its predecessor to reset before resetting itself, so the overhead grows with the number of inputs
[Diagram: three cascaded stages (1, 2, 3), each with left channel L0, L1, Le and right channel R0, R1, Re; handshaking sequence over L1, Le, R1, Re]

Slide 26. Precharge-Logic Half-Buffer
- Does not wait for the predecessor to reset before resetting its own outputs; the control logic waits for the predecessor's reset only after the current stage has reset
[Diagram: three cascaded stages (1, 2, 3), each with left channel L0, L1, Le and right channel R0, R1, Re; handshaking sequence over L1, Le, R1, Re]

Slide 27. Precharge-Logic Full-Buffer
- Allows the neutrality test of the output data to overlap with raising the left enables
- Complex control logic; requires an extra state variable
[Diagram: three cascaded stages (1, 2, 3), each with left channel L0, L1, Le and right channel R0, R1, Re; handshaking sequence over L1, Le, R1, Re and the state variable en]

Slide 28. Martin's PCHB Full-adder

Slide 29. Martin's Pipeline in General
- The cycle time is limited by the properties of QDI
  – The next stage has to finish precharging before the current stage can evaluate the next input
[Diagram: precharged function blocks F1–F3 with completion detectors D1–D3 and a control block; signals D(in), D(out), Le, Re]

Slide 30. Performance Analysis on PCFB
- The control logic can be seen as completion detection (D) plus a C-element (C)
- Reshuffling the handshake only changes the degree of concurrency; it does not affect the best-case performance analysis
- Cycle time: P = 3 tF(eval) + 1 tF(prech) + 2 tC + 2 tD
- Forward latency: Lf = 1 tF(eval)

Slide 31. Outline
- Background review
  – Sutherland
  – Ted Williams
  – Renaudin
  – Martin
- Taken pipeline
- Performance comparison
- Conclusion

Slide 32. Taken Pipeline
- Uses a Taken Detector
- Two schemes to satisfy different requirements
- Neither scheme is speed independent

Slide 33. Initial Idea
- Precharge: only when the next stage has taken the current result
- Evaluation: only when the next stage has precharged
- Similar in idea to Martin's pipeline schemes

Slide 34. Further Observation
- Precharge
  – The current stage can precharge as soon as the first-level logic of the next stage has evaluated, i.e. the next stage has taken the result
- Evaluate
  – Evaluation can start as soon as the guarded N-transistor in the first-level logic of the next stage has turned off

Slide 35. Relax Precharge (RP) Constraint
- The current stage can precharge as soon as the first-level logic of the next stage has evaluated: the next stage has Taken the result
- The current stage can evaluate as soon as the first-level logic of the next stage has precharged, which blocks the new result from passing through
- No extra control logic is needed except the TD, which is similar to a completion detector

Slide 36. RP Pipeline Scheme
[Diagram: precharged function blocks F1–F3 with taken detectors TD1–TD3; signals D(in), D(out)]
- Cycle time: P = 2 tF(eval) + 1 tF1(eval) + 1 tF1(prech) + 2 tTD
- Forward latency: Lf = 1 tF(eval)
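
One way to read the RP cycle-time expression is to walk the chain of events between two successive evaluations of stage 1, using only the two RP rules from the previous slide. This is a reconstruction under those rules, not a derivation shown in the talk; the superscripts e (evaluate) and p (precharge) and the stage numbering are notation assumed here.

```latex
% Assumed notation: t_F^{e} = full-stage evaluation, t_{F1}^{e} / t_{F1}^{p} =
% first-level evaluation / precharge, t_{TD} = taken-detector delay.
\begin{align*}
P_{\mathrm{RP}}
  &= \underbrace{t_F^{e}}_{\text{stage 1 evaluates}}
   + \underbrace{t_F^{e}}_{\text{stage 2 evaluates}}
   + \underbrace{t_{F1}^{e}}_{\text{first level of stage 3 takes the result}}
   + \underbrace{t_{TD}}_{\text{TD3 fires; stage 2 may precharge}} \\
  &\quad
   + \underbrace{t_{F1}^{p}}_{\text{first level of stage 2 precharges}}
   + \underbrace{t_{TD}}_{\text{TD2 fires; stage 1 may evaluate again}} \\
  &= 2\,t_F^{e} + t_{F1}^{e} + t_{F1}^{p} + 2\,t_{TD}
\end{align*}
```

Stage 1's own precharge does not appear in the sum: it starts once the first level of stage 2 has taken the result and proceeds in parallel with the events above, which is how the scheme hides precharge time.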

Slide 37. RP Timing Diagram

Slide 38. RP Timing Assumption
- The timing assumption is easy to meet

Slide 39. RP Timing Assumption (cont.)
- tF1(i) is the delay of the first-level logic of stage i
- tF2(i) is the delay of the logic after the first level of stage i
- The rising and falling delays of the TD are assumed to be equal

Slide 40. Relax Evaluation (RE) Constraint
- The current stage can start evaluating at about the same time as the next stage turns off the guarded N-transistors in its first-level logic
- Requires a generalized C-element, but improves cycle time

Slide 41. RE Pipeline Scheme
- The TD can be skewed for fast evaluation detection
- Cycle time: P = 2 tF(eval) + 1 tF1(eval) + 1 tTD + 1 tC
- Forward latency: Lf = 1 tF(eval)
[Diagram: precharged function blocks F1–F3 with taken detectors TD1–TD3 and a generalized C-element GC1; signals D(in), D(out)]

Slide 42. RE Timing Diagram

Slide 43. RE Timing Assumption 1
- Precharge constraint

Slide 44. RE Timing Assumption 2
- Evaluation constraint (min delay)

Slide 45. Issue in Fine-Grained Pipelines
- In a fine-grained pipeline, such as Martin's single-gate pipeline, the RE scheme may require buffering because of process variation
  – Buffering is needed because of the second timing assumption: the next gate (stage) may not have turned off its N-stack before the result from the current stage reaches it

Slide 46. Taken Detector (TD)
- Similar to a completion detector
- Detects both evaluation and precharge
- Its inputs are the outputs of the first-level logic of each stage
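
The behaviour described on this slide can be sketched as a small state-holding model. The sketch assumes that the first-level outputs are dual-rail pairs and that the detector holds its value while the inputs are only partly evaluated or partly precharged (C-element-like hysteresis). Neither assumption is stated in the talk, and the class and method names are hypothetical.

```python
from typing import Iterable, Tuple


class TakenDetector:
    """Behavioural model of a taken detector (assumed dual-rail inputs)."""

    def __init__(self) -> None:
        # False = first level precharged, True = first level has taken the data.
        self.taken = False

    def update(self, rails: Iterable[Tuple[int, int]]) -> bool:
        """Feed the current first-level outputs as (true_rail, false_rail) pairs."""
        pairs = list(rails)
        if all(bool(t) != bool(f) for t, f in pairs):
            # Every pair carries a valid value: the first level has evaluated.
            self.taken = True
        elif all(not t and not f for t, f in pairs):
            # Every pair is neutral: the first level has precharged.
            self.taken = False
        # Otherwise the inputs are in transition and the previous value is held.
        return self.taken


if __name__ == "__main__":
    td = TakenDetector()
    for step in ([(0, 0), (0, 0)], [(1, 0), (0, 0)], [(1, 0), (0, 1)], [(0, 0), (0, 0)]):
        print(step, "->", td.update(step))
```

Holding the previous value during transitions is what lets one signal report both events: it rises only after every first-level output is valid and falls only after every output is neutral.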

Slide 47. Datapath Merging & Splitting
- Datapath merging and splitting can be done in a way similar to Williams' style
[Diagram: precharged function blocks F1, F2a, F2b and F3 with taken detectors TD1, TD2a, TD2b and TD3 and a C-element (C); signals D(in), D(out)]

Slide 48. Outline
- Background review
  – Sutherland
  – Ted Williams
  – Renaudin
  – Martin
- Taken pipeline
- Performance comparison
- Conclusions

Slide 49. Comparison of RE and Synchronous Skew Tolerant
- Assumes a 4-stage pipeline (stages 1–4) and 4-phase clocking
- Synchronous:
  – Stage 1 starts its next evaluation after stage 4 starts evaluation
- Asynchronous:
  – Stage 1 starts its next evaluation once the completion of the first-level logic of stage 3 has been detected

Slide 50. Comparison Assumptions
- The pipeline is balanced: all stages have equal evaluation time
- Precharge time is the same as evaluation time
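
Under these two assumptions the evaluation and precharge terms of each cycle-time formula can be added together, so the schemes can be compared with one delay per component. The sketch below tabulates the formulas from the earlier slides on that basis. The delay values are hypothetical, chosen only for illustration, and treating the first-level delay tF1 as equal for evaluation and precharge is an extension of the slide's assumption made here.

```python
from typing import Dict, NamedTuple


class Delays(NamedTuple):
    tF: float   # stage evaluation time (assumed equal to precharge time)
    tF1: float  # first-level logic delay (assumed equal for eval and precharge)
    tC: float   # C-element delay
    tD: float   # completion-detector delay
    tTD: float  # taken-detector delay


def cycle_times(d: Delays) -> Dict[str, float]:
    """Cycle time P per scheme, with eval and precharge terms merged."""
    return {
        "PC0":   4 * d.tF + 4 * d.tC + 4 * d.tD,
        "PC1":   2 * d.tF + 4 * d.tC + 4 * d.tD,
        "PS0":   4 * d.tF + 2 * d.tD,
        "DCVSL": 2 * d.tF + 4 * d.tC + 2 * d.tD,
        "PCFB":  4 * d.tF + 2 * d.tC + 2 * d.tD,
        "RP":    2 * d.tF + 2 * d.tF1 + 2 * d.tTD,
        "RE":    2 * d.tF + d.tF1 + d.tTD + d.tC,
    }


def forward_latencies(d: Delays) -> Dict[str, float]:
    """Forward latency Lf per scheme, as given on the slides."""
    return {
        "PC0":   d.tF + d.tC + d.tD,
        "PC1":   d.tF + 2 * d.tC + d.tD,
        "PS0":   d.tF,
        "DCVSL": d.tF + d.tC + d.tD,
        "PCFB":  d.tF,
        "RP":    d.tF,
        "RE":    d.tF,
    }


if __name__ == "__main__":
    delays = Delays(tF=4, tF1=1, tC=1, tD=1, tTD=1)  # hypothetical unit delays
    latency = forward_latencies(delays)
    for name, period in sorted(cycle_times(delays).items(), key=lambda kv: kv[1]):
        print(f"{name:6s} P = {period:5.1f}   Lf = {latency[name]:4.1f}")
```

The ranking depends on the chosen delays, in particular on how small tF1 and tTD are relative to tF, so the numbers printed here are not a result from the talk.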

Slide 51. Graphical Comparison

Slide 52. Optimum Number of Stages (ONS)
- Cycle time is not the only factor in system performance; forward latency is also a limiting factor
- A larger cycle time can be compensated for by increasing the number of stages
- However, a high Lf means system throughput cannot be increased by adding more stages
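
The trade-off on this slide can be illustrated with a simple loop model. The model is an assumption made here, not one given in the talk: a fixed amount of logic is split evenly over n stages of a ring carrying a single data token, each stage adds a fixed forward overhead, and one iteration takes the larger of the lap time and the per-stage cycle time. All function and parameter names are hypothetical.

```python
def iteration_time(n: int, total_logic: float, tf_coeff: float,
                   cycle_overhead: float, forward_overhead: float) -> float:
    """Time per loop iteration for a ring of n stages with one token.

    total_logic      -- total evaluation delay of the loop logic
    tf_coeff         -- number of stage delays in the cycle-time formula
    cycle_overhead   -- control terms of the cycle time (e.g. tC, tD, tTD)
    forward_overhead -- control delay added to each stage's forward latency
    """
    stage_logic = total_logic / n
    lap = total_logic + n * forward_overhead           # n * Lf
    cycle = tf_coeff * stage_logic + cycle_overhead    # P for one stage
    return max(lap, cycle)


def best_stage_count(max_stages: int, **params: float) -> int:
    """Smallest stage count that minimises the iteration time."""
    return min(range(1, max_stages + 1),
               key=lambda n: (iteration_time(n, **params), n))


if __name__ == "__main__":
    # Hypothetical numbers: 12 units of logic, tC = tD = 1, eval = precharge.
    ps0_like = dict(total_logic=12, tf_coeff=4, cycle_overhead=2, forward_overhead=0)
    pc0_like = dict(total_logic=12, tf_coeff=4, cycle_overhead=8, forward_overhead=2)
    for name, params in (("PS0-like", ps0_like), ("PC0-like", pc0_like)):
        n = best_stage_count(16, **params)
        print(name, "stages:", n, "iteration time:", iteration_time(n, **params))
```

In this model a scheme with zero forward overhead reaches the bare logic delay once enough stages hide the cycle time, while a scheme with forward overhead gets slower again when stages are added past its optimum, which is the behaviour the slide describes.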

Slide 53. Conclusion
- With Taken logic and some easy-to-meet timing requirements, we can achieve the best cycle time and forward latency
- The performance comparison with existing pipeline schemes is favorable
- An implementation is still required to prove the theory