1
1 Recap: Lectures 5 & 6
Classic Pipeline Styles
1. Williams and Horowitz’s PS0 pipeline
2. Sutherland’s micropipelines
2
2 Different Points in the Design Space
Williams/Horowitz’s PS0: dual-rail; data-dependent completion; dynamic logic; no extra latches; “zero-overhead” latency; 4-phase handshakes (resetting overhead).
Sutherland’s micropipelines: single-rail; worst-case matched delay; static logic; explicit latches; latch latencies = overhead; elegant transition signaling.
3
3 PS0 Protocol
PRECHARGE N: when N+1 completes evaluation (delete data: after next stage has copied it).
EVALUATE N: when N+1 completes precharging (accept new data: after next stage is emptied).
Evaluate → Precharge: 3 events. Precharge → Evaluate: another 3 events. Complete cycle: 6 events.
[Figure: stages N, N+1, N+2 with events 1 to 6, labeled: evaluates, evaluates, evaluates, indicates “done”, precharges, indicates “done”]
4
4 PS0 Performance
[Figure: critical cycle through events 1 to 6]
Cycle Time = [formula shown as an image on the slide; not recovered]
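The cycle-time formula did not survive extraction, but the six events listed on the PS0 Protocol slide suggest its shape. As a sketch, assuming the cycle is simply the sum of those events (three evaluations, two completion detections, one precharge) and using symbol names that are not from the slides:

```latex
% Assumed symbols: t_{Eval} = stage evaluation delay,
% t_{Prech} = stage precharge delay, t_{CD} = completion-detector delay.
T_{PS0} \approx 3\,t_{Eval} + 2\,t_{CD} + t_{Prech}
```

This is consistent with the savings quoted later for LP3/1 (1 precharge + 1 completion detection) and LP2/2 (1 evaluation + 1 precharge), since each removes terms that appear in this sum.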
5
5 Drawbacks of PS0 Pipelining
1. Poor throughput: long cycle time (6 events per cycle); data “tokens” are forced far apart in time.
2. Limited storage capacity: at most 50% of stages can hold distinct tokens; data tokens must be separated by at least one spacer.
Our Research Goals: address both issues while still maintaining very low latency.
6
6 Lecture 7: Recent Approaches
7
7 Recent Approaches
3 novel styles for high-speed async pipelining:
“Lookahead Pipelines” (LP) [Singh/Nowick, Async-00]
“High-Capacity Pipelines” (HC) [Singh/Nowick, WVLSI-00]
MOUSETRAP Pipelines [Singh/Nowick, TAU-00]
Goal: significantly improve throughput of PS0.
Two Distinct Strategies:
LP: introduce protocol optimizations; “shave off” components from the critical cycle.
HC: fundamentally new protocol; greater concurrency through “loosely-coupled” stages.
8
8 Outline
→ New Asynchronous Pipelines:
→ Lookahead Pipelines (LP)
High-Capacity Pipelines (HC)
MOUSETRAP Pipelines
[Figure labels: Dynamic circuit style; Static circuit style]
9
9 Lookahead Pipelines: Strategy #1
Use non-neighbor communication: a stage receives information from multiple later stages, which allows “early evaluation”.
Benefit: the stage gets a head-start on the next cycle.
10
10 Lookahead Pipelines: Strategy #2
Use early completion detection: the completion detector is moved before the stage (not after), so the stage indicates “early done” in parallel with computation.
Benefit: again, the stage gets a head-start on the next cycle.
[Figure label: early completion detector]
11
11 Lookahead Pipelines: Overview
5 New Designs:
→ “Dual-Rail” Data Signaling:
LP3/1: “early evaluation”
LP2/2: “early done”
LP2/1: “early evaluation” + “early done”
“Single-Rail” Bundled-Data Signaling:
LP SR 2/2: “early done”
LP SR 2/1: “early evaluation” + “early done”
12
12 Dual-Rail Design #1: LP3/1
Optimization = “early evaluation”: each stage has two control inputs, from stages N+1 and N+2.
Idea: shorten the precharge phase by terminating precharge early, when N+2 is done evaluating.
[Figure: stages N, N+1, N+2, each a processing block with a completion detector; data in and data out; stage N has control inputs PC and Eval, with Eval coming from N+2]
13
13 LP3/1 Protocol
PRECHARGE N: when N+1 completes evaluation.
EVALUATE N: when N+2 completes evaluation. New! Enables “early evaluation”!
[Figure: stages N, N+1, N+2 with events 1 to 4, labeled: N evaluates, N+1 evaluates, N+2 evaluates, N+2 indicates “done”; N+1 indicates “done” is also marked at step 3]
14
14 LP3/1: Comparison with PS0
PS0: PRECHARGE N when N+1 completes evaluation; EVALUATE N when N+1 completes precharging. 6 events in cycle.
LP3/1: PRECHARGE N when N+1 completes evaluation; EVALUATE N when N+2 completes evaluation, which enables “early evaluation”. Only 4 events in cycle!
[Figure: side-by-side event diagrams for PS0 and LP3/1 over stages N, N+1, N+2; in both, events 1 to 3 are the three evaluations, followed by “indicates done”]
15
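15 LP3/1 Performance
[Figure: critical cycle through events 1 to 4, with the saved path marked]
Cycle Time = [formula shown as an image on the slide; not recovered]
Savings over PS0: 1 Precharge + 1 Completion Detection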
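The stated savings can be checked by comparing the two event lists directly. A minimal sketch, assuming the event kinds per cycle are as labeled in the slide figures (six for PS0, four for LP3/1); the list names and the helper `saved_events` are illustrative, not from the lecture:

```python
from collections import Counter

# Event kinds in one cycle, read off the slide figure labels.
# The ordering is an assumption; only the counts matter here.
PS0_CYCLE = ["evaluate", "evaluate", "evaluate", "done", "precharge", "done"]
LP31_CYCLE = ["evaluate", "evaluate", "evaluate", "done"]


def saved_events(baseline, optimized):
    """Return the events present in `baseline` but missing from `optimized`."""
    return Counter(baseline) - Counter(optimized)


saved = saved_events(PS0_CYCLE, LP31_CYCLE)
print(dict(saved))
```

The difference is one precharge and one completion detection ("done"), matching the savings quoted on the slide.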
16
16 LP3/1: Inside a Stage
Merging 2 control inputs: a NAND gate merges PC (from stage N+1) and Eval (from stage N+2).
Precharge when PC=1 (and Eval=0).
Evaluate “early” when Eval=1 (or PC=0).
Problem: “early” Eval=1 is non-persistent! It may be de-asserted before the stage completes evaluation.
[Figure labels: “early Eval”, “old Eval”]
17
17 LP3/1 Timing Constraints: Example
Problem (cont.): “early” Eval=1 is non-persistent.
Observation: PC=0 arrives soon after Eval=1, and is persistent.
Solution: no change! Use PC as a safe “takeover” for Eval.
Timing Constraint: PC=0 must arrive before Eval is de-asserted. This is a simple one-sided timing requirement.
Other constraints exist as well; all are easily satisfied in practice.
[Figure: NAND gate with inputs PC (from stage N+1) and Eval (from stage N+2)]
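The merged control and the takeover constraint can be illustrated with a small truth-table model. This is a sketch only: it models the gate as NAND(PC, NOT Eval), which reproduces the rules "precharge when PC=1 and Eval=0, evaluate otherwise"; the inverted Eval input, the function names, and the sample waveforms are assumptions of this example, not details confirmed by the slides.

```python
def stage_evaluates(pc: bool, eval_: bool) -> bool:
    """Merged control for stage N, modeled as NAND(PC, NOT Eval).

    True  -> stage evaluates
    False -> stage precharges (only when PC=1 and Eval=0)
    The input polarity (inverted Eval) is an assumption of this sketch.
    """
    return not (pc and not eval_)


def run(waveform):
    """Map a sequence of (PC, Eval) samples to 'evaluate'/'precharge'."""
    return ["evaluate" if stage_evaluates(pc, ev) else "precharge"
            for pc, ev in waveform]


# Safe ordering: PC falls (takes over) before the early Eval is de-asserted.
safe = [(True, False), (True, True), (False, True), (False, False)]
# Violation: Eval is de-asserted while PC is still 1.
unsafe = [(True, False), (True, True), (True, False), (False, False)]

print(run(safe))
print(run(unsafe))
```

In the safe waveform the stage stays in evaluation once it starts; in the unsafe one it drops back into precharge mid-evaluation, which is the failure the one-sided timing constraint rules out.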
18
18 Dual-Rail Design #2: LP2/2
Optimization = “early done”
Idea: move the completion detector before the processing block, so the stage indicates when it is “about to” precharge/evaluate.
[Figure: “early” completion detector placed ahead of the processing block; data in, data out, “early done” output]
19
19 LP2/2 Protocol
Completion Detection: performed in parallel with evaluation/precharge of the stage.
[Figure: stages N, N+1, N+2 with events 1 to 4; labels: N evaluates; N+1 evaluates; “early done” of N+1 eval; “early done” of N+2 eval; “early done” of N+1 prech]
20
20 LP2/2 Performance
[Figure: critical cycle through events 1 to 4]
Cycle Time = [formula shown as an image on the slide; not recovered]
LP2/2 savings over PS0: 1 Evaluation + 1 Precharge
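The savings quoted for LP3/1 and LP2/2 can be turned into a rough numeric comparison. A minimal sketch, assuming the PS0 cycle is the sum of its six events (three evaluations, two completion detections, one precharge, as labeled on the PS0 Protocol slide); the delay values are made up for illustration, and the delay of any extra control gate is ignored:

```python
# Hypothetical delays in arbitrary time units; not taken from the slides.
T_EVAL = 3.0    # one stage evaluation
T_PRECH = 2.0   # one stage precharge
T_CD = 1.5      # one completion detection

# PS0: six events per cycle (assumed: 3 evaluations, 2 detections, 1 precharge).
t_ps0 = 3 * T_EVAL + 2 * T_CD + T_PRECH

# Savings over PS0 as stated on the performance slides.
t_lp31 = t_ps0 - (T_PRECH + T_CD)    # LP3/1: 1 precharge + 1 completion detection
t_lp22 = t_ps0 - (T_EVAL + T_PRECH)  # LP2/2: 1 evaluation + 1 precharge

for name, t in [("PS0", t_ps0), ("LP3/1", t_lp31), ("LP2/2", t_lp22)]:
    print(f"{name}: {t:.1f} time units per cycle")
```

Which of the two optimizations wins depends on the relative sizes of the evaluation and completion-detection delays; with these example numbers LP2/2 comes out ahead because evaluation is assumed slower than completion detection.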
21
21 Dual-Rail Design #3: LP2/1
Hybrid of LP3/1 and LP2/2…
Combines: early evaluation of LP3/1; early done of LP2/2.
Cycle time: best of our dual-rail lookahead pipelines…