Slide 1: High-Throughput Asynchronous Pipelines for Fine-Grain Dynamic Datapaths
Montek Singh and Steven Nowick, Columbia University, New York, USA
{montek,nowick}@cs.columbia.edu
http://www.cs.columbia.edu/~montek
Intl. Symp. on Advanced Research in Asynchronous Circuits and Systems (ASYNC), April 2-6, 2000, Eilat, Israel.
Slide 2: Outline
- Introduction
- Background: Williams' PS0 pipelines
- New Pipeline Designs
  - Dual-Rail: LP3/1, LP2/2 and LP2/1
  - Single-Rail: LP_SR 2/1
- Practical Issue: Handling slow environments
- Results and Conclusions
Slide 3: Why Dynamic Logic?
- Potentially: higher speed, smaller area
- "Latch-free" pipelines: the logic gate itself provides an implicit latch
  - lower latency, shorter cycle time, smaller area (very important in gate-level pipelining!)
- Our focus: dynamic logic pipelines
Slide 4: How Do We Achieve High Throughput?
- Introduce novel pipeline protocols:
  - specifically target dynamic logic
  - reduce impact of handshaking delays → shorter cycle times
- Pipeline at very fine granularity:
  - "gate-level": each stage is a single gate deep → highest throughputs possible
  - latch-free datapaths especially desirable → dynamic logic is a natural match
Slide 5: Prior Work: Asynchronous Pipelines
- Sutherland (1989), Yun/Beerel/Arceo (1996): very elegant 2-phase control, but expensive transition latches
- Day/Woods (1995), Furber/Liu (1996): 4-phase control; simpler latches, but complex controllers
- Kol/Ginosar (1997): double latches; greater concurrency, but area-expensive
- Molnar et al. (1997-99): two designs, asP* and micropipeline; both very fast, but:
  - asP*: complex timing, cannot handle latch-free dynamic datapaths
  - micropipeline: area-expensive, cannot do logic processing at all!
- Williams (1991), Martin (1997): dynamic stages with no explicit latches → low latency, but throughput still limited
Slide 6: Outline → Background: Williams' PS0 pipelines
Slide 7: PS0 Pipelines (Williams 1986-91)
Basic architecture. [Figure: a linear pipeline of stages, each a Function Block followed by a Completion Detector; data flows from "Data in" to "Data out", and each stage's precharge/evaluate control (PC) comes from the next stage's completion detector.]
Slide 8: PS0 Function Block
Each output is produced using a dynamic gate. [Figure: a dynamic gate with a pull-down stack, a "keeper", and precharge/evaluation control driven by PC; data inputs enter the stack, and data outputs feed the completion detector.]
Slide 9: Dual-Rail Completion Detector
- OR together the two rails of each bit
- Combine the per-bit results using a C-element
[Figure: one OR gate per bit (bit 0 through bit n), all feeding a C-element that produces Done.]
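The OR-plus-C-element structure on this slide can be sketched behaviorally. The class and function names below are illustrative, not from the talk; a Muller C-element drives its output high when all inputs are high, low when all are low, and otherwise holds state:

```python
# Behavioral sketch of a dual-rail completion detector:
# OR the two rails of each bit, then combine with a C-element.

class CElement:
    """Muller C-element: output rises when all inputs are 1,
    falls when all inputs are 0, and holds its value otherwise."""
    def __init__(self, state=0):
        self.state = state

    def update(self, inputs):
        if all(inputs):
            self.state = 1
        elif not any(inputs):
            self.state = 0
        return self.state

def completion_detect(dual_rail_bits, c_elem):
    """dual_rail_bits: list of (rail0, rail1) pairs.
    A bit is 'valid' when one rail is high; after precharge both
    rails are low. Done goes high only when every bit is valid,
    and low only when every bit has precharged."""
    per_bit = [r0 | r1 for (r0, r1) in dual_rail_bits]
    return c_elem.update(per_bit)

c = CElement()
print(completion_detect([(1, 0), (0, 1), (1, 0)], c))  # all valid -> 1
print(completion_detect([(0, 0), (0, 1), (1, 0)], c))  # bit 0 precharged -> holds 1
print(completion_detect([(0, 0), (0, 0), (0, 0)], c))  # all precharged -> 0
```

The hold behavior is what makes Done hazard-free: it changes only after *all* bits have completed the current phase.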
Slide 10: PS0 Protocol
- PRECHARGE N: when N+1 completes evaluation
- EVALUATE N: when N+1 completes precharging
Cycle of stage N, 6 events: (1) N evaluates; (2) N+1 evaluates; (3) N+2 evaluates; (4) N+2 indicates "done"; (5) N+1 precharges; (6) N+1 indicates "done". Evaluate: 3 events; precharge: another 3 events; complete cycle: 6 events.
Slide 11: PS0 Performance
Cycle Time = 2 evaluations + 2 completion detections + 1 precharge (events 1-6 above):
T_PS0 = 2·t_Eval + 2·t_CD + t_Prech
Slide 12: Outline → New Pipeline Designs: Dual-Rail: LP3/1, LP2/2 and LP2/1
Slide 13: Overview of Approach
Our goal: shorter cycle time, without degrading latency.
Our approach: use "Lookahead Protocols" (LP); main idea: anticipate critical events based on richer observation.
Two new protocol optimizations:
- "Early evaluation": give a stage a head start on evaluation by observing events further down the pipeline (a similar idea was proposed by Williams in PA0, but our designs exploit it much better)
- "Early done": a stage signals "done" when it is about to precharge/evaluate
Slide 14: Dual-Rail Design #1: LP3/1
Uses "early evaluation":
- each stage now has two control inputs; the new input (Eval) comes from two stages ahead (N+2)
- evaluate N as soon as N+1 starts precharging
[Figure: pipeline stages N, N+1, N+2; stage N receives PC from N+1 and Eval from N+2.]
Slide 15: LP3/1 Protocol
- PRECHARGE N: when N+1 completes evaluation
- EVALUATE N: when N+2 completes evaluation (new! enables "early evaluation")
Events: (1) N evaluates; (2) N+1 evaluates; (3) N+2 evaluates (in parallel, N+1 indicates "done"); (4) N+2 indicates "done".
Slide 16: LP3/1: Comparison with PS0
[Figure: cycle diagrams over stages N, N+1, N+2. PS0: 6 events in cycle; LP3/1: only 4 events in cycle!]
Slide 17: LP3/1 Performance
Cycle Time: savings over PS0 = 1 precharge + 1 completion detection, i.e.
T_LP3/1 = 2·t_Eval + t_CD
[Figure: the saved path in the event cycle.]
Slide 18: Inside a Stage: Merging Two Controls
A NAND gate combines the two control inputs, PC (from stage N+1) and Eval (from stage N+2):
- Precharge when PC=1 (and Eval=0)
- Evaluate "early" when Eval=1 (or PC=0)
Problem: the "early" Eval=1 is non-persistent! It may get de-asserted before the stage has completed evaluation.
[Figure: dynamic gate with pull-down stack and "keeper"; PC and Eval feed the NAND.]
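The merged control's behavior can be captured as a small truth table. The function below models the logical conditions stated on this slide (precharge only when PC=1 and Eval=0); it is a sketch of the protocol logic, not the transistor-level NAND realization:

```python
# Logical behavior of the merged LP3/1 stage control (sketch).
def stage_mode(pc, eval_early):
    """pc: control from stage N+1; eval_early: control from stage N+2.
    Per the slide: precharge only when PC=1 and Eval=0;
    evaluate 'early' whenever Eval=1 or PC=0."""
    return "precharge" if (pc == 1 and eval_early == 0) else "evaluate"

for pc in (0, 1):
    for ev in (0, 1):
        print(pc, ev, stage_mode(pc, ev))
```

Note that three of the four input combinations evaluate; only the single combination PC=1, Eval=0 precharges, which is exactly why one NAND gate suffices.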
Slide 19: LP3/1 Timing Constraints: Example
Problem: the "early" Eval=1 is non-persistent.
Observation: PC=0 arrives soon after Eval=1, and is persistent → use PC as a safe "takeover" for Eval.
Solution: no change to the circuit!
Timing constraint: PC=0 must arrive before Eval=1 is de-asserted (a simple one-sided timing requirement). There are other constraints as well; all are easily satisfied in practice.
Slide 20: Dual-Rail Design #2: LP2/2
Uses "early done":
- the completion detector is now placed before the function block
- a stage indicates "done" when it is about to precharge/evaluate
[Figure: Function Block preceded by an "early" Completion Detector; Data in, Data out.]
Slide 21: LP2/2 Completion Detector
Modified completion detectors are needed:
- Done = 1 when the stage starts evaluating and its inputs are valid
- Done = 0 when the stage starts precharging
Implemented with an asymmetric C-element. [Figure: per-bit OR gates (marked "+") plus a PC input, combined by an asymmetric C-element producing Done.]
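The asymmetric C-element can be sketched behaviorally as follows. Signal polarities here are assumptions (the slide does not spell them out): inputs marked "+" (the per-bit ORs) gate only the rising edge, while the falling edge follows the precharge control alone:

```python
# Sketch of the 'early done' asymmetric C-element (polarities assumed).
class AsymmetricC:
    def __init__(self):
        self.done = 0

    def update(self, evaluating, bits_valid):
        """evaluating: 1 while the stage is in its evaluate phase
        (derived from PC); bits_valid: per-bit OR outputs ('+' inputs)."""
        if evaluating and all(bits_valid):
            self.done = 1   # rise: evaluating AND all inputs valid
        elif not evaluating:
            self.done = 0   # fall: precharging, regardless of the bits
        return self.done

ac = AsymmetricC()
print(ac.update(1, [1, 1, 1]))  # evaluating, inputs valid -> 1
print(ac.update(1, [0, 1, 1]))  # an input goes invalid, still evaluating -> holds 1
print(ac.update(0, [1, 1, 1]))  # precharge starts -> 0
```

Because the fall does not wait for the bits, Done drops as soon as the stage starts precharging, which is exactly the "early done" behavior.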
Slide 22: LP2/2 Protocol
Completion detection occurs in parallel with evaluation/precharge.
[Timeline for stages N, N+1, N+2: (1) N evaluates; (2) N+1 evaluates, overlapped with N+1's "early done"; (3) N+2's "early done".]
Slide 23: LP2/2 Performance
Cycle Time: LP2/2 savings over PS0 = 1 evaluation + 1 precharge, i.e.
T_LP2/2 = t_Eval + 2·t_CD
Slide 24: Dual-Rail Design #3: LP2/1
A hybrid of LP3/1 and LP2/2, combining:
- the early evaluation of LP3/1
- the early done of LP2/2
Cycle Time: combines both savings over PS0, i.e. T_LP2/1 = t_Eval + t_CD
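Putting the stated savings together gives a quick toy model of how the four dual-rail cycle times relate. The delay values below are arbitrary illustrative numbers, not measurements; the PS0 cycle of 2 evaluations + 2 completion detections + 1 precharge follows the event count on slide 10, and each LP design subtracts the savings stated on its performance slide:

```python
# Toy cycle-time model for the dual-rail pipeline styles (illustrative
# unit delays, not measured values).
t_eval, t_cd, t_prech = 1.0, 0.5, 0.5

T_ps0  = 2*t_eval + 2*t_cd + t_prech        # 6-event PS0 cycle
T_lp31 = T_ps0 - (t_prech + t_cd)           # early evaluation (slide 17)
T_lp22 = T_ps0 - (t_eval + t_prech)         # early done (slide 23)
T_lp21 = T_ps0 - (t_eval + t_prech + t_cd)  # both optimizations combined

for name, T in [("PS0", T_ps0), ("LP3/1", T_lp31),
                ("LP2/2", T_lp22), ("LP2/1", T_lp21)]:
    print(f"{name:6s} cycle = {T:.2f}  speedup = {T_ps0 / T:.2f}x")
```

With these (made-up) delays, LP2/1 comes out more than 2x faster than PS0, consistent with the measured ">2X" result reported later in the talk.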
Slide 25: Outline → New Pipeline Designs: Single-Rail: LP_SR 2/1
Slide 26: Single-Rail Design: LP_SR 2/1
A derivative of LP2/1, adapted to single-rail (bundled data):
- matched delays instead of completion detectors
- the "ack" to previous stages is "tapped off early": once in evaluate (or precharge), dynamic logic is insensitive to input changes
[Figure: pipeline stages with matched delay elements.]
Slide 27: Inside an LP_SR 2/1 Stage
- PC and Eval are combined exactly as in LP3/1 (a NAND gate)
- "done" is generated by an asymmetric C-element (aC):
  - done = 1 when the stage evaluates and its data inputs are valid
  - done = 0 when the stage precharges
[Figure: NAND merging PC (from stage N+1) and Eval (from stage N+2); the aC element combines "req" in with a matched-delay output to produce done, which serves as the "ack" and "req" out.]
Slide 28: LP_SR 2/1 Protocol
Events (3 per cycle): (1) N evaluates; (2) N+1 evaluates and indicates "done"; (3) N+2 evaluates and indicates "done".
Cycle Time: 3 events per cycle.
Slide 29: Practical Issue: Handling Slow Environments
We inherit a timing assumption from Williams' PS0: the input (left) environment must precharge reasonably fast.
Problem: if the environment is stuck in precharge, all such pipelines (incl. PS0) will malfunction!
Our solution: add a special robust controller for the 1st stage:
- simply synchronizes the input environment and the pipeline
- delays critical events until the environment has finished precharging
- a modular solution: overcomes a shortcoming of Williams' PS0
- no serious throughput overhead: the real bottleneck is the slow environment!
Slide 30: Outline → Results and Conclusions
Slide 31: Results
Designed and simulated FIFOs for each pipeline style.
Experimental setup:
- design: 4-bit wide, 10-stage FIFO
- technology: 0.6 μm HP CMOS
- operating conditions: 3.3 V and 300 K
Slide 32: Comparison with Williams' PS0
[Chart: throughput of the dual-rail and single-rail designs vs. PS0.]
- LP2/1: >2X faster than Williams' PS0
- LP_SR 2/1: 1.2 Giga items/sec
Slide 33: Comparison: LP_SR 2/1 vs. Molnar FIFOs
- LP_SR 2/1 FIFO: 1.2 Giga items/sec
  - adding logic processing to the FIFO: simply fold logic into the dynamic gate → little overhead
- Comparison with the Molnar FIFOs:
  - asP* FIFO: 1.1 Giga items/sec; more complex timing assumptions, not easily formalized; requires explicit latches, separate from logic; adding logic processing between stages → significant overhead
  - micropipeline: 1.7 Giga items/sec, but uses two parallel FIFOs, each at only 0.85 Giga/sec; very expensive transition latches; cannot add logic processing to the FIFO!
Slide 34: Practicality of Gate-Level Pipelining
When the datapath is wide (e.g., 32 dual-rail bits), it can often be split into narrow "streams".
Use a "localized" completion detector for each stream:
- need to examine only a few bits → small fan-in (e.g., fan-in = 2)
- send "done" to only a few gates → small fan-out (e.g., fan-out = 2)
→ completion detection at fairly low cost!
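The fan-in argument can be quantified with a quick sketch. Assuming 2-input combining elements and a tree-structured detector, `cd_depth` (an illustrative helper, not from the talk) gives the number of combining levels after the per-bit ORs:

```python
# Completion-detector combining depth: one monolithic detector over a
# wide datapath vs. a 'localized' detector per narrow stream.
def cd_depth(bits):
    """Levels of 2-input combining elements needed to merge
    'bits' per-bit validity signals (ceil(log2(bits)))."""
    return (bits - 1).bit_length()

W, S = 32, 2   # full datapath width vs. stream width (dual-rail bits)
print("monolithic depth:", cd_depth(W))   # 5 levels: deep tree, large fan-out
print("per-stream depth:", cd_depth(S))   # 1 level: fan-in = fan-out = 2
```

The monolithic 32-bit detector needs a 5-level combining tree (and its Done must fan out to the whole stage), while each 2-bit stream needs a single element with fan-in and fan-out of 2, which is the low-cost case shown on the slide.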
Slide 35: Conclusions
Introduced several new dynamic pipelines:
- Use two novel protocols: "early evaluation" and "early done"
- Especially suitable for fine-grain (gate-level) pipelining
- Very high throughputs obtained:
  - dual-rail: >2X improvement over Williams' PS0
  - single-rail: 1.2 Giga items/second in 0.6 μm CMOS
- Use easy-to-satisfy, one-sided timing constraints
- Robustly handle arbitrary-speed environments, overcoming a major shortcoming of Williams' PS0 pipelines
Recent improvement: an even faster single-rail pipeline (WVLSI'00)