Clockless Logic. Recap: Lookahead Pipelines; High-Capacity Pipelines


1 Clockless Logic
Recap: Lookahead Pipelines
High-Capacity Pipelines

2 Recap: Lookahead Pipeline Styles
Two strategies:
  1. Early Evaluation
  2. Early Done

3 Lookahead Pipelines: Strategy #1
Use non-neighbor communication:
  stage receives information from multiple later stages
  allows “early evaluation”
Benefit: stage gets head-start on next cycle

4 Lookahead Pipelines: Strategy #2
Use early completion detection:
  completion detector moved before stage (not after)
  stage indicates “early done” in parallel with computation
Benefit: again, stage gets head-start on next cycle
[Figure: pipeline stage with early completion detector]

5 Single-Rail Styles
Adapt dual-rail styles to single-rail:
  replace dual-rail function blocks by single-rail blocks
  replace completion detectors by matched delays
  request/done indicate valid data
Example: LP sr 2/2
[Figure: single-rail stages with data bits (bit 1 ... bit n, bit 1 ... bit m), request and done signals, and matched delays]

6 Single-Rail Styles (contd.)
Example: LP sr 2/1
[Figure: single-rail stages with matched delays]

7 High-Capacity Pipelines
Singh/Nowick: WVLSI-00, ISSCC-02, Async-02

8 Recent Approaches
Three novel styles for high-speed async pipelining:
  “Lookahead Pipelines” (LP) [Singh/Nowick, Async-00]
  “High-Capacity Pipelines” (HC) [Singh/Nowick, WVLSI-00]
  MOUSETRAP Pipelines [Singh/Nowick, TAU-00]
Goal: significantly improve throughput of PS0
Two distinct strategies:
  LP: introduce protocol optimizations → “shave off” components from critical cycle
  HC: fundamentally new protocol → greater concurrency: “loosely-coupled” stages

9 High-Capacity Pipeline: HC
Key idea: decouple control for pull-up and pull-down
  increases pipeline concurrency → initiates next cycle early
  once N+1 evaluates, can enter “isolate (hold) phase” → stage N allowed to complete entire next cycle!
[Figure: stages N, N+1, N+2, each with a stage controller and delay; signals pc, eval, ack]

10 Inside an HC stage
Decoupled control: pull-up and pull-down stacks are independently controllable:
  pc asserted: precharge
  eval asserted: evaluate
  both de-asserted: enter “isolate” (hold) phase
[Figure: dynamic gate with data inputs and data outputs, pull-down stack, “keeper”, precharge control (pc) and evaluation control (eval)]
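The three control cases on this slide can be sketched as a small behavioral model. This is only an illustration under stated assumptions: the class name `HCStage`, its `step` method, and the choice of 0 as the precharged (reset) output value are hypothetical and do not come from the slides.

```python
class HCStage:
    """Behavioral sketch of one dynamic gate with decoupled pc/eval control.

    Assumption (not from the slides): the reset value of the output is 0.
    """

    def __init__(self, function):
        self.function = function  # logic computed by the pull-down stack
        self.output = 0           # assumed precharged (reset) value

    def step(self, pc_asserted, eval_asserted, inputs):
        if pc_asserted and eval_asserted:
            raise ValueError("pc and eval must not be asserted together")
        if pc_asserted:
            # Precharge phase: the output is reset.
            self.output = 0
        elif eval_asserted:
            # Evaluate phase: dynamic logic is monotonic, so an output that
            # has already risen stays up until the next precharge.
            self.output = self.output | self.function(*inputs)
        # Both de-asserted: isolate (hold) phase, the keeper holds the output
        # regardless of what the inputs do.
        return self.output
```

For example, a stage built as `HCStage(lambda a, b: a & b)` keeps its evaluated output while isolated even if its inputs change, which is what lets the previous stage start its next cycle.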

11 Cycle of an LP HC Stage
[Figure: phase cycles of Stage N and Stage N+1, each Eval → Isolate → Precharge. Signal values: Eval: pc=1, eval=1; Isolate: pc=1, eval=0; Precharge: pc=0, eval=0]
Only a single backward synchronization arc:
  once stage N+1 has completed Eval, N can perform entire next cycle!
  why safe?: N+1 enters isolate phase … key to greater concurrency
  almost all existing approaches: require 2 arcs
One (natural) forward synchronization arc:
  stage N+1 evaluates new data only after N has evaluated
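The phase cycle can be written down as a lookup from the (pc, eval) values shown on the slide. This is a sketch, assuming the values pair up with the phases in the order listed (which would make pc an active-low signal); the names `PHASES`, `CYCLE`, `phase`, and `next_phase` are hypothetical.

```python
# (pc, eval) signal levels for each phase, as read from the slide.
# Assumption: pc is active-low, so pc=0 means "precharge asserted".
PHASES = {
    (1, 1): "evaluate",
    (1, 0): "isolate",
    (0, 0): "precharge",
}

# Order in which a stage walks through its phases.
CYCLE = ["evaluate", "isolate", "precharge"]


def phase(pc, eval_):
    """Return the phase name for a pair of control levels."""
    try:
        return PHASES[(pc, eval_)]
    except KeyError:
        # pc=0 with eval=1 would enable pull-up and pull-down together.
        raise ValueError(f"illegal control combination pc={pc}, eval={eval_}")


def next_phase(current):
    """Return the phase that follows `current` in the cycle."""
    return CYCLE[(CYCLE.index(current) + 1) % len(CYCLE)]
```

The one combination missing from the table, pc=0 with eval=1, is rejected because it would turn on both stacks at once.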

12 Formal Specification of Controller
Problem: specification too concurrent for direct synthesis
  desired precharge condition: N and N+1 have evaluated same data
  problem: this condition is not uniquely captured by the given signals!
    → N may evaluate next data item while N+1 is stuck on current item!
[Figure: signal transition graph. Stage N transitions: pc+, eval+ (start evaluate), S+ (evaluate complete), eval- (isolate), pc- (start precharge), S- (precharge complete). Inputs from stage N+1: T+ (evaluate of N+1 complete), T- (precharge of N+1 complete)]

13 Modified Specification of Controller
Solution: add a state variable ok2pc
ok2pc records whether N+1 has “absorbed” N’s data item
  → ok2pc resets immediately when N deletes item (N precharges)
  → ok2pc is set when N+1 deletes item (N+1 precharges)
[Figure: signal transition graph with added transitions ok2pc+ and ok2pc-, alongside pc+, eval+, S+, eval-, pc-, S-, T+ (evaluate of N+1 complete), T- (precharge of N+1 complete)]
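A toy model can show why the extra state variable matters. The sketch below is only illustrative: the class `PrechargeGuard`, its methods, the initial value of ok2pc, and the assumed precharge condition (S and T and ok2pc) are assumptions, not the authors' controller.

```python
class PrechargeGuard:
    """Toy model of stage N's precharge decision.

    S: stage N has evaluated; T: stage N+1 has evaluated.
    Assumption (hypothetical): N may precharge when S, T and ok2pc all hold.
    """

    def __init__(self):
        self.S = False
        self.T = False
        self.ok2pc = True  # assumed initial value for an empty pipeline

    def n_evaluates(self):
        self.S = True

    def n_plus_1_evaluates(self):
        self.T = True

    def n_plus_1_precharges(self):
        self.T = False
        self.ok2pc = True   # ok2pc is set when N+1 deletes its item

    def n_precharges(self):
        self.S = False
        self.ok2pc = False  # ok2pc resets when N deletes its item

    def may_precharge(self, use_ok2pc=True):
        # Without ok2pc, S and T alone cannot tell whether N and N+1
        # hold the same data item.
        ready = self.S and self.T
        return ready and (self.ok2pc or not use_ok2pc)
```

In this model, after N precharges and evaluates the next item while N+1 still holds the old one, S and T are both high again; only the ok2pc term keeps N from precharging too early.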

14 Controller implementation
Controller implementation is very simple:
  each signal implemented using a single gate
  ok2pc typically off the critical path
[Figure: controller circuit with gates INV, NAND3 and aC; signals S, T, ok2pc, pc, eval]

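15 Performance
[Figure: timing across stages N, N+1, N+2 with numbered events (1 to 3): N evaluates; N+1 evaluates; N isolates; N precharges; N enables itself for next evaluation. The “Cycle Time =” expression did not survive extraction]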

16 Ripple-Carry Adder: One Stage
Mixed dual-rail/single-rail datapath:
  single-rail: sum
  dual-rail: A, B, Carry-in and Carry-out
    → must implement binate functions using unate dynamic logic
[Figure: full-adder stage with dual-rail inputs A (a0, a1), B (b0, b1) and carry-in (c_in0, c_in1), request inputs (req_ab, req_c), dual-rail carry-out (c_out0, c_out1), single-rail sum, and done]
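The remark about binate functions can be illustrated with a small check. This is a sketch under assumptions: the rail ordering (true rail, complement rail), the sum-of-products forms, and the function names are chosen here for illustration and are not taken from the slides. Each output uses only AND/OR of input rails, with no inversions, which is what unate dynamic logic permits.

```python
from itertools import product


def encode(bit):
    """Dual-rail encoding of a valid bit as (true_rail, complement_rail)."""
    return (bit, 1 - bit)


def carry_out(a, b, c):
    """Dual-rail carry-out built from AND/OR of input rails only."""
    a1, a0 = a
    b1, b0 = b
    c1, c0 = c
    cout1 = (a1 & b1) | (a1 & c1) | (b1 & c1)  # majority of the true rails
    cout0 = (a0 & b0) | (a0 & c0) | (b0 & c0)  # majority of the complement rails
    return (cout1, cout0)


def sum_bit(a, b, c):
    """Single-rail sum (a XOR b XOR c) without inverters.

    XOR is binate, so the complement rails supply the inverted inputs.
    """
    a1, a0 = a
    b1, b0 = b
    c1, c0 = c
    return (a1 & b0 & c0) | (a0 & b1 & c0) | (a0 & b0 & c1) | (a1 & b1 & c1)


def check_all():
    """Compare both outputs against integer addition for all 8 input cases."""
    for a, b, c in product((0, 1), repeat=3):
        total = a + b + c
        rails = (encode(a), encode(b), encode(c))
        if carry_out(*rails) != encode(total >> 1):
            return False
        if sum_bit(*rails) != (total & 1):
            return False
    return True
```

With all rails low (the reset state), both outputs stay low, so the stage produces no result until valid data arrives.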

17 Final Adder Architecture
[Figure: adder stage with inputs A, B and carry in, outputs sum and carry out; shift registers provide operand bits; shift registers accumulate sum bits; bit positions labeled least significant and most significant]

18 Results
Designed/simulated adder in each pipeline style
Experimental setup:
  design: 32-bit ripple-carry adder
  technology: 0.6 µm HP CMOS, at 3.3 V and 300 K
New LP HC style: 10% faster than LP SR 2/1

19 Conclusions
Introduced 2 new asynchronous adders:
  Use novel pipeline protocols:
    → observe events from multiple later stages
    → decouple control of pull-up/pull-down
  Especially suitable for fine-grain (gate-level) pipelining
  Very high throughputs obtained:
    → 0.93-1.02 GHz in 0.6 µm
    → expected to outperform the best (IPCMOS: 3.3-4.5 GHz / 0.18 µm)
  LP HC doubles the typical storage capacity
  Robustly handle arbitrary-speed environments
    → useful as IPs
Future work: layout/fabrication, application to DSPs

