1 Clockless Logic Montek Singh Tue, Mar 23, 2004
2 Outline
  Classic static logic pipeline: Sutherland
  Classic dynamic logic pipeline: Williams/Horowitz
3 A Classic Asynchronous Dynamic Pipeline
  Williams and Horowitz’s PS0 pipeline:
    Structure
    Operation
    Performance
4 A Classic Approach: PS0 Pipeline
  Williams/Horowitz (Stanford U.) [1986-91]:
    successfully used in fabricated chips [Stanford ’87] [HAL ’90s]
  Implemented using “dynamic logic”
  [Figure labels: Stage 1, Stage 2, Stage 3; Processing Block; Completion Detector; Data in; Data out; data; ack]
5 PS0 Pipeline Stage
  A PS0 stage consists of dynamic gates and a completion detector.
  [Figure labels: pull-down network; “keeper”; PC; data inputs; data outputs; Processing Block; Completion Detector; ack]
6 Dual-Rail Completion Detector
  Combines dual-rail signals
  Indicates when all bits are valid (or reset)
  OR together the 2 rails per bit
  Merge results using a “C-element”
  C-element:
    if all inputs = 1, output 1
    if all inputs = 0, output 0
    else, maintain output value
  [Figure labels: OR (bit 0); OR (bit 1); OR (bit n); C; Done]
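The completion detector described on this slide can be sketched behaviorally in Python. This is only an illustrative model, not a circuit description: the class names `CElement` and `CompletionDetector`, and the encoding of each bit as a `(true_rail, false_rail)` pair of 0/1 values, are assumptions made for the example.

```python
class CElement:
    """Behavioral model of a C-element (hypothetical helper class)."""

    def __init__(self, initial=0):
        self.output = initial

    def update(self, inputs):
        # All inputs 1 -> output 1; all inputs 0 -> output 0;
        # otherwise the previous output value is held.
        if all(v == 1 for v in inputs):
            self.output = 1
        elif all(v == 0 for v in inputs):
            self.output = 0
        return self.output


class CompletionDetector:
    """OR the two rails of every bit, then merge the results with a C-element."""

    def __init__(self):
        self._c = CElement(initial=0)

    def update(self, bits):
        # bits: list of (true_rail, false_rail) pairs, one pair per data bit.
        per_bit_valid = [t | f for t, f in bits]
        return self._c.update(per_bit_valid)


cd = CompletionDetector()
print(cd.update([(1, 0), (0, 0)]))  # only one bit valid so far: Done stays low
print(cd.update([(1, 0), (0, 1)]))  # every bit valid: Done goes high
print(cd.update([(0, 0), (0, 1)]))  # partially reset: Done holds its value
print(cd.update([(0, 0), (0, 0)]))  # every bit reset: Done goes low
```

The hold behavior of the C-element is what lets a single Done signal report both "all bits valid" and "all bits reset" without glitching in between.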
7 PS0 Protocol
  PRECHARGE N: when N+1 completes evaluation
    delete data: after next stage has copied it
  EVALUATE N: when N+1 completes precharging
    accept new data: after next stage is emptied
  Evaluate → Precharge: 3 events
  Precharge → Evaluate: another 3 events
  Complete cycle: 6 events
  [Figure labels: stages N, N+1, N+2; events 1 to 6; evaluates (×3); indicates “done” (×2); precharges]
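The two control rules on this slide can be written down as a tiny sketch. It assumes a signal polarity that the slide does not state: `next_stage_done = 1` means stage N+1 has completed evaluation, and `0` means it has completed precharging. The function names `stage_phase` and `trace` are hypothetical.

```python
def stage_phase(next_stage_done: int) -> str:
    """Phase that stage N is driven into by stage N+1's completion detector.

    Assumed polarity: 1 = N+1 has finished evaluating (it has copied the data),
                      0 = N+1 has finished precharging (it is empty again).
    """
    return "precharge" if next_stage_done else "evaluate"


def trace(done_sequence):
    """Map a sequence of N+1 'done' values onto the phases of stage N."""
    return [stage_phase(d) for d in done_sequence]


# N+1 empty, then N+1 holds the data, then N+1 empty again.
print(trace([0, 1, 0]))
```

Under this reading the controller of each stage is just the one wire coming back from the next stage's completion detector.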
8 PS0 Performance
  [Figure labels: events 1 to 6]
  Cycle Time =
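One way to estimate the cycle time is to add up the delays of the six events in the PS0 protocol. The sketch below reads those events as three evaluations, two completion detections and one precharge, and the assignment of each event to stage N, N+1 or N+2 is an interpretation of the protocol figure. The names `StageDelays`, `t_eval`, `t_prech` and `t_cd`, and the delay values used, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageDelays:
    """Per-stage delays in arbitrary time units (hypothetical parameters)."""
    t_eval: float   # evaluation delay of a stage
    t_prech: float  # precharge delay of a stage
    t_cd: float     # delay of a completion detector


# Assumed reading of the six events, from one evaluation of stage N to the next.
CYCLE_EVENTS = [
    ("N evaluates", "t_eval"),
    ("N+1 evaluates", "t_eval"),
    ("N+2 evaluates", "t_eval"),
    ("N+2 completion detector indicates done", "t_cd"),
    ("N+1 precharges", "t_prech"),
    ("N+1 completion detector indicates done", "t_cd"),
]


def ps0_cycle_time(d: StageDelays) -> float:
    """Sum the delays of the six events: 3*t_eval + 2*t_cd + t_prech."""
    return sum(getattr(d, field) for _, field in CYCLE_EVENTS)


delays = StageDelays(t_eval=2, t_prech=2, t_cd=1)  # made-up numbers
print(ps0_cycle_time(delays))
```

Because every term in the sum is a full stage or detector delay, shortening the cycle means removing events from this chain rather than speeding up any single one.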
9 Summary: PS0 Pipelining
  Datapaths are latch-free:
    dynamic gates themselves provide implicit latches
    +: chip area savings
    +: extremely low latency
  Data items kept separate by control:
    stage deletes data: only after next stage has copied it
    stage accepts new data: only if next stage is empty
    ⇒ distinct data items always separated by “spacers”
  Control is extremely simple: each controller = single wire
    completion detector directly controls previous stage
    +: chip area savings
    +: low control overhead
10 Comparison to a Clocked Pipeline
  How would you design the pipeline if you actually had a clock?
  1. Replace handshaking with “magic clocking”
    each stage gets its own clock
    successive clocks are slightly skewed
    essentially, clocked simulation of asynchronous handshaking!
    – need multiple clock phases!
  2. Use a single clock, but insert latches between stages
    latches are simple, level-sensitive
    consecutive stages receive complementary clock signals
  [Figure labels: latch; Ck; Ck’]
11 Comparison … (contd.) Cycle Times?
12 Drawbacks of PS0 Pipelining
  1. Poor throughput:
    long cycle time: 6 events per cycle
    data “tokens” are forced far apart in time
  2. Limited storage capacity:
    max only 50% of stages can hold distinct tokens
    data tokens must be separated by at least one spacer
  Our Research Goals: address both issues
    still maintain very low latency
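The spacer rule behind the storage limit can be illustrated with a small check. In this sketch a pipeline snapshot is a list with one entry per stage: a token label if the stage holds data, or `None` if it is empty (a spacer). The representation and the function names are assumptions made for the example; adjacent stages are allowed to hold the same token, on the assumption that a stage keeps its data until the next stage has copied it.

```python
def is_valid_occupancy(stages):
    """True if no two adjacent stages hold *different* data tokens."""
    for left, right in zip(stages, stages[1:]):
        if left is not None and right is not None and left != right:
            return False
    return True


def distinct_tokens(stages):
    """Number of distinct data tokens held in the snapshot."""
    return len({s for s in stages if s is not None})


full = ["A", None, "B", None]             # tokens alternate with spacers
print(is_valid_occupancy(full), distinct_tokens(full))

print(is_valid_occupancy(["A", "B", None, None]))   # A and B touch: not allowed
print(is_valid_occupancy(["A", "A", None, "B"]))    # A is being copied forward
```

In the four-stage snapshot above, two of the four stages hold distinct tokens, which is the 50% figure quoted on the slide.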