1 Clockless Computing Montek Singh Thu, Sep 6, 2007  Review: Logic Gate Families  A classic asynchronous pipeline by Williams.

2 Review: Logic Gate Families  Static CMOS logic (“standard”)  Transmission gates, or “pass-transistor” logic  Dynamic logic, or “domino” logic

3 Static CMOS logic: Summary
Advantages:
– output always strongly driven: pull-up and pull-down networks are fully complementary, so exactly one of them is always “on”; good immunity from noise and leakage
– both inverting and non-inverting functions implementable: each gate is inverting; cascade two gates together to get non-inverting logic
Disadvantages:
– slow/big PMOS devices needed (in addition to NMOS): greater chip area, higher power consumption, slower switching speed

Credit: David Harris, Harvey Mudd College OPTIONAL MATERIAL 4 Complementary CMOS
Complementary CMOS logic gates
– nMOS pull-down network
– pMOS pull-up network
– a.k.a. static CMOS

                 Pull-up OFF    Pull-up ON
Pull-down OFF    Z (float)      1
Pull-down ON     0              X (crowbar)

Credit: David Harris, Harvey Mudd College OPTIONAL MATERIAL 5 Series and Parallel  nMOS: 1 = ON  pMOS: 0 = ON  Series: both must be ON  Parallel: either can be ON
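
The pull-up/pull-down table and the ON rules above can be tried out with a small behavioural sketch. It is a logic-level model only; the names `cmos_output` and `inverter` are hypothetical and chosen for illustration, and no electrical effects are modelled.

```python
def cmos_output(pull_up_on: bool, pull_down_on: bool) -> str:
    """Map the state of the two networks to the gate output."""
    if pull_up_on and pull_down_on:
        return "X"  # crowbar: both networks fight over the output
    if pull_up_on:
        return "1"
    if pull_down_on:
        return "0"
    return "Z"      # floating: neither network drives the output


def inverter(a: int) -> str:
    """Inverter: one nMOS pulling down, one pMOS pulling up."""
    pull_down_on = (a == 1)  # nMOS: 1 = ON
    pull_up_on = (a == 0)    # pMOS: 0 = ON
    return cmos_output(pull_up_on, pull_down_on)


inverter_table = {a: inverter(a) for a in (0, 1)}
```

Because the two networks of the inverter are complementary, only the "0" and "1" entries of the table are ever reached; "Z" and "X" appear only if the networks are designed inconsistently.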

Credit: David Harris, Harvey Mudd College OPTIONAL MATERIAL 6 CMOS Gate Design  Activity: –Sketch a 4-input CMOS NOR gate

Credit: David Harris, Harvey Mudd College OPTIONAL MATERIAL 7 CMOS Gate Design  Activity: –Sketch a 4-input CMOS NAND gate

Credit: David Harris, Harvey Mudd College OPTIONAL MATERIAL 8 Conduction Complement  Complementary CMOS gates always produce 0 or 1  Ex: NAND gate –Series nMOS: Y=0 when both inputs are 1 –Thus Y=1 when either input is 0 –Requires parallel pMOS  Rule of Conduction Complements –Pull-up network is complement of pull-down –Parallel -> series, series -> parallel
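
The rule of conduction complements can be illustrated with a short sketch. The nested-tuple representation of a network and the helper names (`complement`, `conducts`, `gate_output`) are assumptions made for this example, not part of the original slides.

```python
from itertools import product


def complement(net):
    """Swap series and parallel to turn a pull-down network into a pull-up network."""
    if isinstance(net, str):          # a leaf is a single transistor gated by one input
        return net
    kind, *children = net
    dual = "parallel" if kind == "series" else "series"
    return (dual, *[complement(c) for c in children])


def conducts(net, inputs, on_value):
    """True if the network conducts; on_value is 1 for nMOS and 0 for pMOS."""
    if isinstance(net, str):
        return inputs[net] == on_value
    kind, *children = net
    results = [conducts(c, inputs, on_value) for c in children]
    return all(results) if kind == "series" else any(results)


def gate_output(pull_down, inputs):
    """Output of the complementary gate built from a given nMOS pull-down network."""
    pull_up = complement(pull_down)
    down = conducts(pull_down, inputs, 1)
    up = conducts(pull_up, inputs, 0)
    assert up != down                 # exactly one network is ON
    return 1 if up else 0


nand_pull_down = ("series", "A", "B")
nand_outputs = [gate_output(nand_pull_down, {"A": a, "B": b})
                for a, b in product((0, 1), repeat=2)]
```

For the NAND example, `complement(nand_pull_down)` gives `("parallel", "A", "B")`, which is the parallel pMOS pair the slide asks for.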

Credit: David Harris, Harvey Mudd College OPTIONAL MATERIAL 9 Compound Gates  Compound gates can do any inverting function  Ex:

10 Transmission (“Pass”) Gates
Key idea:
– transistors used in a different configuration
– when switched on: instead of connecting output to Vdd or Gnd, they connect output to the input
Advantage:
– very efficient for implementing switches and multiplexers
Disadvantage:
– signal degradation unless both NFET and PFET passgates are used in a complementary configuration

Credit: David Harris, Harvey Mudd College OPTIONAL MATERIAL 11 Pass Transistors  Transistors can be used as switches

Credit: David Harris, Harvey Mudd College OPTIONAL MATERIAL 12 Pass Transistors  Transistors can be used as switches

Credit: David Harris, Harvey Mudd College OPTIONAL MATERIAL 13 Transmission Gates  Single pass transistors produce degraded outputs –pMOS good only for transmitting “1” –nMOS good only for transmitting “0”

Credit: David Harris, Harvey Mudd College OPTIONAL MATERIAL 14 Transmission Gates  Single pass transistors produce degraded outputs  Complementary Transmission gates pass both 0 and 1 well

Credit: David Harris, Harvey Mudd College OPTIONAL MATERIAL 15 Multiplexers
2:1 multiplexer chooses between two inputs

S    D1   D0   Y
0    X    0    0
0    X    1    1
1    0    X    0
1    1    X    1

Credit: David Harris, Harvey Mudd College OPTIONAL MATERIAL 16 Transmission Gate Mux  Nonrestoring mux uses two transmission gates –Only 4 transistors
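
A minimal behavioural sketch of the transmission-gate mux, assuming ideal switches. `None` stands in for a floating (Z) node, and the function names are hypothetical.

```python
def transmission_gate(enable: bool, data: int):
    """Ideal transmission gate: passes data when enabled, floats otherwise."""
    return data if enable else None   # None models a high-impedance output


def tgate_mux(s: int, d1: int, d0: int) -> int:
    """2:1 mux built from two transmission gates sharing one output node."""
    y0 = transmission_gate(s == 0, d0)  # gate on D0 is enabled when S = 0
    y1 = transmission_gate(s == 1, d1)  # gate on D1 is enabled when S = 1
    driven = [v for v in (y0, y1) if v is not None]
    assert len(driven) == 1             # exactly one gate drives the output
    return driven[0]
```

The mux is nonrestoring because the output is connected straight through to the selected input; nothing in the gate re-drives the signal from Vdd or Gnd.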

Credit: David Harris, Harvey Mudd College OPTIONAL MATERIAL 17 Gate-Level Mux Design   How many transistors are needed? 20

18 Dynamic Logic, or “domino”
Key idea:
– only use NMOS’s to compute function
– use a single PMOS to reset
Advantages:
– significantly fewer transistors → smaller chip area
– higher speed, lower power: less “loading” on wires (drive fewer transistors)
– for async: no storage elements needed
Disadvantages:
– need extra control input to precharge
– logic is typically non-inverting only
– more vulnerable to noise and leakage effects

19 Dynamic Logic, or “domino” (contd.)
Gate has 2 phases:
– precharge (= reset): output reset to ‘0’
– evaluate: output computed → either stays ‘0’, or switches to ‘1’
Pull-up and pull-down must never both be simultaneously active:
– ensure that data inputs are reset while gate is precharging
– or, add a “footer” device
PC = 0 (asserted) → precharge
PC = 1 (de-asserted) → evaluate
[Figure: dynamic gate. Control input PC drives the pull-up network, which controls “precharge”; the data inputs drive the pull-down network, which controls “evaluation”; the gate produces the data output.]
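
The two phases can be mimicked with a small state-holding sketch. It assumes that contention is avoided (a footer device or reset inputs) and ignores leakage. The class name `DominoGate` and its `step` method are hypothetical.

```python
class DominoGate:
    """Logic-level model of a dynamic gate whose output is reset to 0."""

    def __init__(self, pull_down):
        self.pull_down = pull_down  # function of the data inputs: True if the network conducts
        self.output = 0             # assumed to start in the precharged state

    def step(self, pc: int, *inputs: int) -> int:
        if pc == 0:
            # precharge (PC asserted): output is reset regardless of the data inputs
            self.output = 0
        elif self.pull_down(*inputs):
            # evaluate: output may switch 0 -> 1, but never back until the next precharge
            self.output = 1
        return self.output


and_gate = DominoGate(lambda a, b: bool(a and b))
trace = [
    and_gate.step(0, 1, 1),  # precharging: output held at 0
    and_gate.step(1, 1, 0),  # evaluating, network not conducting: stays 0
    and_gate.step(1, 1, 1),  # network conducts: switches to 1
    and_gate.step(1, 0, 0),  # inputs withdrawn: output holds its value
    and_gate.step(0, 0, 0),  # next precharge: reset to 0
]
```

The fourth step shows why the gate can act as its own storage element: once evaluated, the output keeps its value until PC is asserted again.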

20 Outline: Several Pipeline Styles  Classic static logic pipeline: Sutherland  Recent static logic pipeline: MOUSETRAP  Classic dynamic logic pipeline: Williams/Horowitz’ PS0

21 A Classic Asynchronous Dynamic Pipeline Williams and Horowitz’s PS0 pipeline:  Structure  Operation  Performance

22 A Classic Approach: PS0 Pipeline
Williams/Horowitz (Stanford U.) [1986-91]:
– successfully used in fabricated chips [Stanford ’87] [HAL ’90s]
– implemented using “dynamic logic”
[Figure: pipeline of Stage 1, Stage 2, Stage 3 between Data in and Data out; each stage contains a Processing Block and a Completion Detector; signals labelled “data” and “ack”.]

23 PS0 Pipeline Stage
A PS0 stage consists of dynamic gates and a completion detector.
[Figure: Processing Block built from a dynamic gate (PC input, pull-down network, “keeper”) with data inputs and data outputs; a Completion Detector on the data outputs generates “ack”.]

24 Dual-Rail Completion Detector
– Combines dual-rail signals
– Indicates when all bits are valid (or reset)
– OR together 2 rails per bit
– Merge results using “C-element”
C-element:
– if all inputs = 1, output → 1
– if all inputs = 0, output → 0
– else, maintain output value
[Figure: one OR gate per bit (bit 0, bit 1, ..., bit n) feeding a C-element “C” whose output is “Done”.]
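
The detector described above can be sketched at the logic level as follows. The class `CElement` and the function `completion_detector` are hypothetical names; each bit is given as a pair of rails.

```python
class CElement:
    """C-element: output follows the inputs only when they all agree."""

    def __init__(self, initial: int = 0):
        self.output = initial

    def update(self, inputs) -> int:
        if all(v == 1 for v in inputs):
            self.output = 1
        elif all(v == 0 for v in inputs):
            self.output = 0
        # otherwise: maintain the previous output value
        return self.output


def completion_detector(c: CElement, bits) -> int:
    """OR the two rails of every bit, then merge the results with a C-element."""
    per_bit_valid = [rail0 | rail1 for rail0, rail1 in bits]
    return c.update(per_bit_valid)


c = CElement()
done_trace = [
    completion_detector(c, [(0, 0), (0, 0)]),  # all bits reset
    completion_detector(c, [(0, 1), (0, 0)]),  # only bit 0 valid: output held
    completion_detector(c, [(0, 1), (1, 0)]),  # all bits valid
    completion_detector(c, [(0, 0), (1, 0)]),  # bit 0 reset first: output held
    completion_detector(c, [(0, 0), (0, 0)]),  # all bits reset again
]
```

The held values in the second and fourth steps are what let a single “Done” wire signal both “all bits valid” and “all bits reset”.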

25 PS0 Protocol
PRECHARGE N: when N+1 completes evaluation
– delete data: after next stage has copied it
EVALUATE N: when N+1 completes precharging
– accept new data: after next stage is emptied
Evaluate → Precharge: 3 events
Precharge → Evaluate: another 3 events
Complete cycle: 6 events
[Figure: stages N, N+1, N+2 with events numbered 1 to 6, labelled “evaluates” (three times), “indicates ‘done’” (twice) and “precharges”.]
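
The two protocol rules can be turned into a simple timing recurrence. The sketch below is an illustration under stated assumptions, not the analysis from the lecture: every stage has the same evaluation, precharge and completion-detection delays (`t_eval`, `t_prech`, `t_cd`, all hypothetical parameter names), the input always has data ready, and the output environment acknowledges immediately.

```python
def ps0_eval_times(n_stages, n_tokens, t_eval, t_prech, t_cd):
    """Return eval_done[k][i]: time at which stage k finishes evaluating token i."""
    eval_done = [[0] * n_tokens for _ in range(n_stages)]
    prech_done = [[0] * n_tokens for _ in range(n_stages)]
    last = n_stages - 1
    for i in range(n_tokens):
        for k in range(n_stages):
            ready = [0]
            if k > 0:
                ready.append(eval_done[k - 1][i])   # data from the previous stage is valid
            if i > 0:
                ready.append(prech_done[k][i - 1])  # this stage has itself been reset
                if k < last:
                    # EVALUATE N: when N+1 completes precharging (seen via its detector)
                    ready.append(prech_done[k + 1][i - 1] + t_cd)
            eval_done[k][i] = max(ready) + t_eval
        for k in range(n_stages):
            if k < last:
                # PRECHARGE N: when N+1 completes evaluation (seen via its detector)
                prech_done[k][i] = eval_done[k + 1][i] + t_cd + t_prech
            else:
                # ASSUMPTION: ideal sink that acknowledges the last stage at once
                prech_done[k][i] = eval_done[k][i] + t_prech
    return eval_done


times = ps0_eval_times(n_stages=4, n_tokens=5, t_eval=2, t_prech=3, t_cd=1)
cycle_time = times[0][4] - times[0][3]  # steady-state spacing between tokens at stage 0
```

In this sketch the spacing between successive tokens comes out as three evaluation delays, two detector delays and one precharge delay, which lines up with the six events of the protocol. That is a property of the model under the assumptions listed, and it sets aside the timing assumptions a real PS0 implementation relies on.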

26 PS0 Performance 1 2 3 4 5 6 Cycle Time =

27 Summary: PS0 Pipelining
Datapaths are latch-free:
– dynamic gates themselves provide implicit latches
+: chip area savings
+: extremely low latency
Data items kept separate by control:
– stage deletes data: only after next stage has copied it
– stage accepts new data: only if next stage is empty
→ distinct data items always separated by “spacers”
Control is extremely simple: each controller = single wire
– completion detector directly controls previous stage
+: chip area savings
+: low control overhead

28 Comparison to a Clocked Pipeline
How would you design the pipeline if you actually had a clock?
1. Replace handshaking with “magic clocking”
– each stage gets its own clock
– successive clocks are slightly skewed
→ essentially, clocked simulation of asynchronous handshaking!
– need multiple clock phases!
2. Use a single clock, but insert latches between stages
– latches are simple, level-sensitive
– consecutive stages receive complementary clock signals
[Figure: latches between stages, clocked by Ck and Ck’.]

29 Drawbacks of PS0 Pipelining
1. Poor throughput:
– long cycle time: 6 events per cycle
– data “tokens” are forced far apart in time
2. Limited storage capacity:
– max only 50% of stages can hold distinct tokens
– data tokens must be separated by at least one spacer
My research goals have been:
– address both issues
– still maintain very low latency

30 Homework #4 (due Tue Sep 18)
1. Enumerate ALL of the timing assumptions inherent in Williams’ PS0 style
– Assume all gate and wire delays can be arbitrary
– For which scenarios can there be a malfunction?
2. Compare the cycle times of PS0 with an ideal clocked dynamic pipeline (slide #28)
