Clocking in High-Performance and Low-Power Systems. Presentation given at: EPFL Lausanne, Switzerland, June 23rd, 2003. Vojin G. Oklobdzija, Advanced Computer System Engineering Laboratory, University of California Davis.

1 Clocking in High-Performance and Low-Power Systems
Presentation given at: EPFL Lausanne, Switzerland, June 23rd, 2003
Vojin G. Oklobdzija
Advanced Computer System Engineering Laboratory
University of California Davis
Presentation available at:

2 Future Directions
- Synchronous / Asynchronous paradigm
- Synchronous solutions:
  - Clock uncertainty absorption
  - Time borrowing
  - Skew-Tolerant Domino
  - Clocking with signals
  - Using both edges of the clock
- Conclusion
11/27/2018 Prof. V.G. Oklobdzija, University of California

3 Multi-GHz Clocking Problems
Less logic between pipeline stages:
- Out of the 7-10 FO4 delays allocated per stage, the FF can take 2-4 FO4
- Clock uncertainty can take another FO4
- The total could be ½ of the time allowed for computation
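
As a rough check of the budget on this slide, the sketch below computes the share of a stage's FO4 budget that goes to the flip-flop and to clock uncertainty rather than to logic. The function name and the sample operating points are illustrative assumptions picked from inside the ranges quoted above.

```python
def overhead_fraction(cycle_fo4, ff_fo4, uncertainty_fo4=1.0):
    """Fraction of the cycle (measured in FO4 delays) not available to logic."""
    if cycle_fo4 <= 0:
        raise ValueError("cycle_fo4 must be positive")
    return (ff_fo4 + uncertainty_fo4) / cycle_fo4


# Illustrative points inside the slide's ranges (7-10 FO4 cycle, 2-4 FO4 FF).
for cycle, ff in [(10, 2), (10, 4), (8, 3)]:
    share = overhead_fraction(cycle, ff)
    print(f"cycle={cycle} FO4, FF={ff} FO4 -> overhead {share:.0%}")
```

With a 10 FO4 cycle and a 4 FO4 flip-flop, or an 8 FO4 cycle and a 3 FO4 flip-flop, the overhead reaches half of the cycle, which is the case the slide warns about.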

4 Consequences of multi-GHz Clocks
- Pipeline boundaries start to blur
- Clocked Storage Elements must include logic
- Wave pipelining, domino style, signals used to clock, ...
- Synchronous design only in a limited domain
- Asynchronous communication between synchronous domains

5 Synchronous / Asynchronous Design on the Chip
- 1 billion transistors on the chip by
- A 64-b, 4-way issue logic core requires ~4 million

6 Synchronous / Asynchronous Design on the Chip
[Figure labels: 10 million transistors; 1 Billion Transistors Chip]

7 Two views of the world:
- Asynchronous
- Synchronous

9 Asynchronous Paradigm
- Logic stage can take any time it needs
- Max. speed limited by handshake overhead
- Increased complexity of logic (de-glitching)

10 Synchronous Paradigm
- Max speed determined by the slowest logic block
- Latch / FF timing overhead
- Fixed clock frequency (set by longest path)

11 Synchronous Paradigm
Clocked Storage Elements: Flip-Flops and Latches should be viewed as synchronization elements, not merely as storage elements!
Their main purpose is to synchronize fast and slow paths: prevent the fast path from corrupting the state.
[Figure: fast path corrupting present state]

12 Synchronous World: Tricks and Solutions
- Clocked Storage Elements with clock uncertainty absorption features
- Time Borrowing
- Incorporation of Synchronization features into the logic
- Skew Tolerant Domino
- Utilizing both edges of the Clock
[A "Next" marker on the slide points to the upcoming topic]

13 Clocked Storage Element Overhead
[Figure: two clocked storage elements separated by a logic block, with a timing diagram labeling T, T_{Clk-Q}, T_{Logic}, U, T_{skew}, and T_{D-Q} = T_{Clk-Q} + U]
The time taken from the pipeline by the CSE is U (the data-to-clock time) plus the Clk-Q delay. Thus the D-Q delay is the relevant one, not Clk-Q:
T = T_{Clk-Q} + T_{Logic} + U + T_{skew}
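
The cycle-time relation on this slide can be sketched as a small calculation. The function names and the picosecond values below are hypothetical and only illustrate how the CSE overhead (Clk-Q plus U) and the skew subtract from the time left for logic.

```python
def min_clock_period(t_clk_q, t_logic, u, t_skew):
    """T = T_Clk-Q + T_Logic + U + T_skew, all in the same time unit."""
    return t_clk_q + t_logic + u + t_skew


def logic_time_available(period, t_clk_q, u, t_skew):
    """Time left for logic once the CSE overhead (Clk-Q + U) and skew are paid."""
    return period - (t_clk_q + u + t_skew)


# Hypothetical values in picoseconds, not taken from the presentation.
period = min_clock_period(t_clk_q=60, t_logic=200, u=20, t_skew=20)
print(period)                                    # shortest workable period
print(logic_time_available(period, 60, 20, 20))  # recovers the logic delay
```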

14 Timing Characteristics
The figure shows typical clock-to-output and data-to-output characteristics. In the stable region, the clock-to-output characteristic is constant. As the setup requirement of the device starts to be violated, the clock-to-output curve rises, ending in failure at some point. The data-to-output characteristic, being simply the sum of the clock-to-output and data-to-clock times, falls with a slope of 45° in the stable region. In the metastable region, the slope starts to decrease because the clock-to-output delay increases. The minimum of the data-to-output curve occurs where the clock-to-output curve has a 45° slope. The data-to-clock time that corresponds to this point is termed the optimal setup time.
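
The optimal setup time described above can be located numerically. The sketch below assumes a purely hypothetical clock-to-output model (constant when data arrives early, rising exponentially as the data-to-clock time shrinks, with no failure point); the model and all of its parameters are illustrative and do not come from the presentation.

```python
import math


def clk_to_q(d_clk, t_stable=100.0, a=50.0, tau=10.0):
    """Hypothetical Clk-Q delay (ps) as a function of data-to-clock time (ps)."""
    return t_stable + a * math.exp(-d_clk / tau)


def d_to_q(d_clk):
    """Data-to-output delay: data-to-clock time plus clock-to-output delay."""
    return d_clk + clk_to_q(d_clk)


def optimal_setup(lo=-20.0, hi=100.0, step=0.01):
    """Grid search for the data-to-clock time that minimizes D-Q."""
    n = int(round((hi - lo) / step))
    candidates = [lo + i * step for i in range(n + 1)]
    return min(candidates, key=d_to_q)


u_opt = optimal_setup()
# Central-difference slope of the Clk-Q curve at the optimum.
h = 1e-3
slope = (clk_to_q(u_opt + h) - clk_to_q(u_opt - h)) / (2 * h)
print(f"optimal setup ~ {u_opt:.2f} ps, Clk-Q slope there ~ {slope:.3f}")
```

For this model the minimum of D-Q falls where the Clk-Q curve has slope -1, which is the 45° condition stated on the slide.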

15 Clock Uncertainty Absorption

16 Clock Uncertainty Absorption
[Figure: D-Q delay versus D-Clk delay, marking the nominal, early, and late D-Clk arrivals across the clock uncertainty window t_{CU} around the nominal clock (T = 0), with the resulting D_{DQm}, D_{DQM}, and worst-case D_{DQ}]

17 Clock Uncertainty Absorption
[Figure: Clk, D, and Q waveforms for two clock uncertainty windows]
(a) t_{CU} = 30 ps: U_{Opt} = -5 ps, D-Q variation 3 ps, D_{DQM} = 220 ps, α_{CU} = 90%
(b) t_{CU} = 100 ps: U_{Opt} = 30 ps, D-Q variation 44 ps, D_{DQM} = 261 ps, α_{CU} = 56%
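
The two absorption figures on this slide are consistent with defining the absorption coefficient as the fraction of the uncertainty window that does not show up as D-Q variation. The sketch below uses that definition, which is inferred from the slide's numbers (reading the 3 ps and 44 ps labels as the D-Q variation) and is not stated explicitly in the transcript; the function name is illustrative.

```python
def absorption_coefficient(t_cu, delta_d_dq):
    """alpha_CU = (t_CU - D-Q variation across the window) / t_CU."""
    if t_cu <= 0:
        raise ValueError("t_cu must be positive")
    return (t_cu - delta_d_dq) / t_cu


# (t_CU, D-Q variation) pairs in picoseconds, read from the slide.
for t_cu, delta in [(30, 3), (100, 44)]:
    alpha = absorption_coefficient(t_cu, delta)
    print(f"t_CU={t_cu} ps -> alpha_CU={alpha:.0%}")
```

Under this reading, a wider uncertainty window pushes the data arrival further into the region where D-Q varies, so a smaller share of the uncertainty is absorbed.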

18 Synchronous World: Tricks and Solutions
- Clocked Storage Elements with clock uncertainty absorption features
- Time Borrowing
- Incorporation of Synchronization features into the logic
- Skew Tolerant Domino
- Utilizing both edges of the Clock
[A "Next" marker on the slide points to the upcoming topic]

20 Critical Path with Time Borrowing

21 Latches as synchronizers
- The purpose of a CSE is to synchronize data flow.
- We need to insert a CSE to prevent "fast paths" from reaching the next logic stage too early.
- If the signal arrives late, it is allowed to borrow time from the next stage.
- However, borrowing cannot go on forever.
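
The limit on borrowing can be illustrated with a deliberately simplified latch pipeline. In the sketch below, latch k is assumed to open at k times the clock period and to stay transparent for a fixed window; latch delay, setup time, and skew are ignored. The function name, the model, and the picosecond values are assumptions made for illustration only.

```python
def borrow_schedule(stage_delays_ps, period_ps, window_ps):
    """Return the time (ps) borrowed at each receiving latch.

    Simplified model (assumption): latch k opens at k * period_ps and stays
    transparent for window_ps; latch delay, setup time, and skew are ignored.
    """
    borrowed = []
    depart = 0  # data launches when latch 0 opens
    for k, delay in enumerate(stage_delays_ps, start=1):
        arrival = depart + delay
        opens = k * period_ps
        late_by = max(0, arrival - opens)
        if late_by > window_ps:
            raise ValueError(
                f"stage {k}: needs {late_by} ps, window is only {window_ps} ps"
            )
        borrowed.append(late_by)
        # Early data (a fast path) is held until the latch opens.
        depart = max(arrival, opens)
    return borrowed


print(borrow_schedule([1200, 900, 900], period_ps=1000, window_ps=300))
```

A slow first stage borrows 200 ps, and the shorter stages that follow pay it back. Two slow stages in a row, for example `[1200, 1200]`, would need 400 ps at the second latch and the function raises `ValueError`, which is the sense in which borrowing cannot continue indefinitely.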

22 Using Single Pulsed Latch

23 Single Pulsed Latch
*Courtesy of D. Markovic & Intel MRL

24 Optimal Single Latch Clocking
Single Latch System (Unger & Tan '83):
P_m = P ≥ D_{LM} + D_{DQM} {minimal clock period}
D_{Lm} > D_{LmB} ≥ W + T_T + T_L + H - D_{CQm} {shortest path}
W_{opt} = T_L + T_T + U + D_{CQM} - D_{DQM} {minimal clock width}
Example: 0.10 µm technology
FO4 = 25-40 ps, FF = 80 ps, T_{unc} = 25-35 ps, f_{max} = GHz, T = ps
W_{opt} ~ 2 T_{unc} ~ 50-70 ps
D_{Lm} ~ 4 T_{unc} + H - D_{CQm} ~ ps {this is close to ½ of a cycle}
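
The approximations in the example follow from the two constraints once the worst-case D-Q delay is taken as the worst-case Clk-Q delay plus U, so that W_{opt} reduces to T_L + T_T. The sketch below checks this with numbers. It reads T_L and T_T as the leading-edge and trailing-edge clock uncertainties, each equal to T_{unc}; that reading, the function names, and the values chosen for U, H, and D_{CQm} are assumptions for illustration.

```python
def optimal_pulse_width(t_l, t_t, u, d_cq_max, d_dq_max):
    """W_opt = T_L + T_T + U + D_CQM - D_DQM (all in ps)."""
    return t_l + t_t + u + d_cq_max - d_dq_max


def shortest_path_bound(w, t_t, t_l, hold, d_cq_min):
    """Lower bound on the shortest logic path: W + T_T + T_L + H - D_CQm."""
    return w + t_t + t_l + hold - d_cq_min


t_unc = 25                 # ps, lower end of the slide's 25-35 ps range
u, d_cq_max = 10, 70       # ps, hypothetical
d_dq_max = d_cq_max + u    # 80 ps, matching the slide's FF = 80 ps
hold, d_cq_min = 20, 40    # ps, hypothetical

w_opt = optimal_pulse_width(t_unc, t_unc, u, d_cq_max, d_dq_max)
bound = shortest_path_bound(w_opt, t_unc, t_unc, hold, d_cq_min)
print(w_opt, bound)
```

With these inputs the pulse width equals 2 T_{unc} and the shortest-path bound equals 4 T_{unc} + H - D_{CQm}, matching the forms given on the slide.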

25 Synchronous World: Tricks and Solutions
- Clocked Storage Elements with clock uncertainty absorption features
- Time Borrowing
- Incorporation of Synchronization features into the logic
- Skew Tolerant Domino
- Utilizing both edges of the Clock
[A "Next" marker on the slide points to the upcoming topic]

26 Skew-Tolerant Domino (a.k.a. Opportunistic Time Borrowing)
Intel Patent No. 5,517,136, May 14, 1996

27 CMOS Domino as Memory Element
- After the input changes, the output remembers it
- Pre-charge destroys the information
- Proper phasing of the clock can allow passing the information from stage to stage

28 Skew-Tolerant Domino

29 Synchronous World: Tricks and Solutions
- Clocked Storage Elements with clock uncertainty absorption features
- Time Borrowing
- Incorporation of Synchronization features into the logic
- Skew Tolerant Domino
- Utilizing both edges of the Clock
[A "Next" marker on the slide points to the upcoming topic]

30 Data Used to Clock: Non-Clocked Dynamic Logic (NCD)
- Logic gate precharged by fast input, no clock
- Practical for AND logic function
- OR is also possible
Courtesy of N. Nedovic

31 Differential Non-Clocked Dynamic Logic
- Can implement any function
- Precharge inputs are chosen from the fastest arriving inputs
Courtesy of N. Nedovic

32 Pipeline Flow with NCD Logic - Example
- Domino gates used as synchronizers
- Gates are non-clocked => reducing clock load
- Precharge ripples through logic in path, similar to evaluation
- Evaluation may exceed pipeline boundary
- A method is needed to prevent fast precharge
Courtesy of N. Nedovic

33 Synchronous World: Tricks and Solutions
- Clocked Storage Elements with clock uncertainty absorption features
- Time Borrowing
- Incorporation of Synchronization features into the logic
- Skew Tolerant Domino
- Utilizing both edges of the Clock
[A "Next" marker on the slide points to the upcoming topic]

34 Dual-Edge Triggered CSE
- DET-CSE samples the input data on both edges of the clock
- Reducing power consumption:
  - Half of the original clock frequency for the same data throughput
  - Half of clock generation/distribution/SE-clock-related power is saved
- However, it may introduce an overhead
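
The halving of clock power can be sketched with the usual dynamic-power relation for a net that toggles every cycle, P = C · V² · f. The function names, the capacitance-overhead parameter standing in for the overhead the slide mentions, and the numeric values are assumptions for illustration.

```python
def clock_power(c_clk, vdd, freq):
    """Dynamic power of a clock net that toggles every cycle: C * V^2 * f."""
    return c_clk * vdd ** 2 * freq


def det_saving(c_clk, vdd, freq, det_cap_overhead=0.0):
    """Fractional clock-power saving of dual-edge over single-edge clocking.

    The dual-edge design runs at freq / 2 for the same data throughput;
    det_cap_overhead is a hypothetical relative increase in clocked capacitance.
    """
    single = clock_power(c_clk, vdd, freq)
    dual = clock_power(c_clk * (1.0 + det_cap_overhead), vdd, freq / 2.0)
    return 1.0 - dual / single


# Hypothetical 1 nF clock load, 1.0 V supply, 2 GHz single-edge clock.
print(f"{det_saving(1e-9, 1.0, 2e9):.0%}")
print(f"{det_saving(1e-9, 1.0, 2e9, det_cap_overhead=0.2):.0%}")
```

With no added capacitance the saving is one half; any extra clocked capacitance in the dual-edge storage element reduces it proportionally.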

35 Dual-Edge Triggered Storage Element Topologies
Structurally, there are two different designs:
- Latch-Mux (LM)
- Flip-Flop (FF)
[Figure labels: DET-Flip-Flop; Non-transparency achieved by MUX; DET-Latch]

36 Comparison with Single Edge SEs

37 Comparison with Single Edge CSEs

38 Single and Double Edge Triggered SE: Power Consumption (α=50%)

40 Conclusion
- Synchronous Design: Has not exhausted all the tricks
- Asynchronous Design: Has not solved all the problems

41 Design & optimization tradeoffs
Opposite goals:
- Minimal total power consumption
- Minimal delay
Power-Delay tradeoff:
- Minimize Power-Delay product
[Figure labels: f = const.; three points marked Opt.]

