Issues in System on the Chip Clocking November 6th, 2003 SoC Design Conference, Seoul, KOREA Vojin G. Oklobdzija Advanced Computer System Engineering Laboratory.


1 Issues in System on the Chip Clocking
November 6th, 2003, SoC Design Conference, Seoul, KOREA
Vojin G. Oklobdzija, Advanced Computer System Engineering Laboratory, University of California Davis
Presentation available at: http://www.ece.ucdavis.edu/acsel

2 Directions in SoC Clocking
Synchronous / Asynchronous paradigm
Synchronous solutions:
  – Clock uncertainty absorption
  – Time borrowing
  – Skew-Tolerant Domino
  – Using both edges of the clock
Conclusion
November 6, 2003, Prof. V.G. Oklobdzija, University of California

3 Clock frequency trends (ISSCC 2002)

4 Processor Frequency Trends
Frequency doubles each generation
Number of gates per clock is reduced by 25%
Courtesy of: Intel, S. Borkar

5 Multi-GHz Clocking Problems
Less logic between pipeline stages:
  – Out of the 7-10 FO4 delays allocated per stage, the FF can take 2-4 FO4
Clock uncertainty can take another FO4
The total could be ½ of the time allowed for computation
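The budget on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name and the choice of corner cases are assumptions, and the numbers are the FO4 ranges quoted on the slide.

```python
def overhead_fraction(cycle_fo4: float, ff_fo4: float, uncertainty_fo4: float = 1.0) -> float:
    """Fraction of the cycle consumed by flip-flop delay plus clock uncertainty."""
    if cycle_fo4 <= 0:
        raise ValueError("cycle_fo4 must be positive")
    return (ff_fo4 + uncertainty_fo4) / cycle_fo4


# Corner cases built from the ranges on the slide (7-10 FO4 cycle, 2-4 FO4 FF, 1 FO4 uncertainty).
best = overhead_fraction(cycle_fo4=10, ff_fo4=2)   # fast FF, long cycle
half = overhead_fraction(cycle_fo4=10, ff_fo4=4)   # slow FF, long cycle
worst = overhead_fraction(cycle_fo4=7, ff_fo4=4)   # slow FF, short cycle

for name, frac in (("best", best), ("half", half), ("worst", worst)):
    print(f"{name}: {frac:.0%} of the cycle is timing overhead")
```

With a 4 FO4 flip-flop and 1 FO4 of uncertainty in a 10 FO4 cycle, the overhead is exactly one half, which matches the last bullet.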

6 Clock Uncertainties

7 Motivation for Improving on Clocked Storage Elements
Example: in a 2.0 GHz processor, T = 500 ps
- Typically, the clocked storage element (CSE) D-Q delay is on the order of 100-150 ps
- If one can design a faster CSE, e.g. 80-100 ps D-Q, this represents a 10-15% performance improvement
- If in addition one can absorb 20 ps of clock uncertainties and embed one level of logic, this can yield up to 20% performance improvement
- Try to achieve 10-20% performance improvement by introducing new features in the architecture!
- This is sufficient to turn an architect into a circuit designer!

8 Consequences of multi-GHz Clocks
Pipeline boundaries start to blur
Clocked Storage Elements must include logic
Wave pipelining, domino style, signals used to clock …
Synchronous design only in a limited domain
Asynchronous communication between synchronous domains

9 Synchronous / Asynchronous Design on the Chip
1 Billion transistors on the chip by 2005-6
64-b, 4-way issue logic core requires ~2 Million
Table 1: Transistor count in typical RISC processors
Feature       | Digital 21164 | MIPS 10000 | PowerPC 620 | HP 8000 | Sun US
Freq. [MHz]   | 500           | 200        |             | 180     | 250
Pipeline Stg. | 7             | 5-7        | 5           | 7-9     | 6-9
Issue Rate    | 4             | 4          | 4           | 4       | 4
Out-of-Ord.   | 6 loads       | 32         | 16          | 56      | none
Reg-Ren./flp  | none/8        | 32/32      | 8/8         | 56      | none
Total Trans.  | 9.3M          | 5.9M       | 6.9M        | 3.9M    | 3.8M
Logic Trans.  | 1.8M          | 2.3M       | 2.2M        | 3.9M    | 2.0M

10 Synchronous / Asynchronous Design on the Chip
[Figure labels: 1 Billion Transistors Chip; 10 million transistors]

11 Two views of the world:
- Asynchronous
- Synchronous

12 Asynchronous Paradigm
Logic stage can take any time it needs
Max. speed limited by handshake overhead
Increased complexity of logic (de-glitching)

13 Synchronous Paradigm
Max speed determined by the slowest logic block
Latch / FF timing overhead
Fixed clock frequency (set by longest path)

14 Synchronous Paradigm
Clocked Storage Elements: Flip-Flops and Latches should be viewed as synchronization elements, not merely as storage elements!
Their main purpose is to synchronize fast and slow paths:
  – prevent the fast path from corrupting the state

15 Synchronous World: Tricks and Solutions
Clocked Storage Elements with clock uncertainty absorption features
Time Borrowing
Incorporation of synchronization features into the logic
  – Skew-Tolerant Domino
Utilizing both edges of the clock

16 Clocked Storage Element Overhead
The time taken from the pipeline by the CSE is U and Clk-Q delay. Thus, D-Q delay is relevant, not Clk-Q:
$T = T_{\text{Clk-Q}} + T_{\text{Logic}} + U + T_{\text{skew}}$
$T_{\text{D-Q}} = T_{\text{Clk-Q}} + U$
[Figure: timing diagram of one pipeline stage (CSE, logic, CSE) marking $T$, $T_{\text{Logic}}$, $T_{\text{Clk-Q}}$, $U$, and $T_{\text{skew}}$]
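To make the cycle-time relation concrete, here is a minimal sketch that rearranges it to find the time left for logic. The function names and the sample values (100 ps Clk-Q, 30 ps U, 20 ps skew) are hypothetical; only the 500 ps period comes from the 2.0 GHz example earlier in the talk. The slide does not define U, so it is carried here simply as a second CSE timing term.

```python
def d_q_delay_ps(clk_q_ps: float, u_ps: float) -> float:
    """CSE overhead as seen by the pipeline: T_D-Q = T_Clk-Q + U."""
    return clk_q_ps + u_ps


def logic_time_ps(period_ps: float, clk_q_ps: float, u_ps: float, skew_ps: float) -> float:
    """Time left for logic: T_Logic = T - T_Clk-Q - U - T_skew."""
    t_logic = period_ps - d_q_delay_ps(clk_q_ps, u_ps) - skew_ps
    if t_logic < 0:
        raise ValueError("storage-element overhead and skew exceed the clock period")
    return t_logic


# Hypothetical values, chosen only for illustration.
period = 500.0   # ps, i.e. a 2.0 GHz clock
overhead = d_q_delay_ps(clk_q_ps=100.0, u_ps=30.0)
available = logic_time_ps(period_ps=period, clk_q_ps=100.0, u_ps=30.0, skew_ps=20.0)

print(f"D-Q overhead: {overhead:.0f} ps")
print(f"time left for logic: {available:.0f} ps ({available / period:.0%} of the cycle)")
```

Because both Clk-Q and U are subtracted from the period, shrinking either one frees the same amount of time for logic, which is why the slide singles out D-Q rather than Clk-Q.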

17 Delay vs. Setup/Hold Times
[Figure: Clk-Output delay [ps] plotted against Data-Clk [ps], with the setup and hold regions, the sampling window, and the minimum Data-Output delay marked]

19 Clock Uncertainty Absorption

20 Single-Ended Skew Tolerant Flip-Flop (Nedovic, Oklobdzija, Walker, ISSCC 2003)

21 Clock Uncertainty Absorption
[Figure: D, Clk, and Q waveforms under clock uncertainty $t_{CU}$, showing early, nominal ($T_{Nominal} = 0$), and late $D_{D\text{-}Clk}$, the corresponding $D_{DQm}$ and $D_{DQM}$, and the worst-case $D_{DQ}$]

22 Clock Uncertainty Absorption
[Figure: Clk, D, and Q waveforms for two cases]
(a) $t_{CU}$ = 30 ps: $U_{Opt}$ = -5 ps, $D_{DQM}$ = 220 ps, marked interval 3 ps, $\alpha_{CU}$ = 90%
(b) $t_{CU}$ = 100 ps: $U_{Opt}$ = 30 ps, $D_{DQM}$ = 261 ps, marked interval 44 ps, $\alpha_{CU}$ = 56%
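The two percentages on this slide can be reproduced with a short calculation. The sketch assumes that the absorption coefficient is the share of the clock uncertainty that does not show up as D-Q delay variation, and that the 3 ps and 44 ps figures are that remaining variation. Both readings are inferred from the numbers, not stated on the slide, and the function name is hypothetical.

```python
def absorption_coefficient(t_cu_ps: float, dq_variation_ps: float) -> float:
    """Assumed definition: alpha_CU = (t_CU - delta_D_DQ) / t_CU."""
    if t_cu_ps <= 0:
        raise ValueError("t_cu_ps must be positive")
    return (t_cu_ps - dq_variation_ps) / t_cu_ps


# Values read off the slide; their interpretation is an assumption.
alpha_a = absorption_coefficient(t_cu_ps=30.0, dq_variation_ps=3.0)
alpha_b = absorption_coefficient(t_cu_ps=100.0, dq_variation_ps=44.0)

print(f"(a) t_CU = 30 ps:  alpha_CU = {alpha_a:.0%}")
print(f"(b) t_CU = 100 ps: alpha_CU = {alpha_b:.0%}")
```

Under this reading, a wider uncertainty window is absorbed less completely: 27 of 30 ps in case (a), but only 56 of 100 ps in case (b).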

23 Synchronous World: Tricks and Solutions
Clocked Storage Elements with clock uncertainty absorption features
Time Borrowing
Incorporation of synchronization features into the logic
  – Skew-Tolerant Domino
Utilizing both edges of the clock

24 Time Borrowing

26 Critical Path with Time Borrowing

27 Latches as synchronizers
The purpose of a CSE is to synchronize data flow.
We need to insert a CSE to prevent "fast paths" from reaching the next logic stage too early.
If the signal arrives late, it is allowed to borrow time from the next stage.
However, borrowing cannot go on forever …
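The borrowing behaviour described here can be sketched with a very simple model. It assumes one transparent latch per stage that opens at each multiple of the clock period and stays open for a fixed window, and it ignores setup time, latch delay, and skew. The function name and all numbers are hypothetical.

```python
def propagate(logic_delays_ps, period_ps, window_ps):
    """Return the time borrowed at each receiving latch.

    Raises ValueError when data arrives after a latch has closed,
    i.e. when borrowing can no longer cover the accumulated lateness.
    """
    borrowed = []
    depart = 0.0  # data leaves the first latch as it opens at t = 0
    for k, delay in enumerate(logic_delays_ps, start=1):
        arrival = depart + delay
        opens = k * period_ps
        closes = opens + window_ps
        if arrival > closes:
            raise ValueError(f"stage {k}: data arrives {arrival - closes:.0f} ps after the latch closes")
        # Early data waits for the latch to open; late data passes straight through.
        depart = max(arrival, opens)
        borrowed.append(max(0.0, arrival - opens))
    return borrowed


# Two slow stages borrow from their successors; a fast third stage repays the debt.
print(propagate([550, 520, 400], period_ps=500, window_ps=100))
```

In this example the first stage borrows 50 ps, the second inherits that debt and ends up 70 ps late, and the short third stage absorbs it. Raising the second delay to 560 ps would push the arrival past the closing edge and raise the error, which is the "cannot go on forever" case.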

28 Using Single Pulsed Latch

29 Single Pulsed Latch (*Courtesy of D. Markovic & Intel MRL)

30 Optimal Single Latch Clocking
Single Latch System (Unger & Tan '83):
$P_m = P \ge D_{LM} + D_{DQM}$ {minimal clock period}
$D_{Lm} > D_{LmB} \ge W + T_T + T_L + H - D_{CQm}$ {shortest path}
$W_{opt} = T_L + T_T + U + D_{CQM} - D_{DQM}$ {minimal clock width}
Example: 0.10 µm technology, FO4 = 25-40 ps, FF = 80 ps, $T_{unc}$ = 25-35 ps, $f_{max}$ = 2.5-4 GHz, T = 250-400 ps
$W_{opt} \sim 2T_{unc} \sim$ 50-70 ps
$D_{Lm} \sim 4T_{unc} + H - D_{CQm} \sim$ 100-140 ps {this is close to ½ of a cycle}
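The example figures follow from the two approximations on the slide. The sketch below evaluates them over the quoted range of uncertainty; the function names are hypothetical, and setting H - D_CQm to zero is an assumption made so that the result matches the 100-140 ps quoted on the slide.

```python
def pulse_width_ps(t_unc_ps: float) -> float:
    """Approximate optimal clock pulse width: W_opt ~ 2 * T_unc."""
    return 2.0 * t_unc_ps


def min_short_path_ps(t_unc_ps: float, hold_ps: float = 0.0, clk_q_min_ps: float = 0.0) -> float:
    """Approximate shortest allowed logic path: D_Lm ~ 4 * T_unc + H - D_CQm."""
    return 4.0 * t_unc_ps + hold_ps - clk_q_min_ps


shortest_cycle_ps = 250.0  # lower end of the T = 250-400 ps range on the slide

for t_unc in (25.0, 35.0):
    w = pulse_width_ps(t_unc)
    d_lm = min_short_path_ps(t_unc)  # assumes H - D_CQm = 0
    print(f"T_unc = {t_unc:.0f} ps: W_opt ~ {w:.0f} ps, "
          f"D_Lm ~ {d_lm:.0f} ps ({d_lm / shortest_cycle_ps:.0%} of a 250 ps cycle)")
```

At a 250 ps cycle the minimum short-path delay comes out at 40% to 56% of the period, which is what the slide means by "close to ½ of a cycle": every path must be padded to roughly that length to avoid races.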

31 Synchronous World: Tricks and Solutions
Clocked Storage Elements with clock uncertainty absorption features
Time Borrowing
Incorporation of synchronization features into the logic
  – Skew-Tolerant Domino
Utilizing both edges of the clock

32 Skew-Tolerant Domino (a.k.a. Opportunistic Time Borrowing)
Intel Patent No. 5,517,136, May 14, 1996

33 CMOS Domino as Memory Element
After the input changes, the output remembers it
Pre-charge destroys the information
Proper phasing of the clock can allow passing the information from stage to stage

34 Skew-Tolerant Domino

35 Synchronous World: Tricks and Solutions
Clocked Storage Elements with clock uncertainty absorption features
Time Borrowing
Incorporation of synchronization features into the logic
  – Skew-Tolerant Domino
Utilizing both edges of the clock

36 Dual-Edge Triggered CSE
DET-CSE samples the input data on both edges of the clock
Reducing power consumption:
  – Half of the original clock frequency for the same data throughput
  – Half of the clock generation / distribution / SE clock-related power is saved
However, it may introduce an overhead
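The power argument can be illustrated with the standard dynamic-power relation P = C · V² · f for a net that toggles every cycle. That relation is textbook background rather than something stated on the slide, and the capacitance, supply voltage, and 20% overhead figure below are hypothetical.

```python
def clock_power_w(cap_f: float, vdd_v: float, freq_hz: float) -> float:
    """Dynamic power of a clock net that makes one full swing per cycle: P = C * V^2 * f."""
    return cap_f * vdd_v ** 2 * freq_hz


CAP = 1e-9   # F, hypothetical total clock load
VDD = 1.2    # V, hypothetical supply

single_edge = clock_power_w(CAP, VDD, 2e9)             # single-edge design at 2 GHz
dual_edge = clock_power_w(CAP, VDD, 1e9)               # same throughput at half the frequency
dual_edge_loaded = clock_power_w(1.2 * CAP, VDD, 1e9)  # assumed 20% extra clock load in the DET elements

print(f"single edge:          {single_edge:.2f} W")
print(f"dual edge:            {dual_edge:.2f} W ({dual_edge / single_edge:.0%} of single edge)")
print(f"dual edge + overhead: {dual_edge_loaded:.2f} W ({dual_edge_loaded / single_edge:.0%} of single edge)")
```

Halving the frequency halves the clock power only if the clock load stays the same. Any extra capacitance inside the dual-edge storage elements eats into the saving, which is the overhead the last bullet warns about.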

37 Dual-Edge Triggered Storage Element Topologies
Structurally, there are two different designs:
  – Latch-Mux (LM)
  – Flip-Flop (FF)
[Figure: DET-Flip-Flop and DET-Latch; non-transparency achieved by MUX]

38 Comparison with Single Edge SEs

39 Comparison with Single Edge CSEs

40 Single and Double Edge Triggered SE: Power Consumption (α = 50%)

41 Fo4=2.9

42 Symmetric Pulse Generator Flip-Flop (SPG-FF) (Nedovic, Oklobdzija, Walker, ESSCIRC 2002)

43 Conclusion
Clocking is the next challenge. Current clocking techniques may hold up to 10 GHz. Afterwards the pipeline boundaries start to vanish while more exotic clocking techniques will find their use. Synchronous design will be possible only in limited domains on the chip. A mix of Synchronous and Asynchronous design may emerge even in digital logic.
Synchronous Design:
  – Has not exhausted all the tricks
Asynchronous Design:
  – Has not solved all the problems
We need solutions from both for a successful SoC Design

