EE 5324 – VLSI Design II (Spring 2006), © Kia Bazargan, University of Minnesota. Part VIII: Timing Issues.

1 Part VIII: Timing Issues

2 References and Copyright
Textbooks referenced
  - [Rab96] J. M. Rabaey, “Digital Integrated Circuits: A Design Perspective”, Prentice Hall, 1996.
Slides used (modified by Kia when necessary)
  - [©Prentice Hall] © Prentice Hall 1995, © UCB 1996. Slides for [Rab96]: http://bwrc.eecs.berkeley.edu/Classes/IcBook/instructors.html

3 Why Deal With Timing?
Clock
  - Makes sure signals are settled before being written
  - Controls the order of operations
Problem?
  - Physical implementation of the circuit ≠ what we planned
  - Why?
    o Wires incur delay on signals
    o Clock edge might arrive too early or too late
Challenges
  - Clock routing
  - Synchronization protocols

4 Clock Skew
Clock signal
  - Connects to all registers/flip-flops
  - Connects to the pre-charge/evaluate inputs of all dynamic logic
  - Huge fanout → large capacitive load
  - Routed to all parts of the chip
  - Huge capacitance of the clock net itself
  - Example: Alpha processor: 3.24 nF (40% of chip capacitance)
Clock skew
  - Clock net has huge RC
  - Signal arrival time depends on the distance of the destination from the source
  - Not the “same” clock signal for different destinations
Why important?
  - Timing violated
  - Larger chips → even worse

5 Clock Wire Delay
[Figure: clock wire model; labels: Rs, r, c, CL]
r = 0.07 Ω/□, c = 0.04 fF/µm² (Tungsten wire)
[©Prentice Hall]
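
As a rough illustration of what these per-unit values imply, the sketch below estimates the delay of the wire alone. It assumes r is a sheet resistance in ohms per square and c an area capacitance, so that R = r·L/W and C = c·L·W and the width cancels in the product. The 0.38·R·C factor is the common 50%-point approximation for a distributed RC line; the driver resistance Rs, the load CL and fringing capacitance are ignored, and the 10 mm length is a hypothetical value, not one from the slide.

```python
def distributed_rc_delay(length_um, r_ohm_per_sq=0.07, c_ff_per_um2=0.04):
    """Approximate 50% delay, in seconds, of a distributed RC wire.

    R = r * L / W and C = c * L * W, so R * C = r * c * L**2 and the wire
    width drops out. Driver resistance and load capacitance are ignored.
    """
    rc_per_um2 = r_ohm_per_sq * c_ff_per_um2 * 1e-15  # ohm * farad per um^2
    return 0.38 * rc_per_um2 * length_um ** 2


# Hypothetical 10 mm clock wire: roughly 0.1 ns for the wire by itself.
delay_10mm = distributed_rc_delay(10_000)
# Doubling the length quadruples the delay (quadratic dependence on L).
delay_20mm = distributed_rc_delay(20_000)
```

The quadratic growth with length is the reason larger chips make the skew problem worse.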

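6 Reference Circuit: Pipelined Datapath
We use this circuit to analyze the problem.
[Figure: pipeline from In to Out with logic blocks CL1, CL2, CL3 and registers R1, R2, R3; the registers see the clock at times $t_{\phi'}$, $t_{\phi''}$, $t_{\phi'''}$]
Delays: $t_i$ (interconnect), $t_{l,\min}$ and $t_{l,\max}$ (logic), $t_{r,\min}$ and $t_{r,\max}$ (register)
Skew: $\delta = t_{\phi''} - t_{\phi'}$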

7 Skew in Single-Phase Edge-Triggered Clocking
Race between clock and data
[Figure: R1 clocked at $t_{\phi'}$, R2 clocked at $t_{\phi''}$]
$t_{\phi''} = t_{\phi'} + \delta$
New data launched from R1 must not reach R2 before R2's clock edge: $t_{\phi''} \le t_{\phi'} + t_{r,\min} + t_{l,\min} + t_i$
$\delta \le t_{r,\min} + t_{l,\min} + t_i$ (skew bound)
[Rab96] p513

8 Skew in Single-Phase Edge-Triggered Clocking
Data stable before clock applied
[Figure: R1 clocked at $t_{\phi'}$, R2 captures on the next edge at $t_{\phi''} + T$]
$t_{\phi''} + T \ge t_{\phi'} + t_{r,\max} + t_{l,\max} + t_i$
$T \ge t_{r,\max} + t_{l,\max} + t_i - \delta$ (clock period bound)
[Rab96] p513
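
The two single-phase bounds, δ ≤ t_r,min + t_l,min + t_i and T ≥ t_r,max + t_l,max + t_i − δ, can be checked numerically. The sketch below is only an illustration: the function names and the delay values (in ns) are hypothetical and do not come from the slides or from [Rab96].

```python
def max_skew(t_r_min, t_l_min, t_i):
    """Skew bound: the largest skew that still avoids the clock/data race."""
    return t_r_min + t_l_min + t_i


def min_clock_period(t_r_max, t_l_max, t_i, skew):
    """Clock period bound: the shortest period that lets data settle in time."""
    return t_r_max + t_l_max + t_i - skew


def timing_ok(skew, period, t_r_min, t_r_max, t_l_min, t_l_max, t_i):
    """True when both the skew bound and the clock period bound hold."""
    return (skew <= max_skew(t_r_min, t_l_min, t_i)
            and period >= min_clock_period(t_r_max, t_l_max, t_i, skew))


# Hypothetical delays in ns.
delays = dict(t_r_min=0.2, t_r_max=0.4, t_l_min=0.5, t_l_max=3.0, t_i=0.1)
safe = timing_ok(skew=0.3, period=3.5, **delays)     # both bounds met
raced = timing_ok(skew=1.0, period=10.0, **delays)   # skew bound violated
```

In the second call the period is generous, yet the check still fails: the skew bound does not contain T, so slowing the clock cannot repair a race.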

9 Clock Signal Direction
Same direction as data: $\delta > 0$
  - Skew constraint (bound) must be strictly controlled
  - (–) If the constraint is not met, even reducing the clock frequency would not help!
  - (+) Positive skew shortens the minimum clock period by $\delta$, which increases throughput (see “clock period bound”)
    o Not worth it: high risk
Opposite direction to data: $\delta < 0$
  - Skew constraint always met
  - Minimum clock period grows by $|\delta|$, so throughput decreases
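
The effect of the skew sign on throughput follows from the clock period bound T ≥ t_r,max + t_l,max + t_i − δ. A minimal sketch with hypothetical delay values in ns (none of these numbers are from the slides):

```python
def max_frequency_mhz(t_r_max, t_l_max, t_i, skew):
    """Highest clock frequency allowed by the clock period bound (times in ns)."""
    min_period_ns = t_r_max + t_l_max + t_i - skew
    return 1e3 / min_period_ns


# Hypothetical worst-case delays in ns.
worst = dict(t_r_max=0.4, t_l_max=3.0, t_i=0.1)

f_positive = max_frequency_mhz(skew=+0.3, **worst)  # clock routed with the data
f_zero = max_frequency_mhz(skew=0.0, **worst)
f_negative = max_frequency_mhz(skew=-0.3, **worst)  # clock routed against the data
```

Positive skew gives the highest frequency of the three, but only as long as the separate skew bound is also respected; negative skew is safe against the race at the cost of a longer period.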

10 Skew in Two-Phase Master-Slave Clocking
[Figure: pipelined datapath from In with logic blocks CL1, CL2, CL3 and latch pairs M1/S1, M2/S2, M3/S3 driven by the two clock phases and their skewed copies]

11 Two-Phase Clock Timing
[Figure: timing diagram of $\phi_1$, $\phi_2$ and the skewed $\phi_1'$ over one clock period $T$, showing $T_{\phi 1}$, $T_{\phi 2}$, the non-overlap times $T_{\phi 12}$ and $T_{\phi 21}$, and the clock overlap $\delta - T_{\phi 12}$; annotations: “new data applied to CL2”, “previous data latched into M2”]
$t_{\min} > \delta - T_{\phi 12}$
$t_{\max} < T + \delta - T_{\phi 12}$
[©Prentice Hall]

12 Two-Phase vs. Single-Phase
Comparing the skew bounds,
  - $T_{\phi 12}$ acts as a buffer for the skew
  - Skew can always be countered by increasing $T_{\phi 12}$
Performance
  - Increasing $T_{\phi 12}$ could mean longer clock periods
Positive vs. negative skew
  - Same as single-phase
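
The two-phase constraints t_min > δ − T_φ12 and t_max < T + δ − T_φ12 show both effects at once: a larger non-overlap time absorbs skew, but it also eats into the time left for the slowest path. The sketch below uses hypothetical values in ns and a hypothetical function name.

```python
def two_phase_ok(t_min, t_max, period, skew, t_phi12):
    """Check both two-phase timing constraints for a given non-overlap time."""
    race_free = t_min > skew - t_phi12
    settles_in_time = t_max < period + skew - t_phi12
    return race_free and settles_in_time


# Hypothetical path delays, period and skew in ns.
path = dict(t_min=0.5, t_max=3.0, period=4.0, skew=1.0)

too_little_gap = two_phase_ok(t_phi12=0.2, **path)  # race: 0.5 > 0.8 fails
enough_gap = two_phase_ok(t_phi12=0.6, **path)      # both constraints hold
too_much_gap = two_phase_ok(t_phi12=2.5, **path)    # slow path: 3.0 < 2.5 fails
```

In the last case the only remedy at a fixed skew is a longer period T, which is the performance cost noted above.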

13 How to Counter Clock Skew Problems?
Routing the clock in the opposite direction of data
  - Local solution only, not always an option (see below)
Controlling the non-overlap periods of the clock
  - Only for 2-phase clocks
  - Could decrease clock frequency
Perform the routing of the clock such that skew is minimum
[Figure: clocked registers (Reg) and logic between In and Out, with one path marked Positive Skew and another marked Negative Skew]
[©Prentice Hall]

14 Clock Routing
[Figure: H-Tree Network driven from CLOCK]
Observe: only relative skew is important
[Figure: CLOCK feeding a main clock driver, secondary clock drivers, and local area modules]
  - Reduces absolute delay
  - Makes power-down easier
  - Sensitive to variations in buffer delay
[Figure: Comb-Tree Network]
[©Prentice Hall]

15 Example: DEC Alpha 21164
Clock frequency: 300 MHz; 9.3 million transistors
Total clock load: 3.75 nF
Power in clock distribution network: 20 W (40% of the total!)
Uses two-level clock distribution
  - Single 6-stage driver at center
  - Secondary buffers drive left and right side
Clock grid in metal3 and metal4
[©Prentice Hall]
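
A back-of-the-envelope check of why the clock net is so power hungry: the switching power of a capacitive load is P = C·V²·f. The sketch below applies this to the quoted 3.75 nF and 300 MHz. The 3.3 V supply is an assumption that does not appear on the slide, and the estimate covers only the quoted load, not the driver chain or any other part of the distribution network.

```python
def switching_power_watts(c_load_farads, vdd_volts, freq_hz):
    """Dynamic power of a capacitance switched once per cycle: P = C * V^2 * f."""
    return c_load_farads * vdd_volts ** 2 * freq_hz


# 3.75 nF and 300 MHz are from the slide; Vdd = 3.3 V is an assumed value.
p_clock_load = switching_power_watts(3.75e-9, 3.3, 300e6)
```

Under that assumption the load alone accounts for roughly 12 W, the same order of magnitude as the 20 W quoted for the whole clock distribution network.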

16 DEC Alpha 21164 [©Prentice Hall]

17 DEC Alpha 21164: Clock Skew [©Prentice Hall]

18 Self-Timed and Asynchronous Circuits
Functions of clock in synchronous designs
  - Acts as completion signal (data stable before latched)
  - Ensures correct ordering of events
  - Based on worst-case delay of the circuit
Truly asynchronous design
  - Completion is ensured by careful timing analysis
  - Ordering of events is implicit in logic
  - Very risky
Self-timed design
  - Completion ensured by a completion signal
  - Ordering imposed by handshaking protocol
  - “Local” solution to the timing problem
  - Based on average delay of the circuit
[©Prentice Hall]

19 Example of Self-Timed Pipeline (Handshaking)
“Start” and “done” signals ensure that the physical timing constraints are met
Request/Acknowledge signals (the handshaking protocol) ensure correct ordering of the operations
[Figure: pipeline from In with registers R1, R2, R3 and logic blocks CL1, CL2, CL3 (delays $t_{CL1}$, $t_{CL2}$, $t_{CL3}$); each stage has a handshake (HS) block that exchanges Req/Ack with its neighbors and start/done with its logic block]

20 Self-Timed Circuits: Advantages and Disadvantages
Advantages over synchronous:
  - Timing signals generated locally
    o No clock routing problems
    o Saving in power consumption of the clock net
  - Potential increase in performance
    o Separate physical and logical ordering mechanisms
    o Self-timed: average delay; synchronous: worst-case delay
  - Robust to variations (manufacturing + environment)
Disadvantage:
  - Larger area
    o Redundancy
    o Control circuit (handshaking)
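
The “average vs. worst-case” point can be made concrete with a toy comparison. In the sketch below the per-operation delays are hypothetical, the synchronous clock period is taken as the largest delay in the list (standing in for the true worst case of the logic), and the handshake overhead is a parameter that defaults to zero, which flatters the self-timed design.

```python
def synchronous_total(delays_ns):
    """Every operation takes one clock period, set by the slowest operation."""
    return len(delays_ns) * max(delays_ns)


def self_timed_total(delays_ns, handshake_ns=0.0):
    """Each operation takes its own delay plus a fixed handshake overhead."""
    return sum(d + handshake_ns for d in delays_ns)


# Hypothetical data-dependent delays of five operations, in ns.
delays = [2.0, 3.0, 2.5, 8.0, 2.5]

sync_ns = synchronous_total(delays)             # 5 operations at 8.0 ns each
self_ns = self_timed_total(delays)              # sum of the actual delays
self_with_hs_ns = self_timed_total(delays, 1.0)  # with 1 ns handshake per operation
```

The gain shrinks as the handshake overhead grows or as the delay distribution becomes narrower.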

21 Completion Signal Generation Methods
Delay module method
  - Mimic the delay of the logic circuit using a separate delay element
  - Not much area overhead
  - Not aggressive in obtaining average speed
  - Used in memories (internal timing)
  [Figure: In → Logic Network → out, with start → Delay Module → done in parallel]
Dual-rail computation
  - Use redundant signal representation
  - Denote 1, 0, “in transition”

  B              B0  B1
  In transition  0   0
  0              0   1
  1              1   0
  Illegal        1   1

22 Completion Signal Generation: Redundant Code
[Figure: precharged dual-rail gate; labels: Start, Vdd, In1, In2, PDN, B0, B1, Done]
[©Prentice Hall]

23 Redundant Signal Coding (cont.)
When “start” is low
  - Circuit precharged
  - (B0,B1) in the “transition” state
When “start” is high
  - ONLY ONE of the pull-down networks evaluates
  - Only one of the B0, B1 signals goes high
“Done” defined as the OR of B0 and B1
  - For an N-bit word, all “done” signals must be combined → more area, more delay
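
A behavioral sketch of the redundant code may help: B = 0 is represented as (B0, B1) = (0, 1), B = 1 as (1, 0), (0, 0) means “in transition” and (1, 1) is illegal. The function names are hypothetical, and the model ignores all circuit-level timing; it only shows how per-bit and per-word completion are derived.

```python
def encode(bit):
    """Map a logic value to its (B0, B1) pair: 0 -> (0, 1), 1 -> (1, 0)."""
    return (1, 0) if bit else (0, 1)


def bit_done(b0, b1):
    """Per-bit completion: the OR of the two rails; (1, 1) is illegal."""
    if b0 and b1:
        raise ValueError("(1, 1) is an illegal dual-rail code")
    return bool(b0 or b1)


def word_done(pairs):
    """Word-level completion: every bit must have left the transition state."""
    return all(bit_done(b0, b1) for b0, b1 in pairs)


def decode(b0, b1):
    """Recover the logic value once the bit is done."""
    if not bit_done(b0, b1):
        raise ValueError("bit is still in transition")
    return 1 if b0 else 0


precharged = [(0, 0), (0, 0)]             # start low: nothing done yet
partial = [encode(1), (0, 0)]             # one bit evaluated, one still pending
finished = [encode(1), encode(0)]         # both bits evaluated
```

Combining the per-bit signals with `all` corresponds to the extra gates that cost area and delay for an N-bit word.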

24 Example: Self-Timed Adder [©Prentice Hall]

25 Example: Self-Timed Adder (cont.)
Dual evaluation network used only for the carry chain (critical path)
Using K (kill) instead of G (generate) inverts the function
“Done” evaluation assumed to be slower than sum evaluation
Example:
  - Self-timed: 0.23 nsec/bit, 3300 2.
  - Synchronous: same delay, less area
  - BUT, actual performance of self-timed substantially better (average vs. worst-case delays)
  - Self-timed: O(log N) average delay, similar to tree-structured synchronous
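
The O(log N) figure can be made plausible with a small experiment. A carry only ripples through consecutive bit positions where the operands differ (the propagate positions), so the longest run of propagate bits is a reasonable proxy for the completion time of one addition. The sketch below assumes uniformly random operands, which real workloads need not follow, and it measures run lengths rather than actual circuit delay.

```python
import random


def longest_propagate_run(a, b, n_bits):
    """Length of the longest run of positions where a and b differ."""
    propagate = a ^ b
    longest = current = 0
    for i in range(n_bits):
        if (propagate >> i) & 1:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def average_longest_run(n_bits, trials=2000, seed=0):
    """Average longest propagate run over random operand pairs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = rng.getrandbits(n_bits)
        b = rng.getrandbits(n_bits)
        total += longest_propagate_run(a, b, n_bits)
    return total / trials


avg_32 = average_longest_run(32)
avg_64 = average_longest_run(64)
```

For 32-bit operands the average run is a handful of bits, far below the worst case of 32 that a synchronous ripple-carry design has to budget for, and doubling the word length adds only about one more bit to the average.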

26 Handshaking
Protocol for the logical ordering of operations
  - Avoid races
  - Avoid hazards
Extra hardware to implement
  - State machine
  - Queues possible
Exact protocol depends on:
  - Architecture
  - Environment
  - Must accommodate:
    o New data available (sender)
    o Request computation (sender)
    o Acknowledge receipt (receiver)
    o Ready for new computation (receiver)

27 Four-Phase Handshaking
[Figure: sender-receiver configuration with Data and Req from Sender to Receiver and Ack from Receiver to Sender]
[Figure: timing diagram of Data, Req and Ack over Cycle 1 and Cycle 2, marking the sender’s and receiver’s actions]
[©Prentice Hall]
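
The slide shows the protocol only as a timing diagram. The sketch below spells out the conventional four-phase (return-to-zero) sequence as a tiny state table keyed by the current (Req, Ack) levels. It is a behavioral illustration with hypothetical names, not a model of the circuit in the figure.

```python
# Next event for each (req, ack) state of a four-phase handshake.
NEXT_EVENT = {
    (0, 0): ("Req", 1),  # sender: data is valid, raise Req
    (1, 0): ("Ack", 1),  # receiver: data taken, raise Ack
    (1, 1): ("Req", 0),  # sender: release Req
    (0, 1): ("Ack", 0),  # receiver: release Ack, back to idle
}


def run_four_phase(transfers):
    """Return the event list and final (req, ack) state for some transfers."""
    req, ack = 0, 0
    events = []
    for _ in range(4 * transfers):
        signal, level = NEXT_EVENT[(req, ack)]
        if signal == "Req":
            req = level
        else:
            ack = level
        events.append((signal, level))
    return events, (req, ack)


events, final_state = run_four_phase(1)
```

Each transfer costs four signal transitions, and both wires return to zero before the next transfer starts.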

28 Event Logic: the Muller C-element
(a) Schematic: inputs A and B, output F
(b) Truth table

  A  B  F_{n+1}
  0  0  0
  0  1  F_n
  1  0  F_n
  1  1  1

[Figure: static and dynamic implementations; labels: V_DD, A, B, F, S, R, Q]
[©Prentice Hall]
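
The C-element's behavior is compact enough to state in one line: the output copies the inputs when they agree and holds its previous value when they differ. A minimal behavioral sketch (the function name is hypothetical):

```python
def c_element(a, b, f_prev):
    """Muller C-element: follow the inputs when they agree, otherwise hold."""
    return a if a == b else f_prev


# Walk through an input sequence starting from F = 0.
f = 0
trace = []
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    f = c_element(a, b, f)
    trace.append(f)
```

The output rises only after both inputs have risen and falls only after both have fallen, which is what makes the element useful for joining events in handshake circuits.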

29 Two-Phase Handshaking
[Figure: implementation with Sender Logic and Receiver Logic connected through a C-element; signals: Data, Req, Ack, Data Ready, Data Accepted]
[Figure: timing diagram of Data, Req and Ack over cycle 1 and cycle 2, marking the sender’s and receiver’s actions]
“Edge-sensitive” to HS signals
  0. Data Ready (DR) = 1
  1. Receiver: “ready for new data” (Ack transition)
  2. Sender: “new data ready” (DR transition) → Req transition
  3. Receiver: “done, ready for new data” (Ack transition)
[©Prentice Hall]
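
In a two-phase protocol every transition on Req or Ack is an event, regardless of its direction, so the wires do not return to zero between transfers. The sketch below is a behavioral illustration only, with hypothetical names; it does not model the C-element implementation in the figure.

```python
def two_phase_events(transfers):
    """Return the event list and final (req, ack) levels for some transfers."""
    req, ack = 0, 0
    events = []
    for _ in range(transfers):
        req ^= 1                     # sender: toggle Req to announce new data
        events.append(("Req", req))
        ack ^= 1                     # receiver: toggle Ack once data is taken
        events.append(("Ack", ack))
    return events, (req, ack)


events_two_phase, levels = two_phase_events(2)
```

Each transfer costs two transitions, and the idle condition is simply Req equal to Ack, whether both are 0 or both are 1.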

30 Example: Self-Timed FIFO
[Figure: registers R1, R2, R3 between In and Out, enabled (En) through C-elements; handshake signals Req_i, Ack_i, Req_o, Ack_o, Done]
Event sequences:
  Req_i → En_1 → Done_1 → En_2 → Done_2 → En_3 → Req_o → Ack_i
  Req_i → En_1 → Done_1 → En_2 → Done_2 → Ack_i
  Req_i → En_1 → Done_1 → Ack_i
  Req_i
[©Prentice Hall]

31 Asynchronous Systems
Outside world usually asynchronous
Synchronization usually by polling
Perfect synchronization impossible
  - Sample input at transition
[Figure: Asynchronous System ($f_{in}$) → Synchronization → Synchronous System ($f_\phi$)]
[©Prentice Hall]

32 A Simple Synchronizer

33 System Level Synchronization
[Figure: PC board with a crystal-based clock generator distributing the reference clock $\phi$ to Chip 1 and Chip 2; each chip has its own clock generator producing local clocks ($\phi_1'$, $\phi_2'$ and $\phi_1''$, $\phi_2''$) for its logic and I/O; Data passes between the chips]
[©Prentice Hall]

34 Skew of Local Clocks vs Reference [©Prentice Hall]

35 Phase-Locked Loop Based Clock Generator
[Figure: Reference clock → Phase detector → (Up/Down) → Charge pump → Loop filter → ($V_{contr}$) → VCO → Clock decode & buffer → local clocks $\phi_1$, $\phi_2$, ...; a Divide by N block sits in the feedback path to the phase detector]
Acts also as clock multiplier
[©Prentice Hall]
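
The clock-multiplier behavior follows from the divider in the feedback path: once the loop is locked, the phase detector sees the divided output running at the reference frequency, so the output must run N times faster. A minimal sketch, assuming an ideal locked loop; the 25 MHz reference and N = 12 are hypothetical values chosen only for illustration.

```python
def pll_output_mhz(f_ref_mhz, n):
    """Output frequency of an ideal locked PLL with a divide-by-N feedback path."""
    if n < 1:
        raise ValueError("divider ratio must be at least 1")
    return n * f_ref_mhz


# Hypothetical example: a 25 MHz board-level reference multiplied by 12.
f_local_mhz = pll_output_mhz(25.0, 12)
```

This lets a chip run a fast internal clock while only a slow reference clock has to cross the PC board.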

36 To Probe Further...
Clock skew visualization (cool animations!!)
  - P. J. Restle, “Technical Visualizations in VLSI Design”, Design Automation Conference, pp. 494-499, 2001.
Asynchronous FIFO design (system-level comm)
  - T. Chelcea and S. Nowick, “Robust Interfaces for Mixed-Timing Systems with Application to Latency-Insensitive Protocols”, Design Automation Conference, pp. 21-26, 2001.

