Published by Silvester Malone, modified over 9 years ago
1
Spring 2006 EE 5324 - VLSI Design II - © Kia Bazargan 332
EE 5324 – VLSI Design II
Kia Bazargan, University of Minnesota
Part VIII: Timing Issues
2
References and Copyright
Textbooks referenced:
- [Rab96] J. M. Rabaey, “Digital Integrated Circuits: A Design Perspective”, Prentice Hall, 1996.
Slides used (modified by Kia when necessary):
- [©Prentice Hall] © Prentice Hall 1995, © UCB 1996. Slides for [Rab96]: http://bwrc.eecs.berkeley.edu/Classes/IcBook/instructors.html
3
Why Deal With Timing?
- The clock:
  - Makes sure signals are settled before they are written
  - Controls the order of operations
- Problem: the physical implementation of the circuit ≠ what we planned. Why?
  - Wires incur delay on signals
  - The clock edge might arrive too early or too late
- Challenges:
  - Clock routing
  - Synchronization protocols
4
Clock Skew
- The clock signal:
  - Connects to all registers/flip-flops
  - Connects to the precharge/evaluate inputs of all dynamic logic
  - Huge fanout → large capacitive load
  - Routed to all parts of the chip → huge capacitance of the clock net itself
  - Example: Alpha processor: 3.24 nF (40% of the chip capacitance)
- Clock skew:
  - The clock net has a large RC
  - The signal arrival time depends on the distance of the destination from the source
  - Different destinations do not see the "same" clock signal
- Why important? Timing constraints can be violated, and larger chips are even worse.
5
Clock Wire Delay
[Figure: clock driver with source resistance Rs driving a distributed RC wire (r, c) to the clock loads CL]
r = 0.07 Ω/□, c = 0.04 fF/µm² (tungsten wire) [©Prentice Hall]
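A rough sketch of why skew grows with wire length, assuming the common distributed-RC approximation for the 50% delay, t ≈ 0.38·R·C, and the r and c values above. The driver resistance Rs, the loads CL and fringing capacitance are ignored, and the two wire lengths are hypothetical. Because R = r·L/W and C = c·W·L, the width W cancels and the delay grows with L².

```python
# Distributed-RC delay of a clock wire using the slide's sheet resistance and
# area capacitance. Assumption: Rs, CL and fringing capacitance are ignored.
R_SHEET = 0.07        # ohms per square
C_AREA = 0.04e-15     # farads per square micrometre (0.04 fF/um^2)


def wire_delay(length_um: float, width_um: float = 1.0) -> float:
    """Approximate 50% delay (seconds) of a distributed RC line: 0.38 * R * C."""
    r_total = R_SHEET * length_um / width_um    # ohms
    c_total = C_AREA * width_um * length_um     # farads
    return 0.38 * r_total * c_total


# Two hypothetical flip-flops at different distances from the clock source.
near, far = 2_000.0, 10_000.0                   # micrometres
skew = wire_delay(far) - wire_delay(near)
print(f"near: {wire_delay(near) * 1e12:.2f} ps, "
      f"far: {wire_delay(far) * 1e12:.2f} ps, skew: {skew * 1e12:.2f} ps")
```

Doubling the distance quadruples the wire delay, which is one reason skew gets worse on larger chips.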
6
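Reference Circuit: Pipelined Datapath
We use this circuit to analyze the problem.
[Figure: three-stage pipeline from In to Out built from combinational blocks CL1, CL2, CL3 and registers R1, R2, R3; the clock arrives at the registers at times t', t'', t'''. Delays: t_i (interconnect), t_l,min / t_l,max (logic), t_r,min / t_r,max (register)]
Skew: $\delta = t'' - t'$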
7
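Skew in Single-Phase Edge-Triggered Clocking: Race Between Clock and Data
[Figure: R1 clocked at t', R2 clocked at t'' = t' + δ]
New data launched by R1 at $t'$ must not reach R2 before R2 has latched the previous data at $t'' = t' + \delta$:
$$t' + t_{r,\min} + t_{l,\min} + t_i \ge t'' = t' + \delta \;\Rightarrow\; \delta \le t_{r,\min} + t_{l,\min} + t_i \quad \text{(skew bound)}$$
[Rab96] p. 513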
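A minimal check of the skew bound, assuming delays in nanoseconds and, as on the slide, no register setup or hold times. The function name and the example delays are hypothetical.

```python
def race_free(delta, t_r_min, t_l_min, t_i):
    """Skew bound for single-phase edge-triggered clocking.

    R1 launches new data at t'; R2 latches at t'' = t' + delta. The fastest new
    data reaches R2 at t' + t_r_min + t_l_min + t_i, which must not come before
    t'', i.e. delta <= t_r_min + t_l_min + t_i. Setup/hold times are ignored.
    """
    return delta <= t_r_min + t_l_min + t_i


# Hypothetical minimum delays (ns): 0.20 + 0.10 + 0.05 = 0.35 ns of margin.
print(race_free(0.30, t_r_min=0.20, t_l_min=0.10, t_i=0.05))   # True
print(race_free(0.40, t_r_min=0.20, t_l_min=0.10, t_i=0.05))   # False
print(race_free(-0.50, t_r_min=0.20, t_l_min=0.10, t_i=0.05))  # True (negative skew)
```

The clock period T does not appear in this check, so lowering the clock frequency cannot repair a violation.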
8
Skew in Single-Phase Edge-Triggered Clocking: Data Stable Before Clock Applied
[Figure: R1 clocked at t', next clock edge at R2 at t'' + T = t' + δ + T]
$$t'' + T = t' + \delta + T \ge t' + t_{r,\max} + t_{l,\max} + t_i \;\Rightarrow\; T \ge t_{r,\max} + t_{l,\max} + t_i - \delta \quad \text{(clock period bound)}$$
[Rab96] p. 513
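A sketch of the clock period bound, with hypothetical worst-case delays in nanoseconds, showing how the sign of δ moves the minimum period.

```python
def min_clock_period(delta, t_r_max, t_l_max, t_i):
    """Clock period bound: T >= t_r_max + t_l_max + t_i - delta."""
    return t_r_max + t_l_max + t_i - delta


# Hypothetical worst-case delays (ns): 0.3 + 2.0 + 0.2 = 2.5 ns with no skew.
for delta in (-0.2, 0.0, 0.2):
    t_min = min_clock_period(delta, t_r_max=0.3, t_l_max=2.0, t_i=0.2)
    print(f"delta = {delta:+.1f} ns -> T >= {t_min:.1f} ns")
```

Positive skew lowers the bound by δ; negative skew raises it by |δ|.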
9
Clock Signal Direction
- Clock routed in the same direction as the data: δ > 0
  - The skew constraint (skew bound) must be strictly controlled
  - (−) If the constraint is not met, even reducing the clock frequency would not help!
  - (+) Positive skew increases throughput: the minimum clock period shrinks by δ (see the clock period bound)
  - Not worth it: high risk
- Clock routed opposite to the data: δ < 0
  - The skew constraint is always met
  - Throughput decreases: the minimum clock period grows by |δ|
10
Skew in Two-Phase Master-Slave Clocking
[Figure: pipeline driven from In with combinational blocks CL1, CL2, CL3, master latches M1, M2, M3 and slave latches S1, S2, S3; the clocks at successive stages are skewed by δ]
11
Two-Phase Clock Timing
[Figure: two-phase clock waveforms φ1, φ2 and a skewed copy φ1' (skew δ) over the clock period T, with phase widths T1, T2 and non-overlap periods T12, T21 (a negative T12 means clock overlap); annotations: "new data applied to CL2", "previous data latched into M2"] [©Prentice Hall]
$$t_{\min} > \delta - T_{12}$$
$$t_{\max} < T + \delta - T_{12}$$
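A small check of the two-phase bounds as written above. Here t_min and t_max are taken (an assumption) to be the minimum and maximum delay of a stage, and all numbers are hypothetical, in nanoseconds.

```python
def two_phase_ok(t_min, t_max, T, delta, T12):
    """Two-phase master-slave constraints from the slide:
    t_min > delta - T12  and  t_max < T + delta - T12."""
    return t_min > delta - T12 and t_max < T + delta - T12


# Hypothetical stage (ns): t_min = 0.2, t_max = 4.0, T = 5.0, skew delta = 0.5.
for T12 in (0.0, 0.4, 1.6):
    print(f"T12 = {T12:.1f} ns -> constraints met: "
          f"{two_phase_ok(0.2, 4.0, 5.0, 0.5, T12)}")
```

With T12 = 0 the race bound fails, T12 = 0.4 ns absorbs the skew, and T12 = 1.6 ns fixes the race but violates the maximum-delay bound, so T would have to grow.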
12
Two-Phase vs. Single-Phase
- Comparing the skew bounds: T12 acts as a buffer for the skew
  - Skew can always be countered by increasing T12
- Performance: increasing T12 could mean longer clock periods
- Positive vs. negative skew: same as single-phase
13
How to Counter Clock Skew Problems?
- Route the clock in the opposite direction of the data
  - Local solution only, not always an option (see figure)
- Control the non-overlap periods of the clock
  - Only for two-phase clocks
  - Could decrease the clock frequency
- Route the clock such that the skew is minimized
[Figure: datapath of registers (Reg) and logic from In to Out with a feedback path; the clock shows positive skew along one direction and negative skew along the other] [©Prentice Hall]
14
Clock Routing
- H-tree network
  - Observe: only relative skew is important
- Comb-tree network (main clock driver, secondary clock drivers, local-area modules)
  - Reduces absolute delay
  - Makes power-down easier
  - Sensitive to variations in buffer delay
[Figure: H-tree clock network; comb-tree network in which the main clock driver feeds secondary clock drivers serving local-area modules] [©Prentice Hall]
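A geometric sketch of why an H-tree targets zero relative skew: every leaf is reached through the same total wire length. The recursive construction and sizes below are illustrative assumptions; in practice mismatched wires and buffers still create skew.

```python
def h_tree_leaves(x, y, half, levels, dist=0.0):
    """Return (x, y, path_length) for every leaf of an H-tree centred at (x, y).

    Each level branches to four sub-centres (x +/- half, y +/- half): a
    horizontal run of |dx| along the bar of the 'H', then a vertical run of
    |dy| along its leg. The next level uses half the size.
    """
    if levels == 0:
        return [(x, y, dist)]
    leaves = []
    for dx in (-half, half):
        for dy in (-half, half):
            leaves += h_tree_leaves(x + dx, y + dy, half / 2, levels - 1,
                                    dist + abs(dx) + abs(dy))
    return leaves


leaves = h_tree_leaves(0.0, 0.0, half=4.0, levels=3)
lengths = {round(d, 9) for _, _, d in leaves}
print(len(leaves), "leaves, distinct root-to-leaf wire lengths:", lengths)
```

All 64 leaves land on a uniform 8 × 8 grid, yet every one sits 4 + 4 + 2 + 2 + 1 + 1 = 14 length units from the root.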
15
Example: DEC Alpha 21164
- Clock frequency: 300 MHz; 9.3 million transistors
- Total clock load: 3.75 nF
- Power in the clock distribution network: 20 W (40% of the total!)
- Uses two-level clock distribution:
  - Single 6-stage driver at the center
  - Secondary buffers drive the left and right sides
- Clock grid in metal3 and metal4 [©Prentice Hall]
16
DEC Alpha 21164 [©Prentice Hall]
17
DEC Alpha 21164: Clock Skew [©Prentice Hall]
18
Self-Timed and Asynchronous Circuits
- Functions of the clock in synchronous designs:
  - Acts as a completion signal (data stable before being latched)
  - Ensures the correct ordering of events
  - Based on the worst-case delay of the circuit
- Truly asynchronous design:
  - Completion is ensured by careful timing analysis
  - Ordering of events is implicit in the logic
  - Very risky
- Self-timed design:
  - Completion is ensured by a completion signal
  - Ordering is imposed by a handshaking protocol
  - A "local" solution to the timing problem
  - Based on the average delay of the circuit
[©Prentice Hall]
19
Example of Self-Timed Pipeline (Handshaking)
- "Start" and "done" signals ensure that the physical timing constraints are met
- Request/Acknowledge (aka handshaking protocol) ensures the correct ordering of the operations
[Figure: pipeline from In through logic blocks CL1, CL2, CL3 (delays t_CL1, t_CL2, t_CL3) and registers R1, R2, R3; each logic block has start and done signals, and handshake (HS) blocks exchange Req and Ack between stages]
20
Self-Timed Circuits: Advantages and Disadvantages
- Advantages over synchronous design:
  - Timing signals generated locally
    - No clock routing problems
    - Savings in the power consumption of the clock net
  - Potential increase in performance
    - Separate physical and logical ordering mechanisms
    - Self-timed: average case; synchronous: worst case
  - Robust to variations (manufacturing + environment)
- Disadvantage: larger area
  - Redundancy
  - Control circuit (handshaking)
21
Completion Signal Generation Methods
- Delay module method
  - Mimic the delay of the logic circuit using a separate delay element
  - Not much area overhead
  - Not aggressive in obtaining average speed
  - Used in memories (internal timing)
  [Figure: logic network from In to out, with a delay module driven by start producing done]
- Dual-rail computation
  - Use a redundant signal representation
  - Denote 1, 0 and "in transition":

    B              B0  B1
    In transition  0   0
    0              0   1
    1              1   0
    Illegal        1   1
22
Completion Signal Generation: Redundant Code
[Figure: dual-rail dynamic circuit with Start controlling the precharge from Vdd, pull-down networks (PDN) driven by In1 and In2, outputs B0 and B1, and a Done output] [©Prentice Hall]
23
Redundant Signal Coding (cont.)
- When "start" is low:
  - The circuit is precharged; (B0, B1) is in the "in transition" state
- When "start" goes high:
  - ONLY ONE of the pull-down networks evaluates
  - Only one of the B0, B1 signals goes high
- "Done" is defined as the OR of B0 and B1
- For an N-bit word, the "done" signals of all bits must be combined → more area, more delay
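A behavioural sketch of dual-rail completion detection, using the (B0, B1) code from the table on the completion-signal slide. The slide only says the per-bit "done" signals must be combined; combining them with an AND over all bits is an assumption here, and the helper names are hypothetical.

```python
SPACER = (0, 0)                    # "in transition": both rails low after precharge
ENCODE = {0: (0, 1), 1: (1, 0)}    # (B0, B1) for B = 0 and B = 1, per the table
DECODE = {rails: bit for bit, rails in ENCODE.items()}


def bit_done(b0, b1):
    """Per-bit completion: the OR of the two rails; (1, 1) is illegal."""
    if b0 and b1:
        raise ValueError("illegal dual-rail code (1, 1)")
    return b0 | b1


def word_done(word):
    """Word-level completion: every bit must have finished (AND of bit dones)."""
    return all(bit_done(b0, b1) for b0, b1 in word)


def read_word(word):
    """Return the decoded bits once the whole word is done, else None."""
    return [DECODE[rails] for rails in word] if word_done(word) else None


partial = [ENCODE[1], SPACER, ENCODE[0]]        # middle bit still evaluating
final = [ENCODE[1], ENCODE[1], ENCODE[0]]
print(word_done(partial), read_word(partial))  # False None
print(word_done(final), read_word(final))      # True [1, 1, 0]
```

The word-wide combination is what costs the extra area and delay: its depth grows with N.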
24
Example: Self-Timed Adder [©Prentice Hall]
25
Example: Self-Timed Adder (cont.)
- Dual evaluation network used only for the carry chain (the critical path)
- Using K (kill) instead of G (generate) inverts the function
- "Done" evaluation assumed to be slower than sum evaluation
- Example:
  - Self-timed: 0.23 ns/bit, area 3300 [unit]²
  - Synchronous: same delay, less area
  - BUT the actual performance of the self-timed adder is substantially better (average vs. worst-case delays)
- Self-timed: O(log N) average delay, similar to a tree-structured synchronous adder
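A small experiment illustrating the average-vs-worst-case argument; it is not a model of the self-timed adder circuit itself. It measures the longest carry-propagation chain when adding random N-bit operands: a synchronous ripple adder must budget for N positions, while a completion-detecting adder only waits for the chain that actually occurs, which for random operands grows roughly like log2 N. Uniformly random operands are an assumption; real workloads may differ.

```python
import random


def longest_carry_chain(a: int, b: int, n: int) -> int:
    """Longest carry-propagation run (in bit positions) when adding two n-bit
    numbers: a chain starts at a generate bit (a_i = b_i = 1) and extends
    through propagate bits (a_i != b_i)."""
    longest = run = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai & bi:               # generate: a new carry starts here
            run = 1
        elif ai ^ bi and run:     # propagate: an existing carry ripples on
            run += 1
        else:                     # kill, or propagate with no incoming carry
            run = 0
        longest = max(longest, run)
    return longest


random.seed(0)
n, trials = 64, 10_000
avg = sum(longest_carry_chain(random.getrandbits(n), random.getrandbits(n), n)
          for _ in range(trials)) / trials
print(f"n = {n}: worst case {n} positions, random-operand average {avg:.1f}")
```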
26
Handshaking
- Protocol for the logical ordering of operations
  - Avoid races
  - Avoid hazards
- Extra hardware to implement:
  - State machine
  - Queues possible
- The exact protocol depends on:
  - Architecture
  - Environment
- Must accommodate:
  - New data available (sender)
  - Request computation (sender)
  - Acknowledge receipt (receiver)
  - Ready for new computation (receiver)
27
Four-Phase Handshaking
[Figure: sender-receiver configuration with Req, Ack and Data wires; timing diagram of Req, Data and Ack over cycle 1 and cycle 2, marking the sender's and the receiver's actions] [©Prentice Hall]
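A behavioural sketch of the four-phase (return-to-zero) protocol: each side reacts only to the other side's wire, and every transfer takes four transitions (Req rises, Ack rises, Req falls, Ack falls). The function and wire names are hypothetical; timing is not modelled.

```python
def four_phase(items):
    """Simulate a four-phase handshake; returns (received items, event trace)."""
    wires = {"Req": 0, "Ack": 0, "Data": None}
    pending, received, trace = list(items), [], []

    def sender():
        if wires["Req"] == 0 and wires["Ack"] == 0 and pending:
            wires["Data"] = pending.pop(0)    # data is set up before Req rises
            wires["Req"] = 1
            return "Req+"
        if wires["Req"] == 1 and wires["Ack"] == 1:
            wires["Req"] = 0                  # receiver has the data: release Req
            return "Req-"
        return None

    def receiver():
        if wires["Req"] == 1 and wires["Ack"] == 0:
            received.append(wires["Data"])    # latch the data, then acknowledge
            wires["Ack"] = 1
            return "Ack+"
        if wires["Req"] == 0 and wires["Ack"] == 1:
            wires["Ack"] = 0                  # ready for the next transfer
            return "Ack-"
        return None

    while True:
        events = [e for e in (sender(), receiver()) if e is not None]
        if not events:                        # both sides idle: all data moved
            return received, trace
        trace.extend(events)


received, trace = four_phase([10, 20])
print(received)  # [10, 20]
print(trace)     # ['Req+', 'Ack+', 'Req-', 'Ack-', 'Req+', 'Ack+', 'Req-', 'Ack-']
```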
28
Event Logic: the Muller C-element [©Prentice Hall]
(a) Schematic: [Figure: C-element symbol with inputs A, B and output F; static implementation built around an S-R latch (S, R, Q) and dynamic implementation with a precharged node from V_DD]
(b) Truth table:

    A  B | F(n+1)
    0  0 | 0
    0  1 | F(n)
    1  0 | F(n)
    1  1 | 1
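A behavioural model of the truth table above; the function name is hypothetical. The output follows the inputs when they agree and holds its previous value otherwise, which is equivalent to F(n+1) = A·B + F(n)·(A + B).

```python
def c_element(a: int, b: int, prev: int) -> int:
    """Muller C-element: F(n+1) = A when A == B, otherwise F(n) is held."""
    return a if a == b else prev


# The output rises only after both inputs have risen and falls only after
# both have fallen.
f = 0
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    f = c_element(a, b, f)
    print(a, b, "->", f)   # prints 0, 1, 1, 0 as the successive outputs
```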
29
Two-Phase Handshaking Implementation [©Prentice Hall]
[Figure: implementation with sender logic and receiver logic exchanging Data, a C-element, and signals Req, Ack, Data Ready and Accepted; timing diagram of Req, Ack and Data over cycle 1 and cycle 2, marking the sender's and the receiver's actions]
- "Edge-sensitive" to the handshaking signals
- Sequence:
  0. Data Ready (DR) = 1
  1. Receiver: "ready for new data" (Ack transition)
  2. Sender: "new data ready" (DR transition, producing Req)
  3. Receiver: "done, ready for new data" (Ack transition)
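A behavioural sketch of two-phase (transition) signalling, for comparison with four-phase: any edge on Req means "new data", any edge on Ack means "accepted", so each transfer needs only two transitions and the wire levels carry no meaning. Names are hypothetical, and the C-element-based circuit on the slide is not modelled.

```python
def two_phase(items):
    """Simulate transition signalling; returns (received items, event trace)."""
    wires = {"Req": 0, "Ack": 0, "Data": None}
    pending, received, trace = list(items), [], []
    while True:
        events = []
        if wires["Req"] == wires["Ack"] and pending:   # sender: last item acknowledged
            wires["Data"] = pending.pop(0)
            wires["Req"] ^= 1                          # an edge on Req = "new data"
            events.append(f"Req->{wires['Req']}")
        if wires["Req"] != wires["Ack"]:               # receiver: unanswered request
            received.append(wires["Data"])
            wires["Ack"] ^= 1                          # an edge on Ack = "accepted"
            events.append(f"Ack->{wires['Ack']}")
        if not events:
            return received, trace
        trace.extend(events)


received, trace = two_phase(["a", "b", "c"])
print(received)  # ['a', 'b', 'c']
print(trace)     # ['Req->1', 'Ack->1', 'Req->0', 'Ack->0', 'Req->1', 'Ack->1']
```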
30
Example: Self-Timed FIFO [©Prentice Hall]
[Figure: three-stage FIFO of registers R1, R2, R3 from In to Out, each register enabled (En) by a C-element; the C-elements are chained through Req_i, Ack_i and Done signals to Req_o and Ack_o, with timing diagrams of Req_i, En_1, Done_1, En_2, Done_2, En_3, Req_o and Ack_i]
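A toy behavioural sketch of a self-timed FIFO. It uses a generic Muller pipeline (a chain of C-elements with two-phase signalling, each enabling one data latch) and is not claimed to be the exact circuit in the figure. Stage i fires when C(c[i-1], not c[i+1]) differs from c[i], i.e. its left neighbour offers a new token and its right neighbour is free. The parameters and names are hypothetical; the consumer can be stalled to show items buffering up without a global clock.

```python
def c_element(a, b, prev):
    """Muller C-element: follow the inputs when they agree, otherwise hold."""
    return a if a == b else prev


def muller_fifo(items, n=4, stall_steps=0):
    """Pass items through an n-stage Muller pipeline; returns them in output order.
    The consumer ignores the output for the first `stall_steps` steps."""
    c = [0] * n                  # C-element outputs (latch enables)
    d = [None] * n               # latch contents
    req_in, ack_out, d_in = 0, 0, None
    pending, out, step = list(items), [], 0
    while len(out) < len(items):
        step += 1
        if req_in == c[0] and pending:            # producer: last token accepted
            d_in, req_in = pending.pop(0), req_in ^ 1
        left = [req_in] + c[:-1]                  # c[i-1]; Req_in feeds stage 0
        right = c[1:] + [ack_out]                 # c[i+1]; Ack_out feeds stage n-1
        left_data = [d_in] + d[:-1]
        fire = [i for i in range(n)
                if c_element(left[i], 1 - right[i], c[i]) != c[i]]
        for i in fire:                            # neighbours never fire together
            c[i], d[i] = left[i], left_data[i]
        if step > stall_steps and c[-1] != ack_out:   # consumer takes a token
            out.append(d[-1])
            ack_out = c[-1]
    return out


print(muller_fifo(["a", "b", "c", "d", "e"], n=4, stall_steps=10))
```

Two adjacent stages can never be enabled at the same time (stage i needs c[i] = c[i+1], stage i+1 needs c[i] ≠ c[i+1]), so firing every enabled stage from one snapshot is safe in this simulation.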
31
Asynchronous Systems
- The outside world is usually asynchronous
- Synchronization is usually done by polling
- Perfect synchronization is impossible
  - The input may be sampled while it is in transition
[Figure: an asynchronous system connected to a synchronous system through a synchronization block; clock rates f and f_in] [©Prentice Hall]
32
A Simple Synchronizer
33
System Level Synchronization [©Prentice Hall]
[Figure: PC board with a crystal-based clock generator distributing a reference clock to Chip 1 and Chip 2; each chip has its own clock generator producing local phases (φ1', φ2' and φ1'', φ2''); logic and I/O exchange Data between the chips]
34
Skew of Local Clocks vs Reference [©Prentice Hall]
35
Phase-Locked Loop Based Clock Generator [©Prentice Hall]
[Figure: reference clock → phase detector (Up/Down) → charge pump → loop filter → V_contr → VCO → clock decode & buffer → local clock phases φ1, φ2, ...; the VCO output is divided by N and fed back to the phase detector]
- Also acts as a clock multiplier
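A toy discrete-time model of the loop above, showing the clock-multiplier behaviour: the VCO settles at N times the reference frequency. The Up/Down phase detector is abstracted into a signed phase error, and the charge pump plus loop filter into proportional and integral gains kp and ki. All parameter values are hypothetical, and this is a behavioural sketch, not a circuit design.

```python
def pll_lock(f_ref=100e6, n=4, f_free=350e6, kp=0.5, ki=0.1, cycles=200):
    """Return the VCO frequency after `cycles` reference cycles.

    Once per reference cycle the phase error (in cycles) between the reference
    and the VCO output divided by n is updated; a proportional + integral
    correction then retunes the VCO around its free-running value f_free.
    """
    f_target = n * f_ref
    f_vco, err, integral = f_free, 0.0, 0.0
    for _ in range(cycles):
        # reference advances 1 cycle, divided VCO advances f_vco / (n * f_ref)
        err += 1.0 - f_vco / f_target
        integral += err
        f_vco = f_free + f_target * (kp * err + ki * integral)
    return f_vco


print(f"{pll_lock() / 1e6:.3f} MHz")      # settles at 4 x 100 MHz
print(f"{pll_lock(n=8) / 1e6:.3f} MHz")   # settles at 8 x 100 MHz
```

The integral term is what drives the steady-state phase error to zero; with kp = 0.5 and ki = 0.1 the error in this discrete model decays by a factor of about 0.71 per reference cycle.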
36
To Probe Further...
- Clock skew visualization (cool animations!!): P. J. Restle, "Technical Visualizations in VLSI Design", Design Automation Conference, pp. 494-499, 2001.
- Asynchronous FIFO design (system-level communication): T. Chelcea and S. Nowick, "Robust Interfaces for Mixed-Timing Systems with Application to Latency-Insensitive Protocols", Design Automation Conference, pp. 21-26, 2001.