CSE241 L4 System.1Kahng & Cichy, UCSD ©2003 CSE241A VLSI Digital Circuits Winter 2003 Lecture 04: Static Timing Analysis
CSE241 L4 System.2Kahng & Cichy, UCSD ©2003 All s and announcements are archived on the class web page Remember: tomorrow (Friday, January 17) noon- 1:20pm (EBUI 3327) prepares you for the synthesis lecture on Tuesday (see syllabus on web for expected content) HW questions 4-6 due today Lab #1 report due Monday; turn-in instructions and script announced last night This class: introduction to static timing analysis Goals: How does STA work? What are use models for STA? (What kinds of things are done with STA results?) Logistics
CSE241 L4 System.3Kahng & Cichy, UCSD ©2003 Determine fastest permissible clock speed (e.g. 100MHz) by determining delay (including set-up and hold time) of longest path from register to register (e.g. 10ns) Largely eliminates need for gate-level simulation to verify delay of the circuit Goal of Static Timing Verification clk Combinational logic clk Combinational logic clk Combinational logic Courtesy K. Keutzer et al. UCB
CSE241 L4 System.4Kahng & Cichy, UCSD ©2003 Cycle Time - Critical Path Delay Cycle time (T) cannot be smaller than longest path delay (T max ) Longest (critical) path delay is a function of: Total gate, wire delays l logic levels clock Q1 Q2 T clock1 T clock2 critical path, ~5 logic levels T clock1 data cycle time Courtesy K. Keutzer et al. UCB
CSE241 L4 System.5Kahng & Cichy, UCSD ©2003 Cycle Time - Setup Time For FFs to correctly capture data, must be stable for: Setup time (T setup ) before clock arrives clock Q1 Q2 T clock1 T clock2 critical path, ~5 logic levels T clock1 data setup time Courtesy K. Keutzer et al. UCB
CSE241 L4 System.6Kahng & Cichy, UCSD ©2003 Cycle Time – Clock Skew clock Q1 Q2 T clock1 T clock2 T clock1 T clock2 Q2 data clock skew Q2 6 If clock network has unbalanced delay – clock skew Cycle time is also a function of clock skew (T skew ) critical path, ~5 logic levels Courtesy K. Keutzer et al. UCB
CSE241 L4 System.7Kahng & Cichy, UCSD ©2003 Cycle Time – Flip-Flop Delay (Clock to Q) Cycle time is also a function of propagation delay of FF (T clk-to-Q ) T clk-to-Q : time from arrival of clock signal till change at FF output) clock Q1 Q2 T clock1 T clock2 T clock1 T clock2 Q2 clock-to-Q data Q2 critical path, ~5 logic levels Courtesy K. Keutzer et al. UCB
CSE241 L4 System.8Kahng & Cichy, UCSD ©2003 Min Path Delay - Hold Time For FFs to correctly latch data, data must be stable during: Hold time (T hold ) after clock arrives Determined by delay of shortest path in circuit (T min ) and clock skew (T skew ) clock Q1 Q2 T clock1 T clock2 short path, ~3 logic levels T clock1 data hold time Courtesy K. Keutzer et al. UCB
CSE241 L4 System.9Kahng & Cichy, UCSD ©2003 Setup, Hold, Cycle Times set-up time – D stable before clock cycle time Example of a single phase clock hold time – D stable after clock When signal may change Courtesy K. Keutzer et al. UCB
CSE241 L4 System.10Kahng & Cichy, UCSD ©2003 Approach – Reduce to Combinational clk Combinational logic clk Combinational logic clk Combinational logic original circuit extracted block Combinational logic Courtesy K. Keutzer et al. UCB
CSE241 L4 System.11Kahng & Cichy, UCSD ©2003 Combinational Block Arrival time in green A C B f X Y Z W Interconnect delay in red Gate delay in blue What’s the right mathematical object to use to represent this physical object? Courtesy K. Keutzer et al. UCB
CSE241 L4 System.12Kahng & Cichy, UCSD ©2003 Problem Formulation - 1 C B f X Y W A A C B f X Y Z W Z Use a labeled directed graph G = Vertices represent gates, primary inputs and primary outputs Edges represent wires Labels represent delays HW #7: In this graph model, both nodes and edges have delays. How would you modify this model so that only edges have delay, yet the same timing analysis remains possible? Courtesy K. Keutzer et al. UCB
CSE241 L4 System.13Kahng & Cichy, UCSD ©2003 Problem Formulation – Actual Arrival Time Actual arrival time A(v) for a node v is latest time that signal can arrive at node v X Y A(Z) Z A(X) A(Y) Courtesy K. Keutzer et al. UCB
CSE241 L4 System.14Kahng & Cichy, UCSD ©2003 Problem Formulation - 2 C B f X Y W A A C B f X Y Z W Z Use a labeled directed graph G = Enumerate all paths - choose the longest? Courtesy K. Keutzer et al. UCB
CSE241 L4 System.15Kahng & Cichy, UCSD ©2003 Problem Formulation - 3 C B f X Y W A Z Compute longest path in a graph G = // delay is set of labels, Origin is the super-source of the DAG Forward-prop(W){ for each vertex v in W for each edge from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay( )) if all incoming edges of w have been traversed, add w to W } Longest path(G){ Forward_prop(Origin) } Origin (Kirkpatrick 1966, IBM JRD) Courtesy K. Keutzer et al. UCB
CSE241 L4 System.16Kahng & Cichy, UCSD ©2003 Algorithm Execution C B f X Y W A Z Origin 0 Compute longest path in a graph G = // delay is set of labels, Origin is the super-source of the DAG Forward-prop(W){ for each vertex v in W for each edge from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay( )) if all incoming edges of w have been traversed, add w to W } Longest path(G){ Forward_prop(Origin) } Courtesy K. Keutzer et al. UCB
CSE241 L4 System.17Kahng & Cichy, UCSD ©2003 Algorithm Execution C B f X Y W A Z Origin 0 0 Compute longest path in a graph G = // delay is set of labels, Origin is the super-source of the DAG Forward-prop(W){ for each vertex v in W for each edge from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay( )) if all incoming edges of w have been traversed, add w to W } Longest path(G){ Forward_prop(Origin) } Courtesy K. Keutzer et al. UCB
CSE241 L4 System.18Kahng & Cichy, UCSD ©2003 Algorithm Execution C B f X Y W A Z Origin Compute longest path in a graph G = // delay is set of labels, Origin is the super-source of the DAG Forward-prop(W){ for each vertex v in W for each edge from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay( )) if all incoming edges of w have been traversed, add w to W } Longest path(G){ Forward_prop(Origin) } Courtesy K. Keutzer et al. UCB
CSE241 L4 System.19Kahng & Cichy, UCSD ©2003 Algorithm Execution C B f X Y W A Z Origin Compute longest path in a graph G = // delay is set of labels, Origin is the super-source of the DAG Forward-prop(W){ for each vertex v in W for each edge from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay( )) if all incoming edges of w have been traversed, add w to W } Longest path(G){ Forward_prop(Origin) } Courtesy K. Keutzer et al. UCB
CSE241 L4 System.20Kahng & Cichy, UCSD ©2003 Algorithm Execution C B f X Y W A Z Origin 1.1 Compute longest path in a graph G = // delay is set of labels, Origin is the super-source of the DAG Forward-prop(W){ for each vertex v in W for each edge from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay( )) if all incoming edges of w have been traversed, add w to W } Longest path(G){ Forward_prop(Origin) } Courtesy K. Keutzer et al. UCB
CSE241 L4 System.21Kahng & Cichy, UCSD ©2003 Algorithm Execution C B f X Y W A Z Origin Compute longest path in a graph G = // delay is set of labels, Origin is the super-source of the DAG Forward-prop(W){ for each vertex v in W for each edge from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay( )) if all incoming edges of w have been traversed, add w to W } Longest path(G){ Forward_prop(Origin) } Courtesy K. Keutzer et al. UCB
CSE241 L4 System.22Kahng & Cichy, UCSD ©2003 Algorithm Execution C B f X Y W A Z Origin Compute longest path in a graph G = // delay is set of labels, Origin is the super-source of the DAG Forward-prop(W){ for each vertex v in W for each edge from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay( )) if all incoming edges of w have been traversed, add w to W } Longest path(G){ Forward_prop(Origin) } Courtesy K. Keutzer et al. UCB
CSE241 L4 System.23Kahng & Cichy, UCSD ©2003 Algorithm Execution C B f X Y W A Z Origin Compute longest path in a graph G = // delay is set of labels, Origin is the super-source of the DAG Forward-prop(W){ for each vertex v in W for each edge from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay( )) if all incoming edges of w have been traversed, add w to W } Longest path(G){ Forward_prop(Origin) } Courtesy K. Keutzer et al. UCB
CSE241 L4 System.24Kahng & Cichy, UCSD ©2003 Algorithm Execution C B f X Y W A Z Origin Compute longest path in a graph G = // delay is set of labels, Origin is the super-source of the DAG Forward-prop(W){ for each vertex v in W for each edge from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay( )) if all incoming edges of w have been traversed, add w to W } Longest path(G){ Forward_prop(Origin) } Courtesy K. Keutzer et al. UCB
CSE241 L4 System.25Kahng & Cichy, UCSD ©2003 Algorithm Execution C B f X Y W A Z Origin Compute longest path in a graph G = // delay is set of labels, Origin is the super-source of the DAG Forward-prop(W){ for each vertex v in W for each edge from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay( )) if all incoming edges of w have been traversed, add w to W } Longest path(G){ Forward_prop(Origin) } Courtesy K. Keutzer et al. UCB
CSE241 L4 System.26Kahng & Cichy, UCSD ©2003 Algorithm Execution C B f X Y W A Z Origin Compute longest path in a graph G = // delay is set of labels, Origin is the super-source of the DAG Forward-prop(W){ for each vertex v in W for each edge from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay( )) if all incoming edges of w have been traversed, add w to W } Longest path(G){ Forward_prop(Origin) } Courtesy K. Keutzer et al. UCB
CSE241 L4 System.27Kahng & Cichy, UCSD ©2003 Critical Path (sub-graph) C B f X Y W A Z Origin Compute longest path in a graph G = // delay is set of labels, Origin is the super-source of the DAG Forward-prop(W){ for each vertex v in W for each edge from v Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay( )) if all incoming edges of w have been traversed, add w to W } Longest path(G){ Forward_prop(Origin) } Courtesy K. Keutzer et al. UCB
CSE241 L4 System.28Kahng & Cichy, UCSD ©2003 Timing for Optimization: Extra Requirements Longest-path algorithm computes arrival times at each node If we have constraints, need to propagate slack to each node l A measure of how much timing margin exists at each node l Can optimize a particular branch Can trade slack for power, area, robustness clock Courtesy K. Keutzer et al. UCB
CSE241 L4 System.29Kahng & Cichy, UCSD ©2003 Required Arrival Time Required arrival time R(v) is the time before which a signal must arrive to avoid a timing violation Then recursively X Y R(Z) Z R(X) R(Y) Courtesy K. Keutzer et al. UCB
CSE241 L4 System.30Kahng & Cichy, UCSD ©2003 Required Time Propagation: Example C B f X Y W A Z Assume required time at output R(f) = 5.80 Propagate required times backwards 0 O Origin Courtesy K. Keutzer et al. UCB
CSE241 L4 System.31Kahng & Cichy, UCSD ©2003 Timing Slack From arrival and required time can compute slack. For each node v: Slack reflects criticality of a node Positive slack l Node is not on critical path. Timing constraints met. Zero slack l Node is on critical path. Timing constraints are barely met. Negative slack l There is a timing violation Slack distribution is key for timing optimization! Courtesy K. Keutzer et al. UCB
CSE241 L4 System.32Kahng & Cichy, UCSD ©2003 Timing Slack Computation: Example Compute slack at each node C B f X Y W A Z Origin Courtesy K. Keutzer et al. UCB
CSE241 L4 System.33Kahng & Cichy, UCSD ©2003 clk Combinational logic clk Combinational logic clk Combinational logic Computing longest path delay Full path enumeration - potentially exponential Longest path algorithm on DAG (Kirkpatrick 1966, IBM JRD) (O(v+e) or O(g + p)) Currently applied to even the largest (>10M gate) circuits Two challenges Asynchronous subcircuits False paths Approach of Static Timing Verification Courtesy K. Keutzer et al. UCB
CSE241 L4 System.34Kahng & Cichy, UCSD ©2003 Enhancements of STA Incremental timing analysis Nanometer-scale process effects – variation ( probabilistic timing analysis) Interference – crosstalk Multiple inputs switching Conservatism of delay propagation HW #8: Suppose you change the size of one (combinational) gate in your design, thus invalidating the previous timing analysis. How much work must be done to regain a correct timing analysis? Courtesy K. Keutzer et al. UCB
CSE241 L4 System.35Kahng & Cichy, UCSD ©2003 Timing Correction Driven by STA l “Incremental performance analysis backplane” Fix electrical violations l Resize cells l Buffer nets l Copy (clone) cells Fix timing problems l Local transforms (bag of tricks) l Path-based transforms DAC-2002, Physical Chip Implementation
CSE241 L4 System.36Kahng & Cichy, UCSD ©2003 Local Synthesis Transforms Resize cells Buffer or clone to reduce load on critical nets Decompose large cells Swap connections on commutative pins or among equivalent nets Move critical signals forward Pad early paths Area recovery DAC-2002, Physical Chip Implementation
CSE241 L4 System.37Kahng & Cichy, UCSD ©2003 Transform Example Delay = 4 ….. Double Inverter Removal ….. Delay = 2 DAC-2002, Physical Chip Implementation
CSE241 L4 System.38Kahng & Cichy, UCSD ©2003 Resizing b a d e f ? b a A b a C DAC-2002, Physical Chip Implementation
CSE241 L4 System.39Kahng & Cichy, UCSD ©2003 Cloning b a d e f g h 0.2 ? b a d e f g h A B DAC-2002, Physical Chip Implementation
CSE241 L4 System.40Kahng & Cichy, UCSD ©2003 Buffering b a d e f g h 0.2 ? b a d e f g h B B DAC-2002, Physical Chip Implementation
CSE241 L4 System.41Kahng & Cichy, UCSD ©2003 Redesign Fan-in Tree a c d b e Arr(b)=3 Arr(c)=1 Arr(d)=0 Arr(a)=4 Arr(e)= c d e Arr(e)=5 1 1 b 1 a DAC-2002, Physical Chip Implementation
CSE241 L4 System.42Kahng & Cichy, UCSD ©2003 Redesign Fan-out Tree Longest Path = Longest Path = 4 Slowdown of buffer due to load DAC-2002, Physical Chip Implementation
CSE241 L4 System.43Kahng & Cichy, UCSD ©2003 Decomposition DAC-2002, Physical Chip Implementation
CSE241 L4 System.44Kahng & Cichy, UCSD ©2003 Swap Commutative Pins 2 c a b a c b Simple sorting on arrival times and delay works DAC-2002, Physical Chip Implementation