Clock Skewing EECS 290A Sequential Logic Synthesis and Verification
Outline
  Motivation
  Graphs
  Algorithms for the shortest path computation
    Dijkstra and Bellman-Ford
  Optimum cycle ratio computation
    Howard's algorithm
  ASAP and ALAP skews
  Clock skew as the shortest path
  Retiming as discrete clock skewing
Motivation
  When combinational optimization cannot help, sequential optimization holds some promise
  Sequential optimization changes one or more of the following:
    the clock cycle (clock skewing)
    the number and positions of memory elements (retiming)
    combinational logic (retiming and resynthesis)
  Clock skewing is an “easy” way of reducing the clock period without moving latches
    Moving latches, if done on a mapped and placed netlist, may destroy the placement, etc.
Directed Graphs
  A graph is a set of vertices and edges, G = (V, E)
  Each edge is directed (has a source and a sink)
  A path is a sequence of vertices connected by edges
  A cycle is a closed (circular) path
  A graph is strongly connected if there exists a path from any vertex to any other vertex
  In the general formulation of the graph problems, each edge e has a distance, d(e), and a latency, t(e)
  In this lecture
    The graph is the “latch dependency graph”
      Vertices are latches
      Edges are combinational paths between the latches
    The distance of an edge is its combinational delay
    The latency of an edge is 1
Graph Problems
  Optimum cycle ratio
    Given d(e) and t(e) for each edge e, for each cycle C in G we define a cycle ratio:
      $\rho(C) = D(C)/T(C)$, where $D(C) = \sum_{e_i \in C} d(e_i)$ and $T(C) = \sum_{e_i \in C} t(e_i)$
    The problem is to determine the min (max) ratio $\rho^*$ over all cycles C in G
  Shortest path
    Given d(e) for each edge e and a source vertex s, determine the shortest path from s to every other vertex in G
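The cycle ratio definition can be made concrete with a small sketch. The function name `cycle_ratio`, the dictionary-based edge representation, and the three-latch example are illustrative assumptions, not part of the lecture.

```python
def cycle_ratio(cycle, d, t):
    """Ratio D(C)/T(C) of one cycle.

    cycle: list of vertices [v0, v1, ..., vk-1]; the closing edge
           (vk-1, v0) is implied.
    d, t:  dicts mapping an edge (u, v) to its distance and latency.
    """
    # Pair every vertex with its successor, wrapping around at the end.
    edges = list(zip(cycle, cycle[1:] + cycle[:1]))
    total_d = sum(d[e] for e in edges)   # D(C)
    total_t = sum(t[e] for e in edges)   # T(C)
    return total_d / total_t


# Hypothetical loop of three latches with combinational delays 2, 3, 4
# and latency 1 on every edge, as in the latch dependency graph.
d = {("a", "b"): 2, ("b", "c"): 3, ("c", "a"): 4}
t = {e: 1 for e in d}
ratio = cycle_ratio(["a", "b", "c"], d, t)
```

With latency 1 on every edge, the ratio is simply the average combinational delay per latch on the loop.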
Shortest Path: Preliminaries
  Start-shortest-path( G, s )
    for each vertex v ∈ G
      w(v) = ∞
      p(v) = NULL
    w(s) = 0
  w(v) is the length of the shortest path found so far from vertex s to vertex v
  p(v) is the predecessor function, which gives, for each node v, the previous node on the shortest path from s
  Relax/tighten( u, v, d() )
    if ( w(v) > w(u) + d(u,v) )
      w(v) = w(u) + d(u,v)
      p(v) = u
  Example: with w(u) = 3, w(v) = 6 and d(u,v) = 1, the test w(v) > w(u) + d(u,v) holds (6 > 4), so w(v) is tightened to 4
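The initialization and relaxation steps can be sketched as follows. The Python names and the dictionary representation are assumptions made for illustration. The usage lines replay the numbers of the slide's example, where w(u) = 3 and w(v) drops from 6 to 4, which implies d(u,v) = 1.

```python
import math


def start_shortest_path(vertices, s):
    """Initialise distance estimates w and predecessors p for source s."""
    w = {v: math.inf for v in vertices}
    p = {v: None for v in vertices}
    w[s] = 0
    return w, p


def relax(u, v, d, w, p):
    """Tighten w[v] through edge (u, v); return True if w[v] changed."""
    if w[v] > w[u] + d[(u, v)]:
        w[v] = w[u] + d[(u, v)]
        p[v] = u
        return True
    return False


# Replay the example: force the intermediate state w(u) = 3, w(v) = 6.
w, p = start_shortest_path(["s", "u", "v"], "s")
w["u"], w["v"] = 3, 6
changed = relax("u", "v", {("u", "v"): 1}, w, p)
```

Returning a flag from `relax` is a design choice of this sketch; it is convenient for queue-based algorithms that need to know whether a distance changed.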
Shortest Path: Dijkstra Algorithm
  Start-shortest-path( G, s )
  S = ∅, Q_w = V(G)
  while ( Q_w ≠ ∅ )
    u = Extract-Min( Q_w )
    S = S ∪ {u}
    for each vertex v, which is a successor of u
      Relax( u, v, d() )
      Update ordering in Q_w
  Q_w is a priority queue storing the vertices ordered by their distance w
  S is the set of vertices whose shortest path from s has already been found
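A minimal sketch of Dijkstra's algorithm, assuming an adjacency-list dictionary and non-negative distances. Python's `heapq` has no decrease-key operation, so instead of updating the ordering in the queue this sketch pushes a fresh entry and skips stale ones when they are popped; that is a common substitute, not necessarily the implementation the lecture has in mind. The example graph is hypothetical.

```python
import heapq
import math


def dijkstra(graph, s):
    """graph: dict vertex -> list of (successor, distance), distances >= 0."""
    w = {v: math.inf for v in graph}
    p = {v: None for v in graph}
    w[s] = 0
    done = set()          # the set S of vertices with final distances
    queue = [(0, s)]      # priority queue keyed by the distance estimate
    while queue:
        dist_u, u = heapq.heappop(queue)      # Extract-Min
        if u in done:
            continue      # stale entry left behind by an earlier relaxation
        done.add(u)
        for v, d_uv in graph[u]:
            if w[v] > dist_u + d_uv:          # Relax(u, v, d)
                w[v] = dist_u + d_uv
                p[v] = u
                heapq.heappush(queue, (w[v], v))
    return w, p


example = {
    "s": [("a", 10), ("b", 5)],
    "a": [("c", 1)],
    "b": [("a", 3), ("c", 9)],
    "c": [],
}
dist, pred = dijkstra(example, "s")
```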
Example
  Source: T. H. Cormen, C. E. Leiserson, R. L. Rivest, Introduction to Algorithms, New York: McGraw-Hill, 1990.
Shortest Path: Bellman-Ford
  The limitation of Dijkstra is that it only works for non-negative distances d(u,v)
  Bellman-Ford overcomes this limitation and can detect a negative cycle
  Start-shortest-path( G, s )
  for i = 1 to |V(G)| - 1
    for each edge (u,v) ∈ E(G)
      Relax( u, v, d() )
  for each edge (u,v) ∈ E(G)
    if w(v) > w(u) + d(u,v)
      return FALSE
  return TRUE
  The result is FALSE if a negative cycle is reachable from s, and TRUE otherwise
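The Bellman-Ford pseudocode translates almost line by line into the sketch below. The edge-list representation and the two small test graphs are assumptions made for illustration.

```python
import math


def bellman_ford(vertices, edges, s):
    """edges: list of (u, v, d) with possibly negative d.

    Returns (ok, w, p); ok is False if a negative cycle is reachable from s.
    """
    w = {v: math.inf for v in vertices}
    p = {v: None for v in vertices}
    w[s] = 0
    # |V| - 1 rounds suffice when no negative cycle is reachable.
    for _ in range(len(vertices) - 1):
        for u, v, d in edges:
            if w[v] > w[u] + d:
                w[v] = w[u] + d
                p[v] = u
    # Any edge that can still be tightened proves a negative cycle.
    for u, v, d in edges:
        if w[v] > w[u] + d:
            return False, w, p
    return True, w, p


vertices = ["s", "a", "b"]
edges = [("s", "a", 4), ("s", "b", 5), ("b", "a", -3)]
ok, w, p = bellman_ford(vertices, edges, "s")

# Adding the edge a -> b closes the cycle a -> b -> a of total weight -2.
ok_neg, _, _ = bellman_ford(vertices, edges + [("a", "b", 1)], "s")
```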
Example
Efficient Implementation of Bellman-Ford
  If w(u) is not tightened in the current iteration, u cannot affect the distances of its successors in the next iteration
  Start-shortest-path( G, s )
  Q = {s}  /* Q is a FIFO queue */
  while ( Q ≠ ∅ )
    u = Extract from Q
    for each edge (u,v) ∈ E(G)
      Relax( u, v, d() )
      if ( distance of v has changed )
        Insert v into Q
  Check for negative cycle
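A sketch of the queue-based variant follows. The slide does not say how the negative-cycle check is done; this sketch assumes one common choice, which is to track the number of edges on the current best path to each vertex and stop as soon as that count reaches |V|, since such a path must repeat a vertex. The `hops` and `queued` helpers are therefore assumptions of this example.

```python
from collections import deque
import math


def bellman_ford_queue(graph, s):
    """graph: dict vertex -> list of (successor, distance).

    Returns (ok, w); ok is False if a negative cycle is reachable from s.
    """
    w = {v: math.inf for v in graph}
    hops = {v: 0 for v in graph}   # edges on the current best path to v
    w[s] = 0
    queue = deque([s])             # FIFO queue of vertices whose w changed
    queued = {s}                   # avoids duplicate entries in the queue
    while queue:
        u = queue.popleft()
        queued.discard(u)
        for v, d_uv in graph[u]:
            if w[v] > w[u] + d_uv:             # relax(u, v, d)
                w[v] = w[u] + d_uv
                hops[v] = hops[u] + 1
                if hops[v] >= len(graph):      # path revisits a vertex
                    return False, w
                if v not in queued:
                    queue.append(v)
                    queued.add(v)
    return True, w


good = {"s": [("a", 4), ("b", 5)], "a": [], "b": [("a", -3)]}
bad = {"s": [("a", 4), ("b", 5)], "a": [("b", 1)], "b": [("a", -3)]}
ok_good, w_good = bellman_ford_queue(good, "s")
ok_bad, _ = bellman_ford_queue(bad, "s")
```

Only vertices whose distance changed are revisited, which is the saving the slide describes.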
Optimum Cycle Ratio
  Determine the min (max) ratio $\rho^*$ over all cycles C in G
  Applications:
    Problem 1: Find the loop with the largest combinational delay per memory element
      The circuit cannot be clocked faster than this delay
    Problem 2: Find the loop with the smallest combinational delay per memory element
      If the circuit is implemented with transparent latches, this delay should satisfy some constraints
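Problem 1 can be illustrated with a simple baseline that is not Howard's algorithm: binary search on a candidate period r combined with Bellman-Ford negative-cycle detection. With latency 1 on every edge, a cycle has ratio above r exactly when it is negative under the weights r - d(e). The function names, the tolerance, and the example graph are assumptions of this sketch.

```python
def has_negative_cycle(vertices, edges):
    """Bellman-Ford with every distance starting at 0, which behaves like
    a virtual source connected to all vertices by zero-weight edges."""
    w = {v: 0.0 for v in vertices}
    for _ in range(len(vertices)):
        changed = False
        for u, v, c in edges:
            if w[v] > w[u] + c + 1e-12:   # small guard against round-off
                w[v] = w[u] + c
                changed = True
        if not changed:
            return False                   # converged, no negative cycle
    return True                            # still changing after |V| passes


def max_cycle_ratio(vertices, edges, tol=1e-6):
    """edges: list of (u, v, delay); every edge has latency 1.

    Assumes the graph has at least one cycle and non-negative delays.
    """
    lo, hi = 0.0, max(d for _, _, d in edges)
    while hi - lo > tol:
        r = (lo + hi) / 2
        shifted = [(u, v, r - d) for u, v, d in edges]
        if has_negative_cycle(vertices, shifted):
            lo = r     # some cycle has ratio greater than r
        else:
            hi = r     # every cycle has ratio at most r
    return hi


# Cycle a->b->c->a has ratio (2+3+4)/3 = 3; cycle a->b->a has (2+6)/2 = 4.
edges = [("a", "b", 2), ("b", "c", 3), ("c", "a", 4), ("b", "a", 6)]
mcr = max_cycle_ratio(["a", "b", "c"], edges)
```

The returned value approximates the maximum cycle ratio to within the tolerance, and it is the smallest period for which the loop constraints alone can be met.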
Latch-to-Latch Max Delay
  Naive method:
    Cut at the latch boundary
    For each pair (i, j) of latches
      Set the arrival time of latch i to 0 and of the rest of the latches to -∞
      Perform DFS from latch j to find its combinational delay
  Better method:
    Cut at the latch boundary
    For each latch i
      Set the arrival time of latch i to 0 and of the rest of the latches to -∞
      Move through the TFO (transitive fanout) cone of latch i in topological order and propagate the arrival times through the fanouts
      Collect the latches j whose arrival time is greater than -∞
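The better method can be sketched as below, assuming a netlist already cut at the latch boundary and described by a fanout dictionary with a delay on every connection. The data layout, the node names, and the use of `graphlib.TopologicalSorter` (Python 3.9 or later) are assumptions of this example. For simplicity it orders the whole netlist once and skips nodes outside the cone of latch i, which gives the same result as visiting only the cone.

```python
import math
from graphlib import TopologicalSorter


def latch_to_latch_delays(fanouts, latch_out, latch_in):
    """fanouts:   node -> list of (fanout_node, delay).
    latch_out: latch -> node driven by the latch output.
    latch_in:  latch -> node feeding the latch input.
    Returns {(i, j): max combinational delay from latch i to latch j}.
    """
    # TopologicalSorter expects a node -> predecessors mapping.
    preds = {n: set() for n in fanouts}
    for n, outs in fanouts.items():
        for m, _ in outs:
            preds.setdefault(m, set()).add(n)
    order = list(TopologicalSorter(preds).static_order())

    delays = {}
    for i, src in latch_out.items():
        arrival = {n: -math.inf for n in order}
        arrival[src] = 0
        for n in order:
            if arrival[n] == -math.inf:
                continue                  # outside the TFO cone of latch i
            for m, delay in fanouts.get(n, []):
                arrival[m] = max(arrival[m], arrival[n] + delay)
        for j, sink in latch_in.items():
            if arrival[sink] > -math.inf:
                delays[(i, j)] = arrival[sink]
    return delays


# Hypothetical netlist with two latches A and B and two gates g1, g2.
fanouts = {
    "A.q": [("g1", 1)],
    "B.q": [("g1", 1), ("g2", 2)],
    "g1": [("g2", 2), ("B.d", 0)],
    "g2": [("A.d", 0)],
    "A.d": [],
    "B.d": [],
}
delays = latch_to_latch_delays(
    fanouts,
    latch_out={"A": "A.q", "B": "B.q"},
    latch_in={"A": "A.d", "B": "B.d"},
)
```

Each latch needs one pass over the netlist, rather than one traversal per latch pair as in the naive method.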
Cycle Ratio Algorithms A. Dasdan, “Experimental analysis of the fastest optimum cycle ratio and mean algorithms”, ACM TODAES, vol. 9(4), pp , 2004
Overview of Howard’s Algorithm
  This is a Bellman-Ford algorithm with a cycle detection subroutine, which gradually tightens the lower bound on the Max Cycle Ratio (MCR)
  Exponential in the worst case but efficient in practice
  Heuristics are used for faster convergence
    Find a good starting cycle ratio
    Detect only relevant changes
  Preprocessing the graph
    Remove non-cyclic branches
    Decompose into strongly connected components
Notation for Howard’s Algorithm
  u, v are vertices, which represent latches
  w(u,v) is the distance between u and v, which represents the combinational delay
    Defined for adjacent vertices only
  d(u) is the longest distance from u to any vertex v
  p(u) is the successor function
    For each node u, it returns the node v such that the distance between u and v is the longest (equal to d(u))
  r is the current best maximum ratio for any loop
    Initialized to the longest self-loop and refined to r’ in procedure FindRatio()
MCR: Find Ratio
  Initialization
  Searching for a new cycle
  Determining a new ratio
  Trying to find a longer loop
  Updating the ratio
Howard’s Algorithm
  Initialization
  Trying to find longer loops
  Heuristic to speed up convergence
  Constraint propagation
Clock Skew
  Zero-skew
    Clock arrives at all latches at the same time
  Non-trivial skew
    Each latch has a skew (a phase of the clock signal at this latch)
  ASAP (“as soon as possible”) and ALAP (“as late as possible”) skews at a latch define a timing window (sequential slack), which the clock at the latch should satisfy for the design to meet the timing constraints
    The sequential slacks at different latches are not independent
  Clock skew optimization is a fundamental problem, tightly related to retiming and other sequential transformations
    Skewing changes the skews of the latches; retiming moves the latches according to the allowed skews
Example
  Figure: a circuit between PI and PO with clock period = 3 and buffer delay = 1, shown in three versions: initial (skew = 0), ALAP (skew = -1), and ASAP (skew = -3)
ASAP and ALAP Skew Computation
  Given a clock period r, set the weight of an edge (u,v) to be w’(u,v) = w(u,v) - r
  Connect the latches depending on PIs to the source vertex s
  Connect the latches that produce POs to the sink vertex t
  Run Bellman-Ford to find the shortest path from s to u
    This is the ASAP skew of latch u
  Run Bellman-Ford to find the shortest reverse path from t to u
    This is the ALAP skew of latch u