Courtesy RK Brayton (UCB) and A Kuehlmann (Cadence) 1 Logic Synthesis Sequential Synthesis
Introduction Design optimization from System level to layout – far too complex to approach in one big step – divide and conquer approach with fine tuned balance between capability to apply clean mathematical modeling and abstraction algorithmic complexity to compute solutions loss of optimality based on hard partitioning design and verification methodology that requires user guidance – sweet spots change over time due to: semi-conductor technology improvements changes of design architectures/requirements new algorithmic solutions, etc.
Introduction Example: traditional ASIC methodology: –RTL verification based on simulation –logic synthesis from RTL to gate level using combinational paradigm –static timing analysis –formal equivalence checking based on combinational paradigm –ATPG and scan-based testing based on combinational paradigm –standard cell place & route methodology with zero clock-skew distribution
4 Introduction However: –clean boundaries between modeling levels get blurred larger chips and shrinking device sizes require more detailed modeling aggressive performance and power requirements new modeling and algorithmic approaches –Example: RTL sign-off methodology combined approach to logic synthesis and physical design
5 Overview of Circuit Optimizations Combinational Optimization Clock Skew Scheduling Retiming Architectural Restructuring System-Level Optimization Optimization Space Distance from Physical Implementation Verification Challenge Necessity of Integrated Solution
6 Sequential Optimization Techniques State assignment –Lots of theory, practical only for small FSMs, that too targeting 2-level control logic Sequential don’t cares –Compute unreachable states, use them as external don’t cares for the next-state logic State minimization –Easy for completely specified FSMs (n ¢ log n algorithm) –Incompletely specified FSMs Retiming –balancing of path delays by moving registers within circuit topology –interleaving with combinational optimization techniques
7 Integration in Design Flow Optimization Space –significant more optimization freedom for improving performance, power, and area Distance from Physical Implementation –difficult to accurately model impact on final implementation –difficult to mathematically characterize optimization space Verification Challenge –departure from combinational comparison model would break formal equivalence checking –different simulation behavior causes acceptance problems
8 Retiming r1r1 r2r2 r3r3 r4r4 D max =6 D max =8 D min =3 D min =2 D max =0 D min =0 Skew =0 T cycle =8 r’ 1 r4r r4r4 Skew = -1 T cycle =7 ( )
9 Retiming Only setup time constraint (0 clock skew) Simple integration with other logical (e.g. combinational) or physical optimizations Easy combination with clock skew scheduling to obtain global optimum Changes combinational model of design –severe impact on verification methodology Inaccurate delay model if applied globally Computation of equivalent reset state required + -
10 Retiming - Architectural Restructuring r2r2 r2r2 r3r3 r4r4 r1r1... { r2r2 r3r3 r4r4... { 10 { r’ 1 r’ 4
11 Retiming - Architectural Restructuring Smooth extension of regular retiming Potential to alleviate global performance bottlenecks by adding sequential redundancy and pipelining Significant change of design structure –substantial impact on verification methodology Flexible architectural restructuring changes I/O behavior –existing RTL specification methods not always applicable + -
12 Example Design example: I/O flip-flops timing edges Target cycle time (norm): 1.5 Worst slack: (5%) Distribution: 20%30 edges 40%63 edges 60%130 edges 80%249 edges 100%425 edges
13 Verification Timing verification unchanged Sequential optimizations change the next-state and output functions –traditional combinational equivalence checking not applicable –simulation runs not recognizable by designer - acceptance problems Generic solution: –preserve retime function (mapping function) from synthesis for: reducing sequential EC problem back to combinational case –no false positives possible!!!! modifying simulation model to reproduce original simulation output
14 Optimizing Circuits by Retiming Netlist of gates and registers: Various Goals: –Reduce clock cycle time –Reduce area Reduce number of latches Inputs Outputs
15 Retiming Problem –Pure combinational optimization can be suboptimal since relations across register boundaries are disregarded Solutions –Retiming: Move register(s) so that clock cycle decreases, or number of registers decreases and input-output behavior is preserved –RnR: Combine retiming with combinational optimization techniques Move latches out of the way temporarily optimize larger blocks of combinational
16 Circuit Representation [Leiserson, Rose and Saxe (1983)] Circuit represented as retiming graph G(V,E,d,w) –V set of gates –E set of connections –d(v) = delay of gate/vertex v, (d(v) 0) –w(e) = number of registers on edge e, (w(e) 0)
17 Circuit Representation Example: Correlator (from Leiserson and Saxe) (simplified) Circuit (x, y) = 1 if x=y 0 otherwise Operation delay Every cycle in Graph has at least one register i.e. no combinational loops Retiming Graph (Directed) 7 a b + Host
18 Preliminaries For a path p : Clock cycle For correlator c = 13 Path with w(p)=
19 Movement of registers from input to output of a gate or vice versa Does not affect gate functionality's Mathematical formulation: –r: V Z, an integer vertex labeling –w r (e) = w(e) + r(v) - r(u) for edge e = (u,v) Basic Operation Retime by 1 Retime by -1
20 Thus in the example, r(u) = -1, r(v) = -1 results in For a path p: s t, w r (p) = w(p) + r(t) - r(s) Retiming: –r: V Z, an integer vertex labeling –w r (e) =w(e) + r(v) - r(u) for edge e= (u,v) –A retiming r is legal if w r (e) 0, e E Basic Operation vu v u
21 Retiming for Minimum Clock Cycle Problem Statement: (minimum cycle time) Given G (V, E, d, w), find a legal retiming r so that is minimized Retiming: 2 important matrices Register weight matrix Delay matrix
22 Retiming for minimum clock cycle W V0 V1 V2 V3 V0 V1 V2 V c p, if d(p) then w(p) 1 D V0 V1 V2 V3 V0 V1 V2 V V2 V1 v W = register path weight matrix (minimum # latches on all paths between u and v) D = path delay matrix (maximum delay on all paths between u and v)
23 Conditions for Retiming Assume that we are asked to check if a retiming exists for a clock cycle Legal retiming: w r (e) 0 for all e. Hence w r (e) = w(e) = r(v) - r(u) 0 or r (u) - r (v) w (e) For all paths p: u v such that d(p) , we require w r (p) 1 –Thus Take the least w(p) (tightest constraint) r(u)-r(v) W(u,v)-1 Note: this is independent of the path from u to v, so we just need to apply it to u, v such that D(u,v)
24 All constraints in difference-of-2-variable form Related to shortest path problem Solving the constraints Correlator: = 7 Legal: r(u)-r(v) w(e) D>7: r(u)-r(v) W(u,v)-1 V2 v1 v W V0 V1 V2 V3 V0 V1 V2 V D V0 V1 V2 V3 V0 V1 V2 V
25 Do shortest path on constraint graph: (O(|V||E| )) (Bellman Ford Algorithm) A solution exists if and only if there exists no negative weighted cycle. Solving the constraints Legal: r(u)-r(v) w(e) D>7: r(u)-r(v) W(u,v)-1 A solution is r(v 0 ) = r(v 3 ) = 0, r(v 1 ) = r(v 2 ) = -1 r(1) r(0) r(3)r(2) , Constraint graph
26 Retiming To find the minimum cycle time, do a binary search among the entries of the D matrix (0( V E log V )) Retime Retimed correlator: Clock cycle = 3+3+7=13 Clock cycle = 7 V2 v1 v a b + Host a b + W V0 V1 V2 V3 V0 V1 V2 V D V0 V1 V2 V3 V0V1V2V
27 1. Relaxation based: –Repeatedly find critical path; –retime vertex at end of path by +1 (O( V E log V )) 2. Also, Mixed Integer Linear Program formulation Retiming: 2 more algorithms +1 u Critical path v
28 Retiming for Minimum Area Goal: minimize number of registers used where a v is a constant.
29 Minimize: Minimum Registers - Formulation Subject to: w r (e) =w(e) + r(v) - r(u) 0 Reducible to a flow problem
30 Problems with Retiming Computation of equivalent initial states –do not exist necessarily –General solution requires replication of logic for initialization Timing models –too far away from actual implementation 1 0 ? ?