-1- High-Dimensional Metamodeling for Prediction of Clock Tree Synthesis Outcomes
Andrew B. Kahng, Bill Lin and Siddhartha Nath
VLSI CAD Laboratory, UC San Diego
-2- Outline
Challenges
Testcase generation
Design of experiments
New estimation technique
Prediction methodologies
Validation of our methodologies
Conclusions
-3- Challenge: High Dimensionality
Why is CTS prediction hard?
[Figure labels: Testcases; Layout contexts; Tools & knobs; CTS instance; Outcomes? (power, skew, delay, wirelength)]
CTS prediction is difficult due to inherent high dimensionality
-4- Challenge: Sensitivity
Delay varies by up to 43% with clock entry point locations
Delay varies by up to 45% with core aspect ratio
[Figure: delay plots for clock entry point location and core aspect ratio; axis labels not recoverable]
CTS outcomes are sensitive to instance parameters
-5- Challenge: Multicollinearity
[Figure: estimation errors for several values of D; plot data not recoverable]
Estimation errors increase at high dimensions
-6- Challenge: Realistic Instances
[Figure: ISPD 2010 CTS Benchmark 01, labeled with sinks (x, y), rectangular core, and placement blockage]
Simple testcases and layout contexts do not reflect real-world CTS instances
-7- Contributions
Generate realistic testcases with real-world CTS structures
Study and identify appropriate modeling parameters
Propose hierarchical hybrid surrogate modeling (HHSM), a divide-and-conquer approach to overcome parameter collinearity issues
Develop prediction methodologies for practical use models
  – Which tool should be used?
  – How should the tool be driven?
  – How wrong can the model guidance be?
Validate methodologies on a new CTS instance
-8- Related Works
Testcases
  – Tsay90: CTS testcases r1 - r5 with sink (x, y) coordinates
  – ISPD 2010: placement blockage; inverters/buffers in clock hierarchy
Prediction
  – Kahng02: CUBIST to estimate clock skew, insertion delay
  – Kahng13: MARS, RBF, KG, HSM to estimate several clock metrics; uniform placement of sinks, no combinational logic
Gaps in testcases and layout contexts
-10- Example of Our CTS Testcase
Real-world clock structures
  – Clock-gating cells (CGCs)
  – Clock dividers
  – Glitch-free clock MUX
Multiple levels in the clock tree hierarchy (K_6 vs. K_2)
Generators, runscripts to be published
[Figure: clock tree from the root pin clk through a glitch-free MUX (mux_en[0]), dividers DIV-4, DIV-8 and DIV-24, and CGCs with enables cg_en[0] to cg_en[6], to clock levels K_1 to K_6 and the sinks]
-11- Example of Our CTS Instance
Nonuniform sink placement
-13- Modeling Parameters
Microarchitectural
  – M_sinks: # sinks
Floorplan context
  – M_core, M_AR: core area and aspect ratio
  – M_CEP: clock entry point
  – M_block: placement and routing blockage, as % of core area
Tool constraints
  – M_skew, M_delay: max skew and insertion delay
  – M_buftran, M_sinktran: max buffer and sink transition time
  – M_FO: max fanout
  – M_bufsize, M_wire: max buffer size and wire width
Nonuniformity measure
  – M_DCT: nonuniformity in sink placement
-14- Modeling Flow
[Flow diagram boxes: Testcase Verilog RTL; Synthesis (DC); Gate-level netlist; Generate placed DEF; Floorplan parameters; CTS tool parameters; CTS instance; CTS + CT route (ToolA); CTS + CT route (ToolB); Extract CTS metrics; µArch parameter; Nonuniformity parameter; Metamodeling; Fitted models for metrics]
-15- Metamodeling Techniques
Accurate because they derive surrogate models from actual post-CTS data
Our techniques
  – Hybrid Surrogate Modeling (HSM) [Kahng13]
  – Multivariate Adaptive Regression Splines (MARS) [Friedman91]
  – Radial Basis Function (RBF) [Buhmann03]
  – Kriging (KG) [Matheron78]
-17- Multicollinearity
Occurs if parameters are linear combinations of each other
  – Example: M_AR, M_buftran, M_sinktran, M_wire
  – Matrix of parameters is ill-conditioned
  – Large variance in regression coefficients
  – Hard to determine relationship between parameters and output
  – Large errors between actual and predicted outputs as D increases
Previous works [Kahng13] report large estimation errors (≥ 30%) when D ≥ 10
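One common way to quantify the effect described on this slide is the variance inflation factor (VIF), which for a model with two parameters is 1 / (1 - r^2), where r is their correlation. The sketch below is illustrative only: the sample values are hypothetical, the function names are not from the slides, and the slides do not say which diagnostic the authors used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_params(xs, ys):
    """Variance inflation factor when a model has exactly two parameters."""
    r = pearson(xs, ys)
    return 1.0 / (1.0 - r * r)

# Hypothetical samples, not data from the slides.
buftran = [100, 150, 200, 250, 300]    # max buffer transition (ps)
sinktran = [110, 158, 212, 259, 310]   # nearly a linear function of buftran
fanout = [40, 10, 30, 20, 25]          # weakly related to buftran

vif_collinear = vif_two_params(buftran, sinktran)
vif_independent = vif_two_params(buftran, fanout)
print(f"VIF(buftran, sinktran) = {vif_collinear:.1f}")
print(f"VIF(buftran, fanout)   = {vif_independent:.2f}")
```

A VIF far above 1 means the variance of the fitted regression coefficients is inflated by roughly that factor, which is the "large variance in regression coefficients" symptom listed above.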
-18- Our Solution: HHSM
Hierarchical Hybrid Surrogate Modeling
Divide the parameters (D) into two sets
  – One set of k parameters has low collinearity
  – Other set of D - k parameters may have high collinearity
  – Derive HSM surrogate models for each set
  – Combine using weights from least-squares regression
[equation not recovered in extraction]
where w_1, w_2 are weights
  w_1: k parameters with low collinearity
  w_2: D - k parameters with high collinearity
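The combining equation is not present in the extracted slide text. A minimal sketch of the last step, assuming the combination is a weighted sum of the two sub-model predictions with no intercept term, is shown below. The function name and the sample predictions are hypothetical; the two input lists stand in for the outputs of the HSM models fitted on the low-collinearity and high-collinearity parameter sets.

```python
def fit_hhsm_weights(pred_low, pred_high, actual):
    """Least-squares weights (w1, w2) for actual ~ w1*pred_low + w2*pred_high.

    Solves the 2x2 normal equations directly (assumed form, no intercept).
    """
    a11 = sum(p * p for p in pred_low)
    a12 = sum(p * q for p, q in zip(pred_low, pred_high))
    a22 = sum(q * q for q in pred_high)
    b1 = sum(p * y for p, y in zip(pred_low, actual))
    b2 = sum(q * y for q, y in zip(pred_high, actual))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("sub-model predictions are linearly dependent")
    w1 = (b1 * a22 - a12 * b2) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2

# Hypothetical sub-model predictions for four training instances.
pred_low = [10.0, 12.0, 15.0, 11.0]    # model on the k low-collinearity parameters
pred_high = [9.0, 14.0, 13.0, 12.0]    # model on the D - k remaining parameters
actual = [0.7 * p + 0.3 * q for p, q in zip(pred_low, pred_high)]

w1, w2 = fit_hhsm_weights(pred_low, pred_high, actual)
print(f"w1 = {w1:.3f}, w2 = {w2:.3f}")
```

Because the synthetic responses are an exact 0.7/0.3 blend of the two prediction lists, the fit recovers those weights up to rounding; with real post-CTS data the fit would only be approximate.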
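-19- HHSM Accuracy
[Figure: HHSM estimation errors for several values of D; plot data not recoverable. Annotated error bounds: ≤ 2% and ≤ 13%]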
-21- Use Models For Prediction
Develop methodologies to answer three questions
  – Q1: Which tool should be used?
  – Q2: How should the tool be driven?
  – Q3: How wrong can the model guidance be?
-22- Q1: Which Tool Should Be Used?
Methodology
  – Determine the better tool using models
  – Compare with actual post-CTS data
[Table: Incorrect Tool Prediction %. Columns: D, Skew, Power, Delay, Wirelength; values not recoverable]
Errors increase for 8 ≤ D ≤ 11
Errors saturate for D ≥ 12
Worst-case prediction error = 6.13%
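The Q1 comparison can be sketched as follows, assuming that a lower metric value is better and that a prediction counts as incorrect when the tool the models favor differs from the tool the actual post-CTS data favors. The function name and all numbers are hypothetical.

```python
def incorrect_tool_prediction_pct(pred_a, pred_b, act_a, act_b):
    """Percentage of instances where the models pick a different tool than the data.

    Each list holds one metric value per CTS instance; lower is assumed better.
    """
    wrong = 0
    for pa, pb, aa, ab in zip(pred_a, pred_b, act_a, act_b):
        model_choice = "ToolA" if pa <= pb else "ToolB"
        actual_choice = "ToolA" if aa <= ab else "ToolB"
        if model_choice != actual_choice:
            wrong += 1
    return 100.0 * wrong / len(pred_a)

# Hypothetical skew values (ps) for four instances.
pred_a = [10.0, 12.0, 9.0, 15.0]   # model prediction, ToolA
pred_b = [11.0, 11.5, 9.5, 14.0]   # model prediction, ToolB
act_a = [10.2, 11.0, 9.1, 15.5]    # post-CTS result, ToolA
act_b = [10.9, 11.8, 9.4, 14.2]    # post-CTS result, ToolB

error_pct = incorrect_tool_prediction_pct(pred_a, pred_b, act_a, act_b)
print(f"Incorrect tool prediction: {error_pct:.1f}%")
```

In this toy data only the second instance is mispredicted (the models favor ToolB, the data favors ToolA).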
-23- Q2: How Should The Tool Be Driven?
Methodology
  – Determine the smallest and largest values of parameters that deliver desired outcome
[Table: Parameter subspaces for tools. Columns: Skew (ps), then ToolA and ToolB entries for each of Max Skew (ps), Max Delay (ns), and Max Buffer Transition (ps). Row for skew 5 ps: N in all six cells. Three further rows are only partly recoverable; surviving entries include "1.5 - X" and "300 - X"]
N: infeasible
X: unbounded
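A minimal sketch of the Q2 search, assuming the fitted model can be called on a swept parameter value and that an upper bound is reported as unbounded (X) when the largest swept value is still feasible. The model below is a toy stand-in, not a fitted metamodel, and every name and number is hypothetical.

```python
def feasible_range(model, candidates, target):
    """Smallest and largest candidate values whose predicted outcome meets the target.

    Returns None when no candidate is feasible. Only the extremes are reported,
    so gaps inside the range are not detected.
    """
    feasible = [v for v in candidates if model(v) <= target]
    if not feasible:
        return None
    return min(feasible), max(feasible)

def format_range(rng, candidates):
    """Render a range in the slide's notation: N = infeasible, X = unbounded."""
    if rng is None:
        return "N"
    lo, hi = rng
    upper = "X" if hi == max(candidates) else str(hi)
    return f"{lo} - {upper}"

def toy_skew_model(max_buftran_ps):
    """Toy stand-in for a fitted skew model (ps); purely illustrative."""
    return 80.0 - max_buftran_ps / 10

candidates = [100, 200, 300, 400, 500]   # swept max buffer transition values (ps)
loose = format_range(feasible_range(toy_skew_model, candidates, 40.0), candidates)
tight = format_range(feasible_range(toy_skew_model, candidates, 20.0), candidates)
print(loose)
print(tight)
```

With a 40 ps target the toy model is feasible from 400 ps upward, and with a 20 ps target no swept value is feasible.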
-24- Q3: How Wrong Can The Guidance Be?
Methodology
  – Compare model and actual outcomes of tools
  – If model is wrong, (remainder of bullet not recovered)
[Table: Wrong guidance % and suboptimality % for Power. Columns: D, then SVM and SUB for each of ToolA and ToolB; values not recoverable]
Suboptimality ≤ 10%
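The slide does not preserve how suboptimality is defined. The sketch below assumes it is the relative excess of the actual outcome of the model-chosen tool over the actual outcome of the truly better tool, with lower values being better. The function name and power numbers are hypothetical.

```python
def suboptimalities(pred_a, pred_b, act_a, act_b):
    """Suboptimality % for each instance where the model guidance is wrong.

    Assumed definition: 100 * (chosen - best) / best on actual outcomes.
    """
    subs = []
    for pa, pb, aa, ab in zip(pred_a, pred_b, act_a, act_b):
        chosen = aa if pa <= pb else ab   # actual outcome of the tool the model picks
        best = min(aa, ab)                # actual outcome of the better tool
        if chosen > best:
            subs.append(100.0 * (chosen - best) / best)
    return subs

# Hypothetical power values (mW) for two instances.
pred_a, pred_b = [10.0, 12.0], [11.0, 11.5]
act_a, act_b = [10.2, 11.0], [10.9, 12.1]

subs = suboptimalities(pred_a, pred_b, act_a, act_b)
wrong_pct = 100.0 * len(subs) / len(pred_a)
print(f"Wrong guidance: {wrong_pct:.0f}%, worst suboptimality: {max(subs, default=0.0):.1f}%")
```

Here the model misguides the second instance only, and choosing ToolB there costs about 10% more power than ToolA would have.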
-26- Validation on “New” CTS Instance
How well do our prediction methodologies generalize?
Goals
  – Apply methodologies to a new CTS instance
  – Obtain skew target ≤ 30 ps
Determine parameter values from subspace results of Q2
[Table columns: CTS Tool (ToolA, ToolB); Max Skew (ps); Max Delay (ns); Max Buffer Transition (ps); Post-CTS Skew (ps); Number of CTS runs. Values not recoverable]
Generalizes with small overhead
Few CTS runs to deliver the desired outcome
-28- Conclusions
Study high-D CTS prediction with appropriate modeling parameters
Generate testcases with real-world CTS structures
Propose HHSM to limit error to ≤ 13% even with multicollinearity
Develop methodologies for practical use models
Ongoing work
  – Learning techniques to cure high-D multicollinearity
  – Methodologies to characterize EDA tools
  – Apply methodologies to reduce time and cost for IC implementation
-29- Acknowledgments
Work supported by NSF, MARCO/DARPA, SRC and Qualcomm Inc.
-30- Thank You!
-31- Backup
-32- Brief Background on Metamodeling
General form of estimation
[equation not recovered in extraction; its annotated terms are: predicted response, deterministic response, random noise function, regression coefficients]
-33- Regression Function: MARS
[equation not recovered in extraction]
where
  I_i: # interactions in the i-th basis function
  b_ji: ±1
  x_v: v-th parameter
  t_ji: knot location
Knot = value of parameter where line segment changes slope
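The MARS equation itself is missing from the extracted slide. For reference, the standard form from [Friedman91] that matches the symbols defined above is given below; the coefficients c_0 and c_i, the number of basis functions M, and the index map v(j,i) are notation assumed here, not taken from the slide.

```latex
\hat{y}(x) = c_0 + \sum_{i=1}^{M} c_i \, B_i(x),
\qquad
B_i(x) = \prod_{j=1}^{I_i} \bigl[\, b_{ji} \bigl( x_{v(j,i)} - t_{ji} \bigr) \bigr]_{+},
\qquad
[z]_{+} = \max(0, z)
```

Each factor is a hinge that is zero on one side of the knot t_ji and linear on the other, with b_ji = ±1 selecting the side, which is why the fitted curve changes slope at the knots.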
-34- Regression Function: RBF
[equation not recovered in extraction]
where
  a_j: coefficients of the kernel function
  K(.): kernel function
  µ_j: centroid
  r_j: scaling factors
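The RBF equation is also missing from the extracted slide. A minimal sketch of an RBF predictor using the symbols above, assuming the common form of a weighted sum of kernels evaluated at the scaled distance to each centroid and assuming a Gaussian kernel, could look like this. The kernel choice, function name, and numbers are all assumptions.

```python
import math

def rbf_predict(x, coeffs, centroids, scales):
    """Assumed form: sum over j of a_j * K(||x - mu_j|| / r_j), Gaussian K."""
    total = 0.0
    for a_j, mu_j, r_j in zip(coeffs, centroids, scales):
        scaled_dist = math.dist(x, mu_j) / r_j
        total += a_j * math.exp(-scaled_dist ** 2)   # Gaussian kernel (assumed)
    return total

# Hypothetical two-parameter model with two kernels.
coeffs = [3.0, -1.0]
centroids = [[1.0, 2.0], [4.0, 0.0]]
scales = [0.5, 0.5]

at_centroid = rbf_predict([1.0, 2.0], coeffs, centroids, scales)
far_away = rbf_predict([100.0, 100.0], coeffs, centroids, scales)
print(at_centroid, far_away)
```

At the first centroid the first kernel contributes its full coefficient and the distant second kernel contributes almost nothing; far from both centroids the prediction decays toward zero.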
-35- Regression Function: KG
-36- Hybrid Surrogate Modeling (HSM)
Variant of Weighted Surrogate Modeling, but uses least-squares regression to determine weights
[equation not recovered in extraction]
where w_1, w_2, w_3 are weights of the predicted response of the surrogate model for
  w_1: MARS
  w_2: RBF
  w_3: KG
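The HSM equation is missing from the extracted slide. Based on the weight definitions above, a plausible reading is a weighted sum of the three surrogate predictions, with the weights chosen to minimize squared error over the N training points. The absence of an intercept term and the symbols y_n, x_n and N are assumptions.

```latex
\hat{y}(x) = w_1 \, \hat{y}_{\mathrm{MARS}}(x) + w_2 \, \hat{y}_{\mathrm{RBF}}(x) + w_3 \, \hat{y}_{\mathrm{KG}}(x),
\qquad
(w_1, w_2, w_3) = \arg\min_{w} \sum_{n=1}^{N} \bigl( y_n - \hat{y}(x_n) \bigr)^{2}
```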