Unified Adaptivity Optimization of Clock and Logic Signals Shiyan Hu and Jiang Hu Dept of Electrical and Computer Engineering Texas A&M University.

Presentation transcript:

Unified Adaptivity Optimization of Clock and Logic Signals Shiyan Hu and Jiang Hu Dept of Electrical and Computer Engineering Texas A&M University

2 Outline of Post-Silicon Tuning
• Introduction and Motivation
• Problem Formulation
• Algorithms
• Experimental Results
• Conclusion

3 Pre-Silicon Optimization
• Pre-silicon (i.e., design-time) statistical optimization
  – Determine the circuit parameters at design time
  – Apply the resulting design to all dies
  – Problems
    • Hard to get an accurate statistical variation model
    • Each die has its own specific parameter deviations, so the solution is not necessarily ideal for each die
    • Large computation overhead
[Figure: deterministic design vs. statistical design, annotated "50ps"]

4 Post-Silicon Tuning
• After fabrication, tune parameters such as Vdd or the body voltage of gates.
  – Post-silicon tuning handles each die separately, compensating for the specific parameter deviations of each die.
• At design time
  – What are the tuning ranges of gates?
    • Tunability/overhead tradeoff

5 Previous Works
• Logic signal tuning: body voltage tuning
  – Good tunability
  – Large overhead: D/A converter and many control signals, applied to a circuit block
• Clock signal tuning: tunable clock buffer
  – Small tunability
  – Small overhead: padding different loads to buffers

6 Logic Tuning and Clock Tuning
[Figure: flip-flops (FF) driven by a clock. Labels: "Tune the body voltage" (logic), "Padding different load to clock buffers" (clock)]

7 Example For Unified Adaptivity Optimization
• Target clock period = 10, yield target: 99%
• It is a zero-skew design with nominal delay shown
• Each combinational path has 10% variation
[Figure: flip-flop (FF) pipeline annotated with nominal delays; surviving labels: 10, 99, 10]

8 Worst Delay Due To Variations
10% variations on each combinational logic
Target clock period: 10
[Figure: FF pipeline with worst-case delays; surviving label: 11]

9 Logic Tuning
10% variations on each combinational logic
Target clock period: 10
Tuning body voltage of combinational logic blocks
[Figure: FF pipeline with worst-case delays; surviving label: 11]

10 Clock Tuning
10% variations on each combinational logic
Target clock period: 10
Skewing cannot make them simultaneously satisfy the timing constraint:
  11 - skew at right buffer + skew at left buffer
  11 + skew at right buffer - skew at left buffer
[Figure: FF pipeline with worst-case delays and skewed clock buffers]
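
The reason skew alone fails in this example follows from adding the two expressions on the slide. Writing s_L and s_R for the skews at the left and right buffers, and d_1, d_2 for the two effective path delays (all four symbols are introduced here for illustration only):

```latex
\begin{align*}
d_1 &= 11 - s_R + s_L, \qquad d_2 = 11 + s_R - s_L, \\
d_1 + d_2 &= 22 > 2 \times 10 .
\end{align*}
```

The skew terms cancel, so the two effective delays always sum to 22. At least one of them therefore exceeds the target period of 10, whatever skew is chosen.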

11 Unified Optimization - tuning logic and clock signal simultaneously
10% variations on each combinational logic
Target clock period: 10
[Figure: FF pipeline with clock skew = 1 and logic tuning applied]

12 Observation
• Logic tuning only
  – Wastes area
• Clock tuning only
  – May not satisfy the yield target
• A unified approach can satisfy the yield target with small overhead

13 Limitations of Previous Work
• Mostly restricted to continuous adaptivity optimization, even when they perform only logic or only clock signal tuning
  – In practice, options are often discrete
• Assumption on variation distribution
  – Limited to Gaussian distribution, which is not always true in reality
  – Without such an assumption, they depend on computationally expensive Monte Carlo simulation
• We seek to overcome the above limitations

14 Problem
• Given a sequential circuit, perform optimizations such that
  – the yield target can be achieved by post-silicon tuning on logic and clock signals
  – the overhead is minimized

15 Continuous Problem
[Figure: flip-flops (FF) with tunable logic and clock buffers. Labels: "Continuous body voltage", "Continuous loads"]

16 Continuous Problem Formulation
• Minimize Overhead
• Subject to:
  – Long path constraint
  – Short path constraint
  – Tuning bound at each tunable element
[Figure: two flip-flops (FF) with labels T12, S1, S2]
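
The constraint equations on this slide did not survive extraction. A conventional way to write such constraints for a single flip-flop pair is sketched below. It assumes S_1 and S_2 are the clock arrival times at the launching and capturing flip-flops and T_12 is the combinational delay between them. The clock period P, the setup and hold times, the max/min superscripts, and the tuning range r_e of element e are notation introduced here, not taken from the slide.

```latex
\begin{align*}
\text{minimize}\quad & \mathrm{overhead}(r) \\
\text{subject to}\quad
& S_1 + T_{12}^{\max} + t_{\mathrm{setup}} \le S_2 + P && \text{(long path)} \\
& S_1 + T_{12}^{\min} \ge S_2 + t_{\mathrm{hold}} && \text{(short path)} \\
& 0 \le r_e \le r_e^{\max} \quad \forall e && \text{(tuning bound)}
\end{align*}
```

In this reading, tuning moves S and T within the ranges r_e, and the objective charges for the ranges that are provisioned.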

17 Robust Linear Programming
• Linear programming with random variables
• Worst-case solution
  – All S and T can simultaneously be the worst-case values.
• Robust solution
  – Specify p ≤ total number of random variables
  – In the solution, at most p random variables can simultaneously be at their worst-case values
  – Variations of the other random variables rely on p.
  – Degree of conservatism is controlled by a single parameter.
• Constraint violation probability (related to yield) decreases exponentially as p increases.
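
The role of the single parameter p can be illustrated with a small sketch. It assumes each uncertain coefficient lies within a symmetric interval around a nominal value; the function names and the interval model are hypothetical and not taken from the slides. For a fixed candidate solution x, the extra margin a constraint must reserve is the sum of the p largest possible deviations.

```python
def protection(deviations, x, p):
    """Extra margin needed when at most p coefficients hit their worst case.

    deviations[k] is the assumed half-width of the interval for coefficient k.
    """
    if not 0 <= p <= len(x):
        raise ValueError("p must be between 0 and the number of variables")
    # Impact of each coefficient if it alone moved to its worst-case value.
    impacts = sorted((d * abs(v) for d, v in zip(deviations, x)), reverse=True)
    # An adversary limited to p coefficients picks the p largest impacts.
    return sum(impacts[:p])


def robust_feasible(nominal, deviations, x, bound, p):
    """Check sum(a_k * x_k) <= bound under the p-limited worst case."""
    lhs = sum(a * v for a, v in zip(nominal, x))
    return lhs + protection(deviations, x, p) <= bound
```

With p = 0 the check reduces to the nominal constraint, and with p equal to the number of variables it reduces to the full worst case.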

18 Linear Programming With Uncertainty
Some coefficients are random variables
Assume that we have j random variables

19 Soyster’s Worst Case Solution (I)
a_11 is a random variable
Deterministic constraint
Guarantees the worst-case values

20 Soyster’s Worst Case Solution

21 Robust Solution (I)

22 Robust Solution (II) Additional variables.

23 Nominal-Case Design (P=0)
q_ij = 0
Free to set Z_i

24 Worst-Case Design (P=j)

25 Worst-Case Design (P=j)

26 Worst-Case Design (P=j)
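
The equations on the robust-LP slides did not survive in this transcript. As a rough guide only, the textbook budgeted-uncertainty formulation of Bertsimas and Sim, which appears consistent with the surviving symbols p, q_ij, and Z_i, is written out below. The interval model, the nominal values, the half-widths, the index set J_i of uncertain coefficients in row i, the column index k, and the auxiliary variables y_k are assumptions introduced here.

```latex
\begin{align*}
&\text{Uncertain constraint:} && \sum_k a_{ik} x_k \le b_i,
  \quad a_{ik} \in [\bar a_{ik} - \hat a_{ik},\ \bar a_{ik} + \hat a_{ik}],\ k \in J_i \\
&\text{Worst case (Soyster):} && \sum_k \bar a_{ik} x_k + \sum_{k \in J_i} \hat a_{ik} |x_k| \le b_i \\
&\text{Robust, budget } p: && \sum_k \bar a_{ik} x_k + p\, z_i + \sum_{k \in J_i} q_{ik} \le b_i \\
& && z_i + q_{ik} \ge \hat a_{ik}\, y_k, \quad -y_k \le x_k \le y_k \\
& && z_i \ge 0, \quad q_{ik} \ge 0, \quad y_k \ge 0
\end{align*}
```

Under these assumptions, p = 0 removes the cost of z_i, so z_i can be set large enough that every q_ik is zero and the nominal constraint is recovered. With p = |J_i|, the best choice is z_i = 0 and q_ik equal to the full deviation, which recovers the worst-case constraint.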

27 Discretization
• In reality, tuning is allowed only in discrete steps.
• Rounding from continuous solution
  – Rounding up the continuous solution
    • Increases tuning range: more overhead
  – Rounding down the continuous solution
    • Reduces tuning range: may not satisfy the yield target
  – Nearest rounding
    • May not satisfy the yield target
    • Wastes area
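
The three rounding choices can be made concrete with a short sketch. It assumes the allowed tuning ranges are given as a sorted list of discrete levels; the function names and the clamping behavior at the ends of the list are choices made here for illustration.

```python
from bisect import bisect_left, bisect_right


def round_up(value, levels):
    """Smallest level >= value (clamped to the largest level)."""
    i = bisect_left(levels, value)
    return levels[min(i, len(levels) - 1)]


def round_down(value, levels):
    """Largest level <= value (clamped to the smallest level)."""
    i = bisect_right(levels, value)
    return levels[max(i - 1, 0)]


def round_nearest(value, levels):
    """Closer of the two neighbouring levels; ties go to the lower one."""
    lo, hi = round_down(value, levels), round_up(value, levels)
    return lo if value - lo <= hi - value else hi
```

Rounding up never shrinks a tuning range, so it cannot reduce tunability but costs area. Rounding down saves area but may lose the tunability the continuous solution relied on.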

28 Our Approach
[Flow diagram: Continuous solution → Clock rounding → Logic rounding]
Clock rounding: rounding by dynamic programming w/ fast pruning, giving a set of solutions w/ discrete clock buffers
Logic rounding: for each solution, discretize body voltage for logic gates

29 Clock Rounding
[Figure: rounding choices for a clock buffer. Labels: "Larger tuning range", "Smaller tuning range"]

30 Solution Characterization and Solution Update
• Each candidate solution is associated with
  – C: cumulative area overhead
  – Y: yield estimation
• When tunable clock buffer b is being processed,
  – C is updated by the overhead of b
  – Y is computed by fast yield estimation

31 Fast Pruning
• For rounding up, no need to estimate the yield.
• For rounding down, sort solutions by C and perform yield estimation in a binary search fashion.
• When the solution set size reaches a threshold, pick the top few solutions with smallest C.
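
A minimal sketch of how candidate solutions might be propagated buffer by buffer is given below. It is not the authors' implementation. It assumes a caller-supplied `estimate_yield` function, evaluates partial solutions by leaving unprocessed buffers at their continuous values, and applies only the first and third pruning rules (the binary-search step is omitted for brevity). All names are hypothetical.

```python
def round_clock_buffers(cont, levels, cost, estimate_yield, target, keep=4):
    """Round each continuous tuning range in `cont` to a discrete level.

    Each candidate is a tuple (C, assignment, Y): cumulative overhead,
    discrete ranges chosen so far, and the last yield estimate.
    """
    candidates = [(0.0, [], estimate_yield(list(cont)))]
    for i, value in enumerate(cont):
        up = next((lv for lv in levels if lv >= value), levels[-1])
        down = next((lv for lv in reversed(levels) if lv <= value), levels[0])
        expanded = []
        for c, assigned, y in candidates:
            # Rounding up only widens the range, so the parent's yield
            # estimate is reused instead of running a new estimation.
            expanded.append((c + cost(up), assigned + [up], y))
            if down != up:
                trial = assigned + [down]
                y_down = estimate_yield(trial + list(cont[i + 1:]))
                if y_down >= target:
                    expanded.append((c + cost(down), trial, y_down))
        # Keep only the few cheapest candidates to bound the set size.
        expanded.sort(key=lambda s: s[0])
        candidates = expanded[:keep]
    return candidates
```

The first returned candidate is the cheapest discrete assignment that passed every yield check along the way. A larger `keep` trades run time for a wider search.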

32 Logic Rounding
• Reducibility based discretization
  – Body voltage tuning range of a block is rounded up when the block is
    • Timing critical
    • One in which few gates are tunable
• Reducibility cost: total slack x number of gates

33 Batch Optimization
[Flow diagram]
Start from a small reducibility threshold
Round up blocks with reducibility cost < threshold and round down the others
If yield is not satisfied, increase the threshold
Note: yield estimation is expensive
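
The batch loop can be sketched as follows, under several assumptions: each block carries its total slack, gate count, and its rounded-up and rounded-down tuning ranges, the yield estimator is supplied by the caller, and the threshold grows geometrically. The `Block` type, the field names, and the growth rule are all hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one logic block.
Block = namedtuple("Block", "total_slack num_gates up down")


def batch_round_logic(blocks, estimate_yield, target, threshold, growth=2.0):
    """Round blocks in batches, one yield estimation per batch."""
    if threshold <= 0 or growth <= 1:
        raise ValueError("need threshold > 0 and growth > 1")
    # Reducibility cost as defined on the slide: total slack x number of gates.
    costs = [b.total_slack * b.num_gates for b in blocks]
    while True:
        rounded_up = [c < threshold for c in costs]
        ranges = [b.up if u else b.down for b, u in zip(blocks, rounded_up)]
        # Stop once the yield target is met or nothing is left to round up.
        if estimate_yield(ranges) >= target or all(rounded_up):
            return ranges, threshold
        threshold *= growth
```

Because each pass changes a whole batch of blocks before re-estimating, the number of expensive yield estimations grows with the number of threshold steps rather than with the number of blocks.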

34 Monte Carlo Simulation (Yield Estimation)
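
The slide content did not survive beyond its title. As a generic illustration only, a Monte Carlo timing-yield estimate can be sketched as below. It assumes independent, uniformly distributed delay variation on each path and a single clock period check; the function name, the variation model, and the default parameters are assumptions, not the authors' setup.

```python
import random


def estimate_yield(nominal_delays, period, variation=0.10, n_samples=2000, seed=0):
    """Fraction of sampled dies on which every path meets the clock period."""
    rng = random.Random(seed)
    passing = 0
    for _ in range(n_samples):
        # Draw one die: each path delay deviates by up to +/- `variation`.
        meets_timing = all(
            d * (1.0 + rng.uniform(-variation, variation)) <= period
            for d in nominal_delays
        )
        passing += meets_timing
    return passing / n_samples
```

The estimate sharpens as `n_samples` grows, which is why a yield check inside an optimization loop becomes costly.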

35 Latin Hypercube Sampling Based Monte Carlo Simulation
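
Only the title of this slide survived. For orientation, a basic Latin hypercube sampler on the unit hypercube is sketched below; it is a generic textbook construction, not the authors' code, and the function name is hypothetical. Each dimension is split into equally wide strata, one point is drawn per stratum, and the strata are paired at random across dimensions.

```python
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1]^n_dims, one per stratum per dimension."""
    columns = []
    for _ in range(n_dims):
        # One uniformly placed point inside each of the n_samples strata.
        points = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # Shuffle so strata are matched randomly across dimensions.
        rng.shuffle(points)
        columns.append(points)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]
```

Compared with plain random sampling, every stratum of every variable is covered exactly once, so fewer samples are typically needed for a yield estimate of similar stability.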

36 Experimental Setup
• ISCAS’89 benchmark circuits
• Pentium IV machine with 3.0 GHz CPU and 2 GB memory
• 130nm technology
• Timing yield target 99%
• For continuous solution, compare to Logic optimization only and Clock optimization only
• For discretization, compare to simple batch and nearest rounding approach

37 Continuous Solution (Area)
In many cases, optimizing the clock signal alone cannot find feasible solutions satisfying the yield constraint

38 Continuous Solution (Yield)

39 Continuous Solution (CPU in seconds)

40 Observations in Continuous Solution
• Unified optimization often saves >20% area over Logic optimization while having larger yield
• Clock optimization only cannot satisfy yield target for many circuits
• The algorithms run fast

41 Discretization (Area)

42 Discretization (Yield)

43 Discretization (CPU in seconds)

44 Observations in Discrete Solutions
• Nearest rounding cannot satisfy the yield target (yield could be <90%).
• Simple batch is slow, and its solution quality is poor because it is not guided by the continuous solution.
• Our algorithm runs faster than Simple batch and saves >30% area.

45 Conclusion
• Unified adaptivity optimization on logic and clock signals shows an advantage in cost-effectiveness
• Provide both continuous and discrete solutions
• Use robust linear programming, which does not depend on variation distribution
• Computation acceleration techniques, e.g., accelerated dynamic programming, batch-based optimization, and Latin Hypercube sampling based fast simulation, are used
• Our algorithm can be used for optimizing logic or clock signal separately while still having the above advantages