Variability-Driven Formulation for Simultaneous Gate Sizing and Post-Silicon Tunability Allocation
Vishal Khandelwal and Ankur Srivastava, Department of Electrical and Computer Engineering, University of Maryland, College Park

1 Variability-Driven Formulation for Simultaneous Gate Sizing and Post-Silicon Tunability Allocation
Vishal Khandelwal and Ankur Srivastava, Department of Electrical and Computer Engineering, University of Maryland, College Park

2 Introduction
Process variations cause a significant spread in design performance in sub-90nm technologies, which impacts yield and reliability. The impact of process variations on design parameters therefore has to be considered explicitly, and several statistical analysis and optimization techniques have been proposed to improve timing and power yields.

3 Handling Process Variations
Process variations can be handled at two points:
- Design-time optimization: statistical gate sizing [Davoodi, DAC’06] [Sapatnekar, DAC’05] [Zhou, ICCAD’05] and statistical buffer insertion [He, ISPD’06] [Davoodi, ICCD’05] [Wong, ICCAD’05] [Khandelwal, ICCAD’03].
- Post-fabrication tunability: post-silicon tunable clock-tree buffers [Chen, ICCAD’05] [Mahoney, ISSC’05] [Takahashi, 2003] [Tam, JSSC’00] and adaptive body-biasing [Kim, ISLPED’03] [Orshansky, ICCAD’06].

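4 Traditional Gate Sizing
With gate sizes s_i, minimize area or power subject to meeting a delay constraint T_cons at the output and to size constraints [Fishburn, Dunlop 1985] [Sapatnekar, 1993]:

$$\min_{\mathbf{s}}\ \mathrm{Area}(\mathbf{s})\ \text{(or Power)} \quad \text{s.t.} \quad t_j + d_i \le t_i \ \ \forall j \in \mathrm{fanin}(i), \qquad t_n \le T_{cons}, \qquad s_{\min} \le s_i \le s_{\max}$$

where t_i is the arrival time at the output of gate i, d_i is the delay of gate i, node 0 is the circuit input and node n is the output.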
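The smallest arrival times that satisfy t_j + d_i ≤ t_i are the longest-path arrival times, so checking a sizing against these constraints is a single topological pass. A minimal sketch on a hypothetical four-node netlist (the netlist, delay values, and function names are illustrative, not from the slides):

```python
def arrival_times(fanin, delay, order):
    """Smallest t with t_i >= t_j + d_i for every fanin j of gate i.

    fanin: node -> list of fanin nodes; delay: node -> gate delay d_i;
    order: nodes in topological order, starting with input node 0.
    """
    t = {}
    for i in order:
        # Latest arrival among the fanins (the input node has none, so 0.0).
        latest = max((t[j] for j in fanin.get(i, [])), default=0.0)
        t[i] = latest + delay.get(i, 0.0)
    return t


def meets_timing(t, output, t_cons):
    """Output constraint t_n <= T_cons."""
    return t[output] <= t_cons


# Hypothetical netlist: input 0 feeds gates 1 and 2, which both feed gate 3 (the output).
fanin = {1: [0], 2: [0], 3: [1, 2]}
delay = {1: 2.0, 2: 3.0, 3: 1.5}
t = arrival_times(fanin, delay, order=[0, 1, 2, 3])
print(t[3], meets_timing(t, output=3, t_cons=5.0))  # 4.5 True
```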

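5 Traditional Gate Sizing
Gate delay is modeled as a posynomial expression in the gate sizes (figure: gate i driving gate j) [Fishburn, Dunlop 1985] [Sapatnekar, 1993]. Minimizing area, power, etc. under these delay constraints can then be cast as a convex formulation.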
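A sketch of why the posynomial model leads to a convex problem, under stated assumptions: the TILOS-style delay a + (b / s_i)(c_wire + Σ c_in·s_j) and all constants below are hypothetical, not the slides' expression. In the sizes themselves a term like s_j / s_i is not jointly convex, but substituting s_i = e^{u_i} turns every term into the exponential of an affine function, which is convex. The code spot-checks midpoint convexity in u on random points (a numerical check, not a proof):

```python
import math
import random


def gate_delay(s, i, fanout, a=1.0, b=1.0, c_wire=0.5, c_in=1.0):
    """Hypothetical posynomial delay of gate i: intrinsic delay a plus
    drive resistance b / s_i times the load (wire + fanout input caps)."""
    load = c_wire + sum(c_in * s[j] for j in fanout[i])
    return a + b * load / s[i]


def delay_in_log_space(u, i, fanout):
    """Same delay after substituting s = exp(u); each term becomes exp(affine in u)."""
    return gate_delay([math.exp(v) for v in u], i, fanout)


def midpoint_convexity_holds(trials, seed=0):
    """Spot-check f((u + w) / 2) <= (f(u) + f(w)) / 2 on random points."""
    rng = random.Random(seed)
    fanout = {0: [1, 2], 1: [], 2: []}  # gate 0 drives gates 1 and 2
    for _ in range(trials):
        u = [rng.uniform(-2, 2) for _ in range(3)]
        w = [rng.uniform(-2, 2) for _ in range(3)]
        mid = [(p + q) / 2 for p, q in zip(u, w)]
        lhs = delay_in_log_space(mid, 0, fanout)
        rhs = (delay_in_log_space(u, 0, fanout) + delay_in_log_space(w, 0, fanout)) / 2
        if lhs > rhs * (1 + 1e-9):  # small relative tolerance for rounding
            return False
    return True


print(midpoint_convexity_holds(1000))  # True
```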

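6 Effects of Process Variations
Under process variations, the delay of each gate becomes a random variable, which motivates statistical gate sizing [Davoodi, DAC’06] [Sapatnekar, DAC’05] [Zhou, ICCAD’05]. Device parameters such as T_ox and L_eff (figure: transistor cross-section with n+ regions) are modeled as a set of random variables with arbitrary distributions.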
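Why random gate delays matter even when every path meets timing nominally: the circuit delay is a maximum over paths, and the mean of a maximum exceeds the maximum of the means. A minimal Monte Carlo sketch with two equally critical paths; independent Gaussian path noise is a simplifying assumption of this example (the slides allow arbitrary distributions and correlations), and the function name is hypothetical:

```python
import random
import statistics


def circuit_delay_samples(nominal_paths, sigma, n_samples, seed=1):
    """Monte Carlo samples of circuit delay = max over path delays.

    Each path delay is its nominal value plus independent Gaussian noise
    (an assumption made only for this sketch).
    """
    rng = random.Random(seed)
    return [max(d + rng.gauss(0.0, sigma) for d in nominal_paths)
            for _ in range(n_samples)]


samples = circuit_delay_samples([10.0, 10.0], sigma=1.0, n_samples=20000)
# The nominal circuit delay is 10.0, but the sample mean of the max lands near
# 10 + 1/sqrt(pi), the expected maximum of two independent unit Gaussians.
print(round(statistics.mean(samples), 2))
```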

7 Post-Silicon Tunable (PST) Clock Tree Buffers
[Figure: clock tree with buffers B1 to B7 driving flip-flops FF1 to FF8]
Tunable clock buffers can introduce extra slack into critical paths after fabrication. Design overhead: area and clock-tree power [Chen, ICCAD’05] [Mahoney, ISSC’05] [Takahashi, 2003] [Tam, JSSC’00].

8 Post-Silicon Tunable Clock Tree Buffers
Let D_ij be the delay of the longest path between flip-flops i and j. Consider flip-flops FF2 and FF7: the buffers can be tuned to change the clock skew between them.
[Figure: clock tree with buffers B1 to B7 driving flip-flops FF1 to FF8]
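How tuning changes skew can be illustrated with the usual setup check between flip-flops, clock_i + D_ij ≤ clock_j + period. A minimal sketch with hypothetical numbers; setup time and clock-to-Q delay are assumed to be folded into D_ij, and the function name is illustrative:

```python
def setup_violations(D, clock, period):
    """Pairs (i, j) violating clock[i] + D[i, j] <= clock[j] + period.

    D: (i, j) -> longest path delay between flip-flops i and j;
    clock: flip-flop -> clock arrival (buffer tuning) at that flip-flop.
    """
    return [(i, j) for (i, j), d in D.items() if clock[i] + d > clock[j] + period]


# Hypothetical numbers: the FF2 -> FF7 path is too slow for a period of 10.
D = {(2, 7): 11.0, (7, 8): 8.0}
print(setup_violations(D, {2: 0.0, 7: 0.0, 8: 0.0}, period=10.0))  # [(2, 7)]
# Delaying FF7's clock by 1.0 borrows slack from the FF7 -> FF8 path.
print(setup_violations(D, {2: 0.0, 7: 1.0, 8: 0.0}, period=10.0))  # []
```

Tuning too far simply moves the violation: delaying FF7's clock by 3.0 makes the FF7 -> FF8 path fail instead, which is why the tuning range matters.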

9 Optimization Objective: Tunability Cost
A metric that captures the overhead of PST buffers in the design, in terms of silicon area and clock-tree power.

10 Optimization Objective: Binning Yield Loss (BYL) [V. Zolotov, DAC’04] [D. Blaauw, GLSVLSI’05]
The loss is a convex function Q(.) of the circuit delay t.
[Figure: loss Q versus delay t, with T_cons marked on the delay axis]
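Reading BYL as the expected loss E[Q(t)] over the delay distribution, it can be estimated from delay samples. A minimal sketch; the hinge-shaped loss (zero up to T_cons, then linear in the excess delay) and its slope are hypothetical choices that merely satisfy the convexity the slide requires:

```python
def loss(t, t_cons, slope=1.0):
    """Hypothetical convex loss Q(t): zero up to T_cons, then linear in the excess delay."""
    return slope * max(0.0, t - t_cons)


def binning_yield_loss(delay_samples, t_cons):
    """Monte Carlo estimate of the expected loss E[Q(t)] over delay samples."""
    return sum(loss(t, t_cons) for t in delay_samples) / len(delay_samples)


print(binning_yield_loss([9.0, 10.0, 11.0, 12.5], t_cons=10.0))  # 0.875
```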

11 Problem Statement
Given a sequential design with a synthesized PST clock tree (known buffer locations), perform simultaneous
- statistical gate sizing, and
- PST buffer tuning-range determination,
such that the binning yield loss and the tunability cost are minimized.
[Figure: clock tree with buffers B1 to B7 driving flip-flops FF1 to FF8; gate i with delay d_i between input node 0 and output node n, under the constraint T_cons]

12 Two-Stage Formulation
First stage, with the gate sizes x and the tuning-buffer ranges r as variables:
1. Deterministic constraints: meet the timing requirement assuming no variations.
2. Capture variability in the objective.
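A compact way to write the two-stage program, as a sketch only: the weight λ trading BYL against the tunability cost TC, the N variability samples, the symmetric size bounds, and the per-sample tuning τ are our notation and assumptions, not the slides'; the deterministic first-stage timing constraints are written schematically.

```latex
\begin{aligned}
\min_{\mathbf{x},\,\mathbf{r}} \quad & \mathrm{BYL}(\mathbf{x},\mathbf{r}) + \lambda\,\mathrm{TC}(\mathbf{r}) \\
\text{s.t.} \quad & \text{nominal (variation-free) timing constraints on } \mathbf{x}, \\
& x_{\min} \le x_i \le x_{\max}, \qquad 0 \le r_k \le r_{\max}, \\
\text{where} \quad & \mathrm{BYL}(\mathbf{x},\mathbf{r}) = \frac{1}{N}\sum_{n=1}^{N} Q_n(\mathbf{x},\mathbf{r}), \qquad
Q_n(\mathbf{x},\mathbf{r}) = \min_{0 \le \tau_k \le r_k} Q\bigl(t_n(\mathbf{x},\boldsymbol{\tau})\bigr).
\end{aligned}
```

Here t_n(x, τ) denotes the circuit delay in variability sample n when the buffers are tuned by τ; the inner minimization plays the role of the second-stage problem.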

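13 Second Stage Formulation
Given a solution to the first-stage problem and a variability sample, the second stage determines the buffer tuning for that sample, within the allocated ranges, and the resulting loss Q with respect to T_cons.
[Figure: loss Q versus delay, with T_cons marked]
A second stage is needed because:
- No statistical timing analysis scheme exists to estimate the timing distribution of a circuit given gate sizes and tuning-buffer ranges.
- Each sample of variability requires a different amount of tuning for maximum timing yield.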
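One way to compute the second stage for a single variability sample, sketched under assumptions of ours: the loss Q is nondecreasing in delay, so minimizing Q amounts to minimizing the clock period the tuned circuit can meet; each flip-flop has its own tunable delay in [0, r_k] (untuned flip-flops default to 0), ignoring the fact that a tree buffer shifts all flip-flops below it; and D is non-empty. The difference-constraints / Bellman-Ford technique and the names `feasible` and `min_period` are a standard approach swapped in for illustration, not necessarily the authors' formulation.

```python
def feasible(period, D, ranges, eps=1e-12):
    """Is there a tuning 0 <= tau_k <= ranges.get(k, 0) with
    tau_i + D[i, j] <= tau_j + period for every flip-flop pair (i, j)?

    Each difference constraint x_v - x_u <= w becomes an edge u -> v of
    weight w; the system is feasible iff the graph has no negative cycle.
    """
    ref = "ref"  # reference node, tau_ref = 0
    ffs = {k for pair in D for k in pair} | set(ranges)
    edges = [(j, i, period - d) for (i, j), d in D.items()]  # tau_i - tau_j <= period - D_ij
    for k in ffs:
        edges.append((ref, k, ranges.get(k, 0.0)))  # tau_k - tau_ref <= r_k
        edges.append((k, ref, 0.0))                 # tau_ref - tau_k <= 0
    # Bellman-Ford from a virtual source with 0-weight edges to every node.
    dist = {v: 0.0 for v in ffs | {ref}}
    for _ in range(len(dist)):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v] - eps:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return True   # converged: dist - dist[ref] is a feasible tuning
    return False          # still relaxing after |V| passes: negative cycle


def min_period(D, ranges, tol=1e-6):
    """Smallest period reachable by tuning, found by bisection (feasibility is monotone)."""
    lo, hi = 0.0, max(D.values())  # zero tuning already meets hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible(mid, D, ranges):
            hi = mid
        else:
            lo = mid
    return hi


D = {(2, 7): 11.0, (7, 8): 8.0}           # hypothetical sampled path delays
print(round(min_period(D, {}), 3))        # 11.0 without tuning
print(round(min_period(D, {7: 2.0}), 3))  # 9.5 with FF7 tunable over [0, 2]
```

With FF7 tunable, the best setting balances the two paths (11 - τ = 8 + τ at τ = 1.5); a narrower range such as [0, 0.5] stops at 10.5, which is the trade-off the first stage prices through the tunability cost.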

14 Convex Problem
THEOREM: The proposed two-stage stochastic programming formulation is convex.
PROOF (detailed proof omitted for brevity): The first-stage constraints are convex, and the first-stage objective is convex if BYL(x,r) is convex. It therefore suffices to show that the loss for each variability sample is convex, which follows from the second-stage formulation.
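The averaging step of the proof can be written out. Assume, as a sketch, that BYL is estimated as a sample average of per-sample losses Q_n over N variability samples, and write z = (x, r). If every Q_n is convex, then for any θ in [0, 1]:

```latex
\begin{aligned}
\mathrm{BYL}\bigl(\theta z_1 + (1-\theta) z_2\bigr)
  &= \frac{1}{N}\sum_{n=1}^{N} Q_n\bigl(\theta z_1 + (1-\theta) z_2\bigr) \\
  &\le \frac{1}{N}\sum_{n=1}^{N} \bigl[\theta\, Q_n(z_1) + (1-\theta)\, Q_n(z_2)\bigr] \\
  &= \theta\,\mathrm{BYL}(z_1) + (1-\theta)\,\mathrm{BYL}(z_2),
\end{aligned}
```

so BYL is convex, and adding a convex (for example, linear) tunability cost keeps the first-stage objective convex.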

15 Kelley’s Cutting Plane Algorithm
The first- and second-stage formulations are solved iteratively. Given a solution to the first-stage formulation, the method of finite differences is used to generate a lower bound on BYL from the second-stage formulation, and this bound is added as a constraint to the first-stage formulation at each iteration.
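Kelley's method can be sketched in one dimension. Under these assumptions (a toy convex function stands in for BYL, the first-stage feasible set is reduced to an interval [a, b], and the name `kelley_1d` is hypothetical), each iteration evaluates the function, adds a cut built from a central finite-difference slope, and moves to the minimizer of the piecewise-linear model formed by all cuts. The slides solve the multi-dimensional version with MOSEK. Finite-difference slopes make the cuts approximate supporting lines; for the quadratic used here they are exact up to rounding.

```python
def kelley_1d(f, a, b, tol=1e-6, h=1e-6, max_iter=100):
    """Kelley's cutting-plane method for a convex f on [a, b]."""
    cuts = []                       # (slope, intercept): f(y) >= slope * y + intercept
    x = (a + b) / 2
    best_x, best_f = x, f(x)

    def model(y):                   # piecewise-linear lower model of f
        return max(s * y + c for s, c in cuts)

    for _ in range(max_iter):
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
        g = (f(x + h) - f(x - h)) / (2 * h)   # finite-difference slope
        cuts.append((g, fx - g * x))
        # The model's minimum on [a, b] is at an endpoint or a cut intersection.
        candidates = [a, b]
        for i, (s1, c1) in enumerate(cuts):
            for s2, c2 in cuts[i + 1:]:
                if s1 != s2:
                    y = (c2 - c1) / (s1 - s2)
                    if a <= y <= b:
                        candidates.append(y)
        x = min(candidates, key=model)
        if best_f - model(x) <= tol:  # gap between best value and lower bound
            break
    return best_x, best_f


# Toy convex stand-in for BYL over a single sizing variable.
x_opt, f_opt = kelley_1d(lambda x: (x - 1.3) ** 2 + 0.5, 0.0, 4.0)
print(round(x_opt, 2), round(f_opt, 4))  # 1.3 0.5
```

The stopping test uses the same idea as the slide: the cuts form a lower bound, so once the best observed value is within `tol` of that bound, no further cut can improve the answer by more than `tol`.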

16 Shortest-Path Constraints
Shortest-path (hold-time) constraints are inherently non-convex. Gate delay is therefore approximated with a linear approximation that is a lower bound, and with this approximation the two-stage stochastic programming formulation can be modified to consider shortest-path constraints.
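Why a linear lower bound helps: a minimum-delay requirement d(s) ≥ H on a convex delay d is non-convex, but a tangent line ℓ of a convex function lies below it everywhere, so the linear (hence convex) constraint ℓ(s) ≥ H implies d(s) ≥ H. The price is conservatism. A minimal sketch; the delay model a + b/s, the tangent point, and the hold value are hypothetical:

```python
def delay(s, a=1.0, b=2.0):
    """Hypothetical delay of a gate versus its own size s > 0 (convex in s)."""
    return a + b / s


def tangent_at(s0, a=1.0, b=2.0):
    """Tangent line of delay at s0; for a convex function it is a global lower bound."""
    slope = -b / s0 ** 2
    return lambda s: delay(s0, a, b) + slope * (s - s0)


lin = tangent_at(2.0)
sizes = [0.5 + 0.05 * k for k in range(100)]
hold = 1.8  # hypothetical minimum-delay (hold) requirement

# The linear constraint lin(s) >= hold accepts only sizes that also satisfy
# delay(s) >= hold, but it rejects some sizes the true constraint allows.
accepted = [s for s in sizes if lin(s) >= hold]
truly_ok = [s for s in sizes if delay(s) >= hold]
print(max(accepted), max(truly_ok), all(delay(s) >= hold for s in accepted))
```

Here the tangent admits sizes up to about 2.4, while the exact constraint would allow up to 2.5.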

17 Experimental Results
- Implemented the framework in SIS, using MOSEK to solve the convex formulation.
- Used CAPO to place the netlist and obtain spatially correlated gate delays. [Figure: gates i and j at placement coordinates (x_i, y_i) and (x_j, y_j)]
- Assumed 15% V_th variation at the 90nm technology node [Predictive Technology Model].
- Synthesized the PST clock tree using the technique proposed in [Chen et al., ICCAD’05].
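How placement coordinates can yield spatially correlated gate delays, as a sketch under assumptions of ours: delay variation is Gaussian, its correlation decays as exp(-distance / rho) with distance between gates, and sigma and rho are arbitrary values (the slides' 15% V_th figure is not used). Correlated samples are drawn as L·z with L the Cholesky factor of the correlation matrix; all names are hypothetical.

```python
import math
import random


def correlation_matrix(coords, rho):
    """Hypothetical spatial model: corr(i, j) = exp(-distance(i, j) / rho)."""
    return [[math.exp(-math.dist(p, q) / rho) for q in coords] for p in coords]


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def sample_gate_delays(L, nominal, sigma, rng):
    """One sample of correlated gate delays: nominal + sigma * (L z), z ~ N(0, I)."""
    z = [rng.gauss(0.0, 1.0) for _ in nominal]
    return [d + sigma * sum(L[i][k] * z[k] for k in range(i + 1))
            for i, d in enumerate(nominal)]


def empirical_corr(xs, ys):
    """Pearson correlation of two equally long sample lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


coords = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0)]  # gates 0 and 1 are neighbours
L = cholesky(correlation_matrix(coords, rho=1.0))
rng = random.Random(3)
samples = [sample_gate_delays(L, [1.0, 1.0, 1.0], 0.05, rng) for _ in range(5000)]
cols = list(zip(*samples))  # per-gate sample lists
# Nearby gates come out strongly correlated; distant gates almost uncorrelated.
print(round(empirical_corr(cols[0], cols[1]), 2), round(empirical_corr(cols[0], cols[2]), 2))
```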

18 Experimental Results
Experimental comparison on the ISCAS benchmarks:
- [Chen]: nominal gate sizing, with PST clock-tree generation using [Chen et al., ICCAD’05].
- Sensitivity: retains the PST clock-tree locations and ranges, and applies a sensitivity-driven statistical gate sizing algorithm that iteratively and greedily sizes the gate with the maximum yield gain, similar in spirit to [Zhou ICCAD’05, Zolotov DAC’05].
- Stochastic: retains the PST clock-tree buffer locations and applies the proposed simultaneous gate sizing and post-silicon tunability allocation algorithm.
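The greedy loop behind the Sensitivity baseline can be sketched as follows, under simplifying assumptions of ours: timing yield is estimated by Monte Carlo over fixed perturbation samples (common random numbers), gate delay is base / size scaled by (1 + e), the extra load an upsized gate places on its fanin gates is ignored, and a step count stands in for an area budget. The netlist, values, and function names are hypothetical.

```python
import random


def timing_yield(sizes, paths, base, eps, t_cons):
    """Fraction of Monte Carlo samples whose worst path delay meets t_cons.

    Gate delay is base[g] / sizes[g] * (1 + e[g]); loading of fanin gates by
    upsized gates is ignored in this toy model.
    """
    ok = 0
    for e in eps:
        worst = max(sum(base[g] / sizes[g] * (1 + e[g]) for g in p) for p in paths)
        ok += worst <= t_cons
    return ok / len(eps)


def greedy_sizing(sizes, paths, base, eps, t_cons, step=0.5, max_steps=20):
    """Repeatedly upsize the single gate with the largest yield gain."""
    sizes = dict(sizes)
    current = timing_yield(sizes, paths, base, eps, t_cons)
    for _ in range(max_steps):  # max_steps acts as a crude area budget
        best_gate, best_yield = None, current
        for g in sizes:
            trial = dict(sizes)
            trial[g] += step
            y = timing_yield(trial, paths, base, eps, t_cons)
            if y > best_yield:
                best_gate, best_yield = g, y
        if best_gate is None:  # no single upsizing improves yield
            break
        sizes[best_gate] += step
        current = best_yield
    return sizes, current


rng = random.Random(7)
eps = [{g: rng.gauss(0.0, 0.1) for g in "ABC"} for _ in range(500)]
paths = [["A", "B"], ["A", "C"]]
base = {"A": 1.0, "B": 1.0, "C": 0.6}
start = {"A": 1.0, "B": 1.0, "C": 1.0}
y0 = timing_yield(start, paths, base, eps, t_cons=1.8)
sizes, y1 = greedy_sizing(start, paths, base, eps, t_cons=1.8)
print(y0, y1, sizes)
```

Unlike the Stochastic approach, this loop never revisits the tuning ranges; it only spends area on gates one step at a time.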

19 BYL, Area and Tuning Range Comparison

20 Timing Yield Loss Comparison
[Figure: average timing yield loss for [Chen], Sensitivity, and Stochastic]

21 Runtime Comparison
[Table: runtime and number of iterations for the Sensitivity and Stochastic techniques; surviving values: 344s, 382s, 400s, 526s, 635]

22 Summary and Future Work
- A variability-driven framework for simultaneous gate sizing and post-silicon tunability allocation that minimizes binning yield loss and tunability cost.
- An efficient stochastic-programming-based scheme to solve the formulation.
- No assumptions are made about parameter distributions or their correlations.
- Future work: develop a statistical timing analysis scheme that can consider the effect of post-silicon tunability.

23 Thank You!