UCSD VLSI CAD Laboratory and UIUC PASSAT Group - ASPDAC, Jan. 21, 2010 Slack Redistribution for Graceful Degradation Under Voltage Overscaling Andrew B.

Slides:



Advertisements
Similar presentations
(1/25) UCSD VLSI CAD Laboratory - ISQED10, March. 23, 2010 Toward Effective Utilization of Timing Exceptions in Design Optimization Kwangok Jeong, Andrew.
Advertisements

Tunable Sensors for Process-Aware Voltage Scaling
Thank you for your introduction.
Ispd-2007 Repeater Insertion for Concurrent Setup and Hold Time Violations with Power-Delay Trade-Off Salim Chowdhury John Lillis Sun Microsystems University.
Mapping for Better Than Worst-Case Delays In LUT-Based FPGA Designs Kirill Minkovich and Jason Cong VLSI CAD Lab Computer Science Department University.
Designing a Processor from the Ground Up to Allow Voltage/Reliability Tradeoffs Andrew Kahng (UCSD) Seokhyeong Kang (UCSD) Rakesh Kumar (Illinois) John.
Timing Margin Recovery With Flexible Flip-Flop Timing Model
Xing Wei, Wai-Chung Tang, Yu-Liang Wu Department of Computer Science and Engineering The Chinese University of HongKong
Minimum Implant Area-Aware Gate Sizing and Placement
Variability in Architectural Simulations of Multi-threaded Workloads Alaa R. Alameldeen and David A. Wood University of Wisconsin-Madison
High-Performance Gate Sizing with a Signoff Timer
Yuanlin Lu Intel Corporation, Folsom, CA Vishwani D. Agrawal
CMOS Circuit Design for Minimum Dynamic Power and Highest Speed Tezaswi Raja, Dept. of ECE, Rutgers University Vishwani D. Agrawal, Dept. of ECE, Auburn.
Timing Analysis Timing Analysis Instructor: Dr. Vishwani D. Agrawal ELEC 7770 Advanced VLSI Design Team Project.
-1- UCSD VLSI CAD Laboratory and UIUC PASSAT Group Recovery-Driven Design: A Power Minimization Methodology for Error-Tolerant Processor Modules Andrew.
Power-Aware Placement
Toward PDN Resource Estimation: A Law of General Power Density Kwangok Jeong and Andrew B. Kahng
Architectural-Level Prediction of Interconnect Wirelength and Fanout Kwangok Jeong, Andrew B. Kahng and Kambiz Samadi UCSD VLSI CAD Laboratory
Institute of Digital and Computer Systems 1 Fabio Garzia / Finding Peak Performance in a Process23/06/2015 Chapter 5 Finding Peak Performance in a Process.
On-Line Adjustable Buffering for Runtime Power Reduction Andrew B. Kahng Ψ Sherief Reda † Puneet Sharma Ψ Ψ University of California, San Diego † Brown.
1 UCSD VLSI CAD Laboratory ISQED-2009 Revisiting the Linear Programming Framework for Leakage Power vs. Performance Optimization Kwangok Jeong, Andrew.
Toward Performance-Driven Reduction of the Cost of RET-Based Lithography Control Dennis Sylvester Jie Yang (Univ. of Michigan,
1 Razor: A Low Power Processor Design Presented By: - Murali Dharan.
Jan. 2007VLSI Design '071 Statistical Leakage and Timing Optimization for Submicron Process Variation Yuanlin Lu and Vishwani D. Agrawal ECE Dept. Auburn.
Jieyi Long and Seda Ogrenci Memik Dept. of EECS, Northwestern Univ. Jieyi Long and Seda Ogrenci Memik Dept. of EECS, Northwestern Univ. Automated Design.
UC San Diego Computer Engineering VLSI CAD Laboratory UC San Diego Computer Engineering VLSI CAD Laboratory UC San Diego Computer Engineering VLSI CAD.
Circuit-Level Timing Speculation: The Razor Latch Developed by Trevor Mudge’s group at the University of Michigan, 2003.
University of Michigan Electrical Engineering and Computer Science 1 A Microarchitectural Analysis of Soft Error Propagation in a Production-Level Embedded.
Software-Based Online Detection of Hardware Defects: Mechanisms, Architectural Support, and Evaluation Kypros Constantinides University of Michigan Onur.
Enhanced Metamodeling Techniques for High-Dimensional IC Design Estimation Problems Andrew B. Kahng, Bill Lin and Siddhartha Nath VLSI CAD LABORATORY,
UC San Diego / VLSI CAD Laboratory Reliability-Constrained Die Stacking Order in 3DICs Under Manufacturing Variability Tuck-Boon Chan, Andrew B. Kahng,
Accuracy-Configurable Adder for Approximate Arithmetic Designs
-1- UC San Diego / VLSI CAD Laboratory A Global-Local Optimization Framework for Simultaneous Multi-Mode Multi-Corner Clock Skew Variation Reduction Kwangsoo.
A New Methodology for Reduced Cost of Resilience Andrew B. Kahng, Seokhyeong Kang and Jiajia Li UC San Diego VLSI CAD Laboratory.
POWER-DRIVEN MAPPING K-LUT-BASED FPGA CIRCUITS I. Bucur, N. Cupcea, C. Stefanescu, A. Surpateanu Computer Science and Engineering Department, University.
Coarse and Fine Grain Programmable Overlay Architectures for FPGAs
UC San Diego / VLSI CAD Laboratory Toward Quantifying the IC Design Value of Interconnect Technology Improvement Tuck-Boon Chan, Andrew B. Kahng, Jiajia.
Copyright © 2008 UCI ACES Laboratory Kyoungwoo Lee 1, Aviral Shrivastava 2, Nikil Dutt 1, and Nalini Venkatasubramanian 1.
An Efficient Algorithm for Dual-Voltage Design Without Need for Level-Conversion SSST 2012 Mridula Allani Intel Corporation, Austin, TX (Formerly.
Speculative Software Management of Datapath-width for Energy Optimization G. Pokam, O. Rochecouste, A. Seznec, and F. Bodin IRISA, Campus de Beaulieu
Jia Yao and Vishwani D. Agrawal Department of Electrical and Computer Engineering Auburn University Auburn, AL 36830, USA Dual-Threshold Design of Sub-Threshold.
UC San Diego / VLSI CAD Laboratory Incremental Multiple-Scan Chain Ordering for ECO Flip-Flop Insertion Andrew B. Kahng, Ilgweon Kang and Siddhartha Nath.
-1- UC San Diego / VLSI CAD Laboratory Construction of Realistic Gate Sizing Benchmarks With Known Optimal Solutions Andrew B. Kahng, Seokhyeong Kang VLSI.
1 Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT.
Safe Overclocking Safe Overclocking of Tightly Coupled CGRAs and Processor Arrays using Razor © 2012 Guy Lemieux Alex Brant, Ameer Abdelhadi, Douglas Sim,
EEE2243 Digital System Design Chapter 7: Advanced Design Considerations by Muhazam Mustapha, extracted from Intel Training Slides, April 2012.
Robust Low Power VLSI ECE 7502 S2015 Minimum Supply Voltage and Very- Low-Voltage Testing ECE 7502 Class Discussion Elena Weinberg Thursday, April 16,
1 A Cost-effective Substantial- impact-filter Based Method to Tolerate Voltage Emergencies Songjun Pan 1,2, Yu Hu 1, Xing Hu 1,2, and Xiaowei Li 1 1 Key.
MicroFix: Exploiting Path-grained Timing Adaptability for Improving Power-Performance Efficiency Guihai Yan, Yinhe Han, Hui Liu, Xiaoyao Liang, Xiaowei.
VGreen: A System for Energy Efficient Manager in Virtualized Environments G. Dhiman, G Marchetti, T Rosing ISLPED 2009.
Outline Introduction: BTI Aging and AVS Signoff Problem
-1- Statistical Analysis and Modeling for Error Composition in Approximate Computation Circuits Wei-Ting Jonas Chan 1, Andrew B. Kahng 1, Seokhyeong.
UC San Diego / VLSI CAD Laboratory Learning-Based Approximation of Interconnect Delay and Slew Modeling in Signoff Timing Tools Andrew B. Kahng, Seokhyeong.
Power Analysis of Embedded Software : A Fast Step Towards Software Power Minimization 指導教授 : 陳少傑 教授 組員 : R 張馨怡 R 林秀萍.
Mixed Cell-Height Implementation for Improved Design Quality in Advanced Nodes Sorin Dobre +, Andrew B. Kahng * and Jiajia Li * * UC San Diego VLSI CAD.
Static Timing Analysis
-1- UC San Diego / VLSI CAD Laboratory On Potential Design Impacts of Electromigration Awareness Andrew B. Kahng, Siddhartha Nath and Tajana S. Rosing.
-1- UC San Diego / VLSI CAD Laboratory Optimization of Overdrive Signoff Tuck-Boon Chan, Andrew B. Kahng, Jiajia Li and Siddhartha Nath Tuck-Boon Chan,
-1- Delay Uncertainty and Signal Criticality Driven Routing Channel Optimization for Advanced DRAM Products Samyoung Bang #, Kwangsoo Han ‡, Andrew B.
-1- UC San Diego / VLSI CAD Laboratory Optimal Reliability-Constrained Overdrive Frequency Selection in Multicore Systems Andrew B. Kahng and Siddhartha.
Characterizing Processors for Energy and Performance Management Harshit Goyal and Vishwani D. Agrawal Department of Electrical and Computer Engineering,
A Novel, Highly SEU Tolerant Digital Circuit Design Approach By: Rajesh Garg Sunil P. Khatri Department of Electrical and Computer Engineering, Texas A&M.
Proximity Optimization for Adaptive Circuit Design Ang Lu, Hao He, and Jiang Hu.
Yuxi Liu The Chinese University of Hong Kong Circuit Timing Problem Driven Optimization.
Power-Optimal Pipelining in Deep Submicron Technology
Raghuraman Balasubramanian Karthikeyan Sankaralingam
The University of British Columbia
Circuit Design Techniques for Low Power DSPs
Low Power Digital Design
Measuring the Gap between FPGAs and ASICs
Presentation transcript:

UCSD VLSI CAD Laboratory and UIUC PASSAT Group - ASPDAC, Jan. 21, 2010 Slack Redistribution for Graceful Degradation Under Voltage Overscaling Andrew B. Kahng †, Seokhyeong Kang †, Rakesh Kumar ‡ and John Sartori ‡ † VLSI CAD LABORATORY, UCSD ‡ PASSAT GROUP, UIUC

(2/25) Outline Background and motivation Voltage scaling and BTWC designs Limitation of Traditional CAD Flow Power-Aware Slack Redistribution Our design optimization goal Related work: BlueShift Our Heuristic Experimental Framework and Results Design methodology Testbed Results and analysis Conclusions and Ongoing Work

(3/25) Reducing Power with Voltage Scaling Power is a first-order design constraint Voltage scaling can significantly reduce power Voltage scaling may result in timing violations Voltage scaling is limited because of timing errors Voltage Timing errors begin to occur

(4/25) CPU, Heal Thyself... Better-Than-Worst-Case Design Better-Than-worst-case (BTWC) design approach Optimize for normal operating conditions Trade off reliability and power/performance Have error detection/correction mechanism (e.g., Razor*) * Ernst et al. “Razor: A low power pipeline based on circuit-level timing speculation”, Proc. MICRO Traditional IC designBTWC design Does not allow timing errors in STA Error correction architecture allows timing errors Fixed target frequency and operating voltage Overclocking or voltage overscaling BTWC design allows tradeoffs between reliability and power

(5/25) Voltage Scaling with Error Correction Error correction incurs power overhead P E (v) : Error rate at voltage v pwr(v) : Power consumption at v Minimum power at point b Voltage v A B A B Overscaling is possible for Better-Than-Worst-Case designs

(6/25) Limitations of Traditional CAD Flow Conventional designs exhibit critical operating points Many paths have near-critical slack → wall of (critical) slack Scaling beyond COP causes massive errors that cannot be corrected Conventional designs fail critically when voltage is scaled down Error rate should be increased gracefully : gradual slope slack

(7/25) Outline Background and motivation Voltage scaling and BTWC designs Limitation of Traditional CAD Flow Power-Aware Slack Redistribution Our design optimization goal Related work: BlueShift Our Heuristic Experimental Framework and Results Design methodology Testbed Results and analysis Conclusions and Ongoing Work

(8/25) Our Design Optimization Goal Problem: Minimize power for a given error rate Goal: Achieve a ‘gradual slope’ slack distribution Approach: Frequently-exercised paths: upsize cells Rarely-exercised paths: downsize cells We make a gradual slope slack distribution with gradual failure characteristic

(9/25) Related Work: BlueShift BlueShift speed up Paths with the highest frequency of timing errors FBB (forward body-biasing) & Timing override * Grescamp et al. “Blueshift: Designing processors for timing speculation from the ground up”, HPCA 2009 BlueShift* : maximize frequency for a given error rate Compute error rate ER < Target Gate-level simulation YES NO Speed up paths Finish Limitation Repetitive gate level simulation – impractical Design overhead of FBB BlueShift is impractical with modern SOC designs

(10/25) Our Heuristic Set initial voltage Error rate estimation ER < ER target Optimize Paths Voltage scaling YES NO Power Reduction Finish Optimize slack distribution by cell swaps, exploiting switching activity information Iteratively scale target voltage the until error rate exceeds a target, and optimize negative slack paths Our heuristic: Voltage scaling → Optimize paths → Power reduction

(11/25) Heuristic Implementation – Voltage Scaling Set initial voltage Error rate estimation ER < ER target Optimize Paths Voltage scaling YES NO Power Reduction Finish Optimize with fixed target voltageLower voltage incrementally Load a pre-characterized library at each voltage point With iterative voltage scaling, we can find minimum operating voltage

(12/25) Heuristic Implementation – Optimize Paths Main idea: increase slack of frequently-exercised paths in order of increasing switching activity Procedure 1.Pick a critical path p with maximum switching activity 2.Resize cell instance c i in p 3.If slack of path p is not improved, cell change is restored 4.Repeat 2. ~ 3. for all cell instances in path p 5.Repeat 2.~ 4. for all critical paths Set initial voltage Error rate estimation ER < ER target Optimize Paths Voltage scaling YES NO Power Reduction Finish OptimizePaths procedure reduces error rates and enables further voltage scaling

(13/25) Heuristic Implementation – Power Reduction Main idea: Downsize cells on rarely-exercised paths in order of decreasing toggle rate Procedure 1.Pick a cell c with minimum toggle rate 2.Downsize cell c with logically equivalent cell 3.Incremental timing analysis and check error rate 4.If error rate is increased, cell change is restored 5.Repeat 1. ~ 4. Set initial voltage Error rate estimation ER < ER target Optimize Paths Voltage scaling YES NO Power Reduction Finish PowerReduction procedure reduces power without affecting error rate

(14/25) Heuristic Implementation – Error Rate Estimation Set initial voltage Error rate estimation ER < ER target Optimize Paths Voltage scaling YES NO Power Reduction Finish Error rate contribution of one flip-flop Error rate of an entire design α : compensation parameter We estimate error rates without functional simulation Error rate estimation: use toggle rate from SAIF(Switching Activity Interchange Format)

(15/25) Power Reduction Through Slack Redistribution Power Minimum power P min is obtained at minimum operating voltage V min 1.OptimizePaths Minimize error rate Enable to scale voltage further 2.ReducePower Downsize cells Obtain additional power reduction 2 1

(16/25) Outline Background and motivation Voltage scaling and BTWC designs Limitation of Traditional CAD Flow Power-Aware Slack Redistribution Our design optimization goal Related work: BlueShift Our Heuristic Experimental Framework and Results Design methodology Testbed Results and analysis Conclusions and Ongoing Work

(17/25) Benchmark generation Virtutech Simics – Full system simulation and capture test vectors Functional simulation Cadence NC Verilog – Gate level simulation Library characterization Cadence SignalStorm – Synopsys Liberty generation for each voltage Heuristic (Slack Optimization) Implement in C++ and use Tcl socket interface with Synopsys PrimeTime ECO P&R Cadence SOCEncounter – ECO implementation Design Methodology

(18/25) Target design : sub-modules of OpenSPARC T1 Benchmark Ammp, bzip2, equake, sort and twolf Make test vectors with 1 billion cycles for each sub-module Implementation TSMC 65GP technology with standard SP&R flow Testbed

(19/25) Design techniques 1.SP&R with 0.8 GHz (loose constraints) 2.SP&R with 1.2 GHz (tight constraints) 3.Blueshift: timing override 4.Slack Optimizer Experiments compare all design techniques with respect to: 1.Power consumption at each voltage point 2.Actual error rates from gate level simulation 3.Power consumption at each target error rate 4.Estimated processor-wide power consumption List of Experiments

(20/25) Error rate at each operating voltage (test case : lsu_dctl) Power consumption at each operating voltage Error Rate and Power Results

(21/25) Power consumption at each target error rate Slack distribution Comparison of Power and Slack Results

(22/25) Power reduction after optimization 2% error rate) Area overhead of design approaches Max %, Avg. 12.5% power reduction Power Reduction and Area Overhead

(23/25) *Kahng et al. “Designing a Processor From the Ground Up to Allow Voltage/Reliability Tradeoffs”, HPCA * Processor-wide Results Slack optimization extends range of voltage scaling and reduces Razor recovery cost

(24/25) Conclusions and Ongoing Work Showed limitations of a BTWC design Presented design technique – slack redistribution Optimize frequently exercised critical paths De-optimize rarely-exercised paths Demonstrated significant power benefits of gradual slack design Reduced power 33% on maximum, 12.5% on average Ongoing work Reliability-power tradeoffs for embedded memory Applying to heterogeneous multi-core architecture

UCSD VLSI CAD Laboratory and UIUC PASSAT Group - ASPDAC, Jan. 21, 2010 THANK YOU

UCSD VLSI CAD Laboratory and UIUC PASSAT Group - ASPDAC, Jan. 21, 2010 BACKUP

(27/25) CPU, Heal Thyself * Razor: A low power pipeline based on circuit-level timing speculation. In International Symposium on Micro architecture, December Razor* system Timing errors can be corrected Manage the trade-off between system voltage and error rate New design methodology is needed

(28/25) Razor – How it works Razor Implementation Razor: A low power pipeline based on circuit-level timing speculation. In International Symposium on Microarchitecture, December Main flip-flop latches at T, but Shadow latch latches at T+skew If a timing violation occurs, main flip-flop will latch incorrect value, but shadow latch should latch correct value Comparator signals error and the late arriving value is fed back into the main flip-flop

(29/25) BTWC: Voltage Scaling Error correction needs additional clock cycles and incurs power overhead P E (f) : Error rate at frequency f perf(f) : Performance at f Maximum performance at point c Overclocking case P E (v) : Error rate at voltage v pwr(v) : Power consumption at v Minimum power at point c Voltage scaling case

(30/25) Limitation of Voltage Scaling At some voltage, circuit breaks down Voltage scaling must halt after only 10% scaling.

(31/25) Reason for Steep Error Degradation Critical paths are bunched up in traditional designs.

(32/25) Slack Re-distribution Example Positive SlackNegative Slack Negative SlackPositive Slack Error Rate = 1%Error Rate = 25%

(33/25) Heuristic Implementation – Error Rate Estimation Error rate contribution of one flip-flop Error rate of an entire design Actual vs. estimated error rates (1) (2) α : compensation parameter

(34/25) Gradual Slack Distribution Slack optimization achieves gradual slack distribution.

(35/25) Processor Error Rate and Power Designs with comparable error rates have much higher power/area overheads.

(36/25) Reliability/Power Tradeoff Slack-optimized design enjoys continued power reduction as error rate increases.

(37/25) Enhancing Razor-based Design Slack optimization extends range of voltage scaling and reduces Razor recovery cost.

(38/25) Moore’s Law Power consumption of processor node doubles every 18 months.

(39/25) Power Scaling With current design techniques, processor power soon on par with nuclear power plant