Designing a Processor from the Ground Up to Allow Voltage/Reliability Tradeoffs. Andrew Kahng (UCSD), Seokhyeong Kang (UCSD), Rakesh Kumar (Illinois), John Sartori (Illinois)

Timing Errors: Power is a first-order design constraint. Voltage scaling can significantly reduce power. Voltage scaling may result in timing errors. [Figure axis: Operating Voltage]

Research Questions: How does a conventional processor behave when we fix frequency and scale down voltage? How can we reduce the voltage at which timing errors are observed? Reduce power while maintaining the same performance level.

Limitation of Voltage Scaling: At some voltage, the circuit breaks down. Voltage scaling must halt after only 10% scaling.

Limitation of Voltage Scaling: Error Rate (%) by module at 1.0V, 0.9V, 0.8V, 0.7V, 0.6V, and 0.5V. Modules: lsu_dctl, lsu_qctl, lsu_stb_ctl, sparc_exu_div, sparc_exu_ecl, sparc_ifu_dec, sparc_ifu_errdp, sparc_ifu_fcl, spu_ctl, tlu_mmu_ctl. What problems are caused by steep error degradation?

Problems with Steep Error Degradation: Voltage scaling is limited in traditional designs.

Problems with Steep Error Degradation: No power savings as error rate increases. Traditional design → no reliability/power tradeoff.

Problems with Steep Error Degradation: Why do circuits fail catastrophically? Reliability/power tradeoffs enabled. Allows switching between error tolerance techniques at different voltages/error rates. Higher error rate → lower power.

Reason for Steep Error Degradation: Critical paths are bunched up in traditional designs.

Question: How can we change the slack distribution to achieve a graceful failure characteristic?

Design Objectives and Insight: A power-optimized design reclaims excess timing slack. A slack-optimized design optimizes critical paths. Make the slack distribution gradual by redistributing slack between paths: optimize frequently exercised critical paths and de-optimize rarely exercised paths. Both gradual failure and low power can be achieved.

Slack Re-distribution Example: [Figure: two slack distributions, each split into positive-slack and negative-slack regions; one panel is labeled Error Rate = 1% and the other Error Rate = 25%.]
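The effect of moving negative slack from frequently exercised paths to rarely exercised ones can be sketched with a toy calculation. This is only an illustration: the `Path` type, the two example paths, and the simple "sum of toggle rates over failing paths" estimate are assumptions, with values chosen to echo the 25% and 1% labels on the slide rather than taken from the authors' data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    slack_ns: float     # timing slack at the scaled voltage; negative means a violation
    toggle_rate: float  # fraction of cycles in which the path is exercised


def error_rate(paths):
    """Toy estimate: sum the toggle rates of negative-slack paths, capped at 1.0.

    Assumes a failing path causes an error every time it is exercised and
    that failing paths are not exercised in the same cycle.
    """
    return min(1.0, sum(p.toggle_rate for p in paths if p.slack_ns < 0))


# Hypothetical design before redistribution: the busy path has negative slack.
before = [Path("P1", -0.05, 0.25), Path("P2", 0.10, 0.01)]
# After redistribution: slack moved to the busy path, taken from the idle one.
after = [Path("P1", 0.05, 0.25), Path("P2", -0.10, 0.01)]

rate_before = error_rate(before)
rate_after = error_rate(after)
```

The same number of paths violate timing in both cases; only the choice of which path violates changes the estimated error rate.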

Proposed Design Flow: Voltage Scaling → Path Optimization → Area Reduction. Input: RTL description. Output: Gradual slack design. Objective: Minimize voltage for a given error rate over a range of error rates.

Iterative Optimization: Using a fixed target results in over-optimization. Iterative optimization avoids unnecessary swaps.
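One way to picture the iterative loop is the sketch below, which lowers the voltage one step at a time and speeds up only the paths that fail at that step. It is not the authors' flow: the inverse-voltage delay model, the 10% speedup per swap, the swap limit per path, and all names and numbers are hypothetical stand-ins for real timing analysis and cell swapping.

```python
def delay_at(base_delay_ns, v, v_nom=1.0):
    # Hypothetical toy model: path delay grows inversely with supply voltage.
    return base_delay_ns * v_nom / v


def error_rate(paths, v, clock_ns):
    # Sum of toggle rates of the paths that miss the clock period at voltage v.
    return min(1.0, sum(p["toggle"] for p in paths
                        if delay_at(p["delay"], v) > clock_ns))


def iterative_optimize(paths, clock_ns, target_error,
                       v_step=0.05, v_min=0.5, speedup=0.9, max_upsizes=2):
    """Step the voltage down; at each step fix only the paths that now fail."""
    paths = [dict(p, upsizes=0) for p in paths]  # work on a copy
    v, swaps = 1.0, 0
    while v - v_step >= v_min - 1e-9:
        trial_v = round(v - v_step, 10)
        trial = [dict(p) for p in paths]
        trial_swaps = 0
        while error_rate(trial, trial_v, clock_ns) > target_error:
            failing = [p for p in trial
                       if delay_at(p["delay"], trial_v) > clock_ns
                       and p["upsizes"] < max_upsizes]
            if not failing:
                break  # nothing left that can still be upsized
            worst = max(failing, key=lambda p: p["toggle"])
            worst["delay"] *= speedup  # stand-in for swapping in faster cells
            worst["upsizes"] += 1
            trial_swaps += 1
        if error_rate(trial, trial_v, clock_ns) > target_error:
            break  # target unreachable here: keep the previous voltage
        paths, v, swaps = trial, trial_v, swaps + trial_swaps
    return v, swaps, paths


example = [
    {"name": "A", "delay": 0.88, "toggle": 0.20},
    {"name": "B", "delay": 0.96, "toggle": 0.01},
    {"name": "C", "delay": 0.65, "toggle": 0.05},
]
final_v, total_swaps, _ = iterative_optimize(example, clock_ns=1.0, target_error=0.02)
```

Because each step only touches paths that violate timing at the current voltage, a path that never becomes a problem (path C here) is never swapped, which is the sense in which iteration avoids unnecessary swaps compared with optimizing everything toward one fixed target up front.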

Error Rate Forecasting

Design-level Methodology: Functional simulation: Cadence NC Verilog (gate-level simulation); Library characterization: Cadence SignalStorm (Synopsys Liberty generation for each voltage); Slack optimization: C++ with Synopsys PrimeTime interface; ECO P&R: Cadence SOCEncounter (placement and routing); Benchmark generation: Virtutech Simics (test vector generation).

Gradual Slack Distribution: Slack optimization achieves gradual slack distribution.

Processor Module Optimization: Slack optimized design has lowest power for all error rates.

Processor Error Rate and Power: Designs with comparable error rates have much higher power/area overheads.

Reliability/Power Tradeoff: Slack-optimized design enjoys continued power reduction as error rate increases.

Enhancing Razor-based Design: Slack optimization extends range of voltage scaling and reduces Razor recovery cost.

Summary and Conclusion: Showed limitations of traditional processors w.r.t. voltage scaling: traditional designs break down. Presented a design technique that enables voltage/reliability tradeoffs: optimize frequently exercised critical paths and de-optimize rarely exercised paths. Demonstrated significant power benefits of gradual slack design: reduced power 29% for 2% error rate, 27% on average.

Bonus Slides

Slack Optimization Techniques: Path Optimization and Power Reduction

Extended Voltage Scaling: Focus on frequently exercised negative slack paths. Reduce error rate while minimizing cell swaps (power overhead). Rank paths by error rate contribution. Upsize cells in paths to increase slack (SWAP). Table (Path, TG, Opt. Rank): A-B-C-D, 0.22, 1; A-B-D, 0.15, 2; A-C-D, 0.05, 3.
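The ranking step can be sketched in a few lines using the three paths from the slide's table. Reading the TG column as a per-path toggle rate is an assumption, and the helper names `rank_paths` and `cells_to_upsize` are hypothetical; the sketch only shows ordering paths by that value and collecting each cell once so it is not swapped twice.

```python
def rank_paths(toggle_by_path):
    """Order path names from highest to lowest toggle value."""
    return sorted(toggle_by_path, key=toggle_by_path.get, reverse=True)


def cells_to_upsize(ranked_paths, max_paths):
    """Walk the top-ranked paths and list each cell once, in first-seen order."""
    seen, order = set(), []
    for path in ranked_paths[:max_paths]:
        for cell in path.split("-"):
            if cell not in seen:
                seen.add(cell)
                order.append(cell)
    return order


# Values from the slide's table (Path, TG).
toggle_by_path = {"A-B-D": 0.15, "A-C-D": 0.05, "A-B-C-D": 0.22}

ranked = rank_paths(toggle_by_path)
first_swaps = cells_to_upsize(ranked, max_paths=1)
```

Since the paths share cells, upsizing along the top-ranked path also adds slack to the lower-ranked paths that pass through the same cells, which is one reason ranking first can keep the number of swaps small.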

Power Reduction: Downsize cells on rarely exercised paths (SWAP). Reduce leakage power while leaving error rate unaffected. [Figure labels: Toggle Rate ≈ 0; Check Path Slack]
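A minimal sketch of the downsizing filter follows. The slide gives only the two checks (toggle rate near zero, then path slack), so the threshold, the per-cell `delay_penalty_ns` field, and the rule that slack must stay non-negative after the swap are assumptions made for illustration.

```python
def downsize_candidates(cells, toggle_threshold=1e-3, slack_margin_ns=0.0):
    """Pick cells that are almost never exercised and can absorb the slowdown."""
    picked = []
    for c in cells:
        rarely_exercised = c["toggle"] <= toggle_threshold
        # Assumed check: slack left on the path after the smaller cell slows it.
        slack_after = c["slack_ns"] - c["delay_penalty_ns"]
        if rarely_exercised and slack_after >= slack_margin_ns:
            picked.append(c["name"])
    return picked


# Hypothetical cells: U1 is idle with spare slack, U2 is idle but tight,
# U3 has spare slack but is exercised often.
cells = [
    {"name": "U1", "toggle": 0.0, "slack_ns": 0.30, "delay_penalty_ns": 0.10},
    {"name": "U2", "toggle": 0.0, "slack_ns": 0.05, "delay_penalty_ns": 0.10},
    {"name": "U3", "toggle": 0.2, "slack_ns": 0.50, "delay_penalty_ns": 0.10},
]

chosen = downsize_candidates(cells)
```

A looser variant could allow negative slack on paths whose toggle rate is effectively zero, since such paths contribute almost nothing to the error rate; the transcript does not say which rule the authors use.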

Error Rate Forecasting: Error rate contribution of one FF. Error rate of design.

Significance of Processor Power: Power is a first-order design constraint. Voltage scaling can significantly reduce power. Voltage scaling: 50% → power reduction: 80%.
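As a rough check on the scale of that saving, the standard first-order model for CMOS dynamic power can be applied. The slide does not state which power model it uses, so the symbols below (activity factor α, switched capacitance C, supply voltage V, clock frequency f) are assumptions for illustration only.

```latex
P_{\text{dyn}} = \alpha C V^{2} f
\quad\Longrightarrow\quad
\frac{P_{\text{dyn}}(0.5\,V)}{P_{\text{dyn}}(V)}
  = \frac{\alpha C (0.5\,V)^{2} f}{\alpha C V^{2} f}
  = 0.25
```

With frequency held fixed, halving the supply voltage cuts dynamic power by 75% under this model. The 80% figure on the slide is larger; one plausible reading is that leakage power also falls as voltage drops, but the transcript does not say.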

DVFS Benefit and Cost: How effective is voltage scaling when frequency is fixed?

Alternatives – Blueshift: The goal is to optimize paths that cause errors to enable more frequency overscaling. Techniques: PCT/OSB. Uses an iterative simulation loop, which is infeasible for large designs.

Alternatives – Tightly Constrained SP&R: The goal is to optimize all paths aggressively. Technique: Traditional SP&R with an aggressive target. Some paths will not meet the tight constraint, and the slack distribution becomes more gradual.

Insight: Optimize frequently exercised paths at the expense of rarely exercised paths. Optimizing frequently exercised paths enables deeper voltage scaling. De-optimizing rarely exercised paths keeps power overhead low.

Iterative Optimization Flow: Scale Voltage, Optimize Paths, Iterate.

A New Processor Design Goal: Reshape the slack distribution so the processor fails gracefully.

Moore’s Law: Power consumption of processor node doubles every 18 months.

Power Scaling: With current design techniques, processor power will soon be on par with a nuclear power plant.

Outline: Background and Motivation; Insight; Power Reduction Techniques; Design Flow; Results; Summary.