0 Optimizing Stochastic Circuits for Accuracy-Energy Tradeoffs Armin Alaghi 3, Wei-Ting J. Chan 1, John P. Hayes 3, Andrew B. Kahng 1,2 and Jiajia Li 1.

Presentation transcript:

0 Optimizing Stochastic Circuits for Accuracy-Energy Tradeoffs. Armin Alaghi (3), Wei-Ting J. Chan (1), John P. Hayes (3), Andrew B. Kahng (1,2) and Jiajia Li (1). Affiliations: UC San Diego, (1) ECE and (2) CSE Depts.; (3) University of Michigan, EECS Dept.

1 Outline: Background and Previous Work; Problem Statement in SC Physical Design; Modeling Approach; Optimization Approach; Conclusions

2 Motivation: Low Power Challenge. Low-power design is a grand challenge. Mobile devices must operate at extremely low power as the performance requirements of applications grow. Voltage scaling has slowed down in recent years. Possible solution: employ new design paradigms to overcome these challenges and achieve performance improvements. [Figure: 4W mobile platform power requirement; 1W SoC power requirement; slow performance improvement due to power limit and slow voltage scaling. Source: ITRS]

3 New Paradigm: Stochastic Computing (SC). Stochastic computing (SC) is a design paradigm that has recently gained attention due to its low power and error tolerance. Random bit-streams are used to represent operands. Complex arithmetic operations are implemented by simple logic circuits. Example: Z = X1 × X2 with X1 = 4/8 and X2 = 6/8 gives Z = 4/8 × 6/8 = 3/8.
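The multiplication on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' circuit: it assumes unipolar encoding (a stream's value is its fraction of 1s), statistically independent input streams, and a single AND gate as the multiplier. The helper names `to_bitstream`, `stream_value` and `sc_multiply`, the stream length and the seed are all hypothetical.

```python
import random

def to_bitstream(p, length, rng):
    """Encode probability p as a random bit-stream: each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def stream_value(bits):
    """Decode a unipolar bit-stream: its value is the fraction of 1s."""
    return sum(bits) / len(bits)

def sc_multiply(x1, x2):
    """Multiply two independent unipolar streams with a bitwise AND."""
    return [a & b for a, b in zip(x1, x2)]

rng = random.Random(0)   # fixed seed, for repeatability only (assumption)
n = 4096                 # stream length chosen for illustration
x1 = to_bitstream(4 / 8, n, rng)
x2 = to_bitstream(6 / 8, n, rng)
z = sc_multiply(x1, x2)

# The decoded value approximates 4/8 * 6/8 = 3/8; it is not exact,
# because the streams are random samples.
print(stream_value(z))
```

The result is only approximately 3/8, and the approximation tightens as the stream gets longer, which is the accuracy-latency tradeoff the later slides build on.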

4 Error Tolerance, Precision, and Accuracy. Inaccurate computation may occur (correct result = 3/8). Number to represent: 5/16 [stochastic and binary encodings shown on the slide]. Bit-stream length grows exponentially with precision. Redundant representation provides error tolerance.
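The two claims on this slide (exponential stream length and error tolerance) can be checked with a small worked example for 5/16. This is a sketch under stated assumptions: the stochastic stream is written with its 1s first for readability (the position of the 1s does not change the value), the binary word is a 4-bit fraction, MSB first, and all function names are hypothetical.

```python
from fractions import Fraction

def stream_length_for(bits_of_precision):
    """A k-bit value needs 2**k stream bits to be represented exactly."""
    return 2 ** bits_of_precision

def stream_value(bits):
    """Stochastic (unipolar) value: fraction of 1s in the stream."""
    return Fraction(sum(bits), len(bits))

def binary_value(bits):
    """Value of a binary fraction, MSB first: weights 1/2, 1/4, 1/8, ..."""
    return sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits))

def flip(bits, index):
    """Return a copy of bits with one bit inverted (a single-bit fault)."""
    out = list(bits)
    out[index] ^= 1
    return out

target = Fraction(5, 16)
stochastic = [1] * 5 + [0] * 11   # 16 bits, five of them 1
binary = [0, 1, 0, 1]             # 0.0101 in binary = 1/4 + 1/16

# Worst-case single-bit fault in each representation.
sc_error = abs(stream_value(flip(stochastic, 0)) - target)
bin_error = abs(binary_value(flip(binary, 0)) - target)

print(stream_length_for(4), sc_error, bin_error)  # 16 1/16 1/2
```

Every stochastic bit carries the same weight, so any single-bit fault moves the value by 1/16, whereas a fault in the binary MSB moves it by 1/2.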

5 Area, Computation Efficiency, and Delay. [Figure: stochastic multiplier vs. conventional binary multiplier, with critical paths marked] SC: smaller area, longer computation latency, and shorter critical path.

6 Application Context of SC. Stochastic representation is similar to analog “pulse-mode” signals, as well as neural signals. A stochastic computing circuit performs cheap pre-processing and saves resources. [Figure: low-cost preprocessing between two domains]

7 Summary of Advantages/Disadvantages. Advantages: low-complexity circuits (allows massive parallelism); error tolerance; robustness to voltage scaling (explored and improved in this work). Disadvantages: long computation time; limited precision; expensive conversion circuits and storage elements.

8 Outline: Background and Previous Work; Problem Statement in SC Physical Design; Modeling Approach; Optimization Approach; Conclusions

9 Challenges, Problems, and Our Contributions. Challenges of stochastic computing (SC) design: the current digital design flow does not comprehend the tradeoff between accuracy and power in SC; physical implementation of SC circuits has not been well explored. Problems: What is an efficient way to estimate error when exhaustive simulation is not feasible? Given a synthesized SC circuit, what is the physical implementation recipe? Our contributions: we introduce the delay matching problem in SC; we reduce the computation error by balancing delay paths; we propose a Markov chain model for error estimation.

10 Stochastic Computing: Scope of Study. Design metrics: energy; accuracy (a new model is proposed in this work); circuit area. Design parameters: computation latency (N); frequency scaling (f); voltage scaling (V); netlist implementation (a new optimization is proposed in this work). [Legend: metrics covered in this work]

11 Outline: Background and Previous Work; Problem Statement in SC Physical Design; Modeling Approach; Optimization Approach; Conclusions

12 Balance of Path Delay Matters. Three scenarios of signal transitions: (A) Ideal: stable states of logic values are captured. (B) Balanced delay: all the transitions arrive at the same time. (C) Unbalanced delay: extra errors are caused by glitches or delayed transitions. [Figure: waveforms of inputs x1, x0 and output z against the sample clock; (A) ideal: correct, (B) balanced: correct, (C) unbalanced: error]

13 Markov Chain for Error Prediction. Markov chains (MC) have previously been used to model sequential SC circuits. We augment the states of the behavior model with delay-induced transition errors, i.e., errors induced by glitches and delayed transitions. Transition probabilities are trained on a small set of simulation results. The stationary probability distribution is obtained by solving the Markov chain. States C1, D1 and G1 decide the expected output value, which is used for error estimation. The previous SC behavior model contains only correct states.
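The "solve the Markov chain" step can be illustrated with a small stationary-distribution computation. This is a simplified sketch, not the model from the talk: it assumes just three states (C = correct, D = delayed transition, G = glitch), the transition probabilities are invented rather than trained from simulation, and power iteration stands in for whatever solver the authors used.

```python
def stationary_distribution(P, iterations=1000):
    """Power iteration: repeatedly apply pi <- pi * P until it settles.

    P is a row-stochastic matrix (each row sums to 1). Convergence is
    assumed here because every entry of the example matrix is positive.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

STATES = ["C", "D", "G"]  # hypothetical: correct, delayed, glitch

# Hypothetical transition probabilities; row = current state, column = next state.
P = [
    [0.90, 0.07, 0.03],  # from C
    [0.60, 0.30, 0.10],  # from D
    [0.70, 0.10, 0.20],  # from G
]

pi = stationary_distribution(P)
# Long-run fraction of cycles spent outside the correct state,
# used here as a simple stand-in for an error estimate.
error_rate = 1.0 - pi[STATES.index("C")]
print(dict(zip(STATES, pi)), error_rate)
```

Once the transition probabilities are known, the long-run behavior follows from the matrix alone, which is why a short training simulation can replace a long exhaustive one.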

14 Result: Markov Chain for Error Prediction. The model is accurate for larger errors (precise prediction at high error magnitude) and less accurate when the error is small. Ongoing work: improve the accuracy for small errors.

15 Outcome of Accuracy Model Study. Before our work: the SC behavior model was based on pre-layout simulation and did not consider the cell delay and wire delay contributed by physical implementation. Our work: augment the SC behavior model by considering delayed transitions and glitches contributed by physical implementation; optimize the physical implementation by balancing the timing paths. [Figure labels: correct; error; balanced delays]

16 Outline: Background and Previous Work; Problem Statement in SC Physical Design; Modeling Approach; Optimization Approach; Conclusions

17 Challenges of SC Physical Implementation. The clock is fast to compensate for long computation latency. Launch and capture flip-flops may be far apart in a huge array of SC circuits. Paths are unbalanced due to circuit structures and variations; previous analysis shows that delay balance matters. Timing is more critical when DVFS lowers the supply voltage. [Figure: SC sub-circuits with inputs x1, x0 and output z, run on a faster clock to compensate for long latency; path 1 (long) and path 2 (short); analog front-end circuit or random number generator; converter to binary number system; long physical distance in a huge array]

18 Post-P&R Optimization for SC Circuits. Problem statement: given an SC circuit and a range of supply voltages, we seek an implementation that minimizes error across the voltages. Observation: transition errors increase at lower voltages due to path delay mismatch. Approach: ILP-based retiming after P&R by a commercial tool. Optimization constraints: number of buffers/wires inserted to compensate for shorter paths; bounded delay variation across voltages; buffer power penalty. Objective: minimize path delay differences, which improves accuracy. Side note: this is similar to multi-corner multi-mode (MCMM) CTS skew optimization, where skew corresponds to path delay differences, MCMM to delays evaluated at multiple supply voltages, and power penalty to the number of buffers inserted.
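The flavor of this optimization can be shown on a toy instance: pad a short path with buffers so that its delay tracks a long path at several supply voltages, with a penalty per inserted buffer. This is only a sketch. It uses exhaustive search in place of the ILP named on the slide, considers a single pair of paths, and every number (delays in ps, buffer types, penalty weight) is hypothetical rather than taken from the talk.

```python
from itertools import product

VOLTAGES = [1.0, 0.8, 0.6]          # supply voltages (V), hypothetical
LONG_PATH = [120, 160, 260]         # long-path delay (ps) at each voltage
SHORT_PATH = [80, 105, 170]         # short-path delay (ps) at each voltage
BUFFERS = {                         # added delay (ps) per buffer at each voltage
    "bufA": [10, 14, 24],
    "bufB": [20, 26, 40],
}
MAX_PER_TYPE = 3                    # search bound on buffers of each type
BUFFER_PENALTY = 0.5                # stand-in for the buffer power penalty

def padded_delay(base, counts):
    """Delay of a path at each voltage after inserting the given buffers."""
    return [
        base[v] + sum(n * BUFFERS[name][v] for name, n in counts.items())
        for v in range(len(VOLTAGES))
    ]

def worst_mismatch(a, b):
    """Largest delay difference between two paths over all voltages."""
    return max(abs(x - y) for x, y in zip(a, b))

def balance_paths():
    """Try every buffer count combination and keep the cheapest one."""
    names = list(BUFFERS)
    best = None
    for combo in product(range(MAX_PER_TYPE + 1), repeat=len(names)):
        counts = dict(zip(names, combo))
        mismatch = worst_mismatch(LONG_PATH, padded_delay(SHORT_PATH, counts))
        cost = mismatch + BUFFER_PENALTY * sum(combo)
        if best is None or cost < best[0]:
            best = (cost, mismatch, counts)
    return best

before = worst_mismatch(LONG_PATH, SHORT_PATH)
cost, after, chosen = balance_paths()
print(before, after, chosen)
```

Buffers with different voltage sensitivities matter here because the gap between the two paths widens as the voltage drops, so one fixed delay cannot close it at every voltage.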

19 ILP Formulation for Buffer Insertion

20 Heuristics for Buffer Choices. Heuristic 1: use various buffer/wire types to compensate for delay differences between voltages. We provide buffer candidates with different delay sensitivities to voltage scaling, and wire detour options that widen the voltage sensitivity range. Heuristic 2: prune the buffer candidates to speed up the MILP. Solutions are pruned within sub-regions of the tradeoff space by choosing the cells with the lowest leakage in each region. [Figure panels: without pruning; with pruning; wire detouring]

21 Result: Improved Accuracy by Balancing Paths. [Figure: path delays and average errors for STRAUSS (UMich) + conventional P&R (ICC), ReSC (UMN) + conventional P&R (ICC), and ReSC (UMN) + proposed P&R optimization; annotations: lower error; less inter-path delay skew]

22 Result: Improved Input Delay Window. Safe timing window: the timing margin between the clock edge and the input delay. Before optimization: a small input delay variation will cause errors. After optimization: safe timing window = half of the clock cycle. [Figure: original and optimized delay distributions with the safe window marked; clock period = 150ps]

23 Result: Improved Energy Cost by Balancing Paths. Improved accuracy = less voltage scaling needed = higher energy efficiency. The conventional P&R flow (ICC) fails to meet the accuracy constraint when VDD is low. Our proposed P&R optimization reduces delay mismatch at lower voltages and leads to lower energy cost for the same accuracy.

24 MC Model: Improved Simulation Runtime. The proposed Markov chain model is verified on four different SC application circuits. [Figure legend: green = new MC model; blue = exhaustive simulation] Fewer simulation cycles are needed. #Cycles (Ex.) / #Cycles (MC): GammaCorr; PolySmall 256 / 10; Neuron 100 / 10.

25 Result: Gamma Correction. Testcase: gamma correction. Both the SC and conventional circuits are signed off at 1.0V. SC still generates a recognizable image at 0.6V. Energy saving of SC = 66%.

26 Outline: Background and Previous Work; Problem Statement in SC Physical Design; Modeling Approach; Optimization Approach; Conclusions

27 Conclusions. We identify the impact of delay-induced errors and propose a Markov chain-based model for error estimation. We propose a new physical implementation approach that improves the energy-accuracy tradeoff. The experimental results show significant energy benefits over previous work. Future work: Markov chain model improvement; a comprehensive tradeoff recipe for performance, accuracy, and energy.

28 Thank you!