1/48 ENERGY OPTIMIZATION TECHNIQUES: FPGA GLITCH REDUCTION Patrick Cooke and Elizabeth Graham.

Presentation transcript:

1/48 ENERGY OPTIMIZATION TECHNIQUES: FPGA GLITCH REDUCTION Patrick Cooke and Elizabeth Graham

2/48 Field-Programmable Gate Arrays  Used to implement digital systems  Pros: flexible, low time-to-market  Cons: consume up to 10x more power than an equivalent ASIC design; a barrier for power-sensitive applications

3/48 FPGA Architecture  Island-style  Logic blocks connected by programmable routing network  Look-up tables (LUTs)  A k-input LUT supports k-variable logic functions and requires 2^k configuration bits  Hardware implementation of a truth table
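The truth-table view of a LUT can be sketched in a few lines of Python. This is only an illustration: the `LUT` class, its method names, and the choice of the first input as the most significant address bit are assumptions, not part of the slides.

```python
from itertools import product


class LUT:
    """A k-input look-up table stored as 2**k configuration bits (illustrative model)."""

    def __init__(self, k, config_bits):
        if len(config_bits) != 2 ** k:
            raise ValueError("a k-input LUT needs exactly 2**k configuration bits")
        self.k = k
        self.config_bits = list(config_bits)

    def evaluate(self, inputs):
        # Treat the input vector as a binary address; inputs[0] is the MSB (assumed ordering).
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return self.config_bits[address]


# Configure a 2-input LUT as XOR: truth-table rows 00, 01, 10, 11.
xor_lut = LUT(2, [0, 1, 1, 0])
table = [xor_lut.evaluate(v) for v in product([0, 1], repeat=2)]
```

Changing the logic function only means loading different configuration bits; the lookup hardware itself stays the same.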

4/48 Power in FPGAs  Static power: current leakage in transistors  Dynamic power: signal transitions between logic-0 and logic-1  Functional transition: necessary for correct operation of circuit  Glitch: LUT output transition due to unbalanced delays at inputs; 4-73% of total dynamic power, average of 22.6%

5/48 Glitch Example Unbalanced Delays Balanced Delays
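A minimal sketch of why unbalanced input delays cause a glitch, assuming a zero-delay gate and inputs that each switch exactly once. The function name `output_transitions` and the arrival times are hypothetical and chosen only for illustration.

```python
def output_transitions(func, initial, final, arrival_times):
    """Count output transitions when each input switches from initial to final at its arrival time."""
    current = list(initial)
    out = func(*current)
    transitions = 0
    # Apply input changes in order of arrival; inputs arriving together switch together.
    for t in sorted(set(arrival_times)):
        for i, arrival in enumerate(arrival_times):
            if arrival == t:
                current[i] = final[i]
        new_out = func(*current)
        if new_out != out:
            transitions += 1
            out = new_out
    return transitions


def xor(a, b):
    return a ^ b


# Both inputs rise 0 -> 1; the steady-state XOR output is 0 before and after.
balanced = output_transitions(xor, (0, 0), (1, 1), arrival_times=(2.0, 2.0))
unbalanced = output_transitions(xor, (0, 0), (1, 1), arrival_times=(1.0, 2.0))
```

With balanced arrivals the output never moves. With unbalanced arrivals it pulses high and returns low, which is two transitions that do no useful work.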

6/48 Glitch Reduction Techniques  Algorithms that balance delays  Technology mapping stage: mapping based on glitch-aware switching activities  Routing stage: faster-arriving inputs delayed by extending path  Architectural level: programmable delay elements  All incur area or performance overhead  Flip-flop insertion/pipelining  Fewer logic levels reduce the opportunity for imbalanced delays  Logic manipulation algorithms  Change don’t-care values to reduce glitching

7/48 FPGA GLITCH POWER ANALYSIS AND REDUCTION Warren Shum and Jason H. Anderson University of Toronto Department of Electrical and Computer Engineering Published in ISLPED 2011

8/48 Proposed Solution  Glitch reduction optimization algorithm based on don’t-cares  Selects don’t-care output values of LUTs in a way that reduces glitching  Performed after placement and routing  Uses timing simulation data for guidance  No area or performance overhead  Inspired by hazard-free logic synthesis techniques for asynchronous circuits

9/48 Don’t-cares  Entries in truth table where output can be set as either logic-0 or logic-1 without affecting correctness of circuit  Two categories  Satisfiability don’t-cares (SDCs) Particular input pattern can never occur on inputs  Observability don’t-cares (ODCs) Output cannot propagate to circuit’s primary outputs SDC ODC
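The two categories can be told apart by brute force on a toy network. The sketch below is an illustration under stated assumptions: `local_inputs` (the logic feeding the LUT) and `primary_output` (the logic after it) are made-up functions, and exhaustive enumeration stands in for the SAT-based analysis the presentation describes later.

```python
from itertools import product


def local_inputs(a, b):
    # Hypothetical fan-in logic: the LUT under study sees (a AND b, a OR b).
    return (a & b, a | b)


def primary_output(f_out, a, b):
    # Hypothetical fan-out logic: the LUT output is masked whenever a OR b is 0.
    return f_out & (a | b)


def classify_patterns():
    # Map each local input pattern to the primary input vectors that produce it.
    producers = {}
    for x in product([0, 1], repeat=2):
        producers.setdefault(local_inputs(*x), []).append(x)

    labels = {}
    for y in product([0, 1], repeat=2):
        if y not in producers:
            labels[y] = "SDC"   # pattern can never occur at the LUT inputs
        elif all(primary_output(0, *x) == primary_output(1, *x) for x in producers[y]):
            labels[y] = "ODC"   # LUT output value never reaches the primary output
        else:
            labels[y] = "care"
    return labels


labels = classify_patterns()
```

Here the pattern (1, 0) is an SDC because a AND b cannot be 1 while a OR b is 0, and (0, 0) is an ODC because the output is masked whenever both primary inputs are 0.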

10/48 Dynamic Power Model  Variables  n: number of nets in circuit  S_i: switching activity of net i  C_i: capacitance of net i  f: frequency of circuit  V_dd: supply voltage  Algorithm focuses on switching activity
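The equation on this slide did not survive in the transcript. The variables it lists match the standard switching-power expression for CMOS circuits, which is assumed here rather than quoted from the slide:

```latex
P_{\text{dyn}} = \frac{1}{2} \sum_{i=1}^{n} S_i \, C_i \, f \, V_{dd}^{2}
```

Capacitance, frequency, and supply voltage are fixed once the design is placed and routed, so an algorithm that runs afterwards can only lower the switching activity S_i.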

11/48 Removable Glitch

12/48 Don’t-care Analysis  ABC logic synthesis network  Developed at UC Berkeley  Boolean satisfiability (SAT)-based complete don’t-care analysis: determines don’t-care minterms  Utilizes miter circuit to find don’t-cares  If C(x) = 0, y is a don’t-care minterm of LUT f

13/48 Motivational Experiments  Examined amount of glitch power dissipated by 20 MCNC benchmark designs  Experimental setup  Altera Quartus 10.1  65nm Stratix III family  ModelSim 6.3e used for functional and timing simulation  5000 random input vectors  Dynamic power computed using Quartus PowerPlay  Glitch power = dynamic power(timing) – dynamic power(functional)

14/48 Motivational Results  Percentage of dynamic power from glitches  Range: %  Average : 26.0%  Percentage of LUT input states that are don’t-cares  Range: %  Average: 15.1%

15/48 Glitch Reduction Algorithm  Inputs  Placed and routed netlist  Value change dump (VCD) file Results of timing simulation  Algorithm progresses from shallower levels of LUTs to deeper ones  In each level, LUTs examined in descending order of power consumption
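The visiting order described on this slide amounts to a two-key sort. A minimal sketch, assuming each LUT already carries its logic level and a power estimate; the `LutInfo` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LutInfo:
    name: str
    level: int     # logic depth of the LUT (smaller = shallower)
    power: float   # estimated power, arbitrary units


def processing_order(luts):
    # Shallower levels first; within a level, highest power first.
    return sorted(luts, key=lambda lut: (lut.level, -lut.power))


luts = [
    LutInfo("u3", level=2, power=0.4),
    LutInfo("u1", level=1, power=0.2),
    LutInfo("u2", level=1, power=0.9),
    LutInfo("u4", level=2, power=0.7),
]
order = [lut.name for lut in processing_order(luts)]
```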

16/48 Glitch Reduction Algorithm  For each LUT in netlist  Compute don’t-cares of LUT ABC SAT-based don’t-care analysis  Scan input vectors Voting mechanism Details on next slide  Set values of don’t-cares and update netlist Majority vote decides don’t-care value Netlist updated to guarantee equivalent functionality

17/48 Input Vector Scan  Sequence of local input vectors to LUT extracted from VCD file and examined in order  When don’t-care input vector is reached  Find value of closest care state before and after don’t-care input vector  If these values are identical, vote for that value  Otherwise, no vote is cast  Each don’t-care in LUT has separate tally of votes
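The scan and the majority vote can be sketched as follows. This is a simplified illustration, not the authors' implementation: the input sequence is given as a Python list instead of being parsed from a VCD file, the function names are hypothetical, and leaving a don't-care undecided on a tie or with no votes is an assumption.

```python
from collections import defaultdict


def scan_votes(sequence, care_outputs, dont_cares):
    """Tally votes [for 0, for 1] for every don't-care input vector seen in the sequence."""
    votes = defaultdict(lambda: [0, 0])
    for i, vec in enumerate(sequence):
        if vec not in dont_cares:
            continue
        # Output of the closest care state before and after this don't-care vector.
        before = next((care_outputs[v] for v in reversed(sequence[:i])
                       if v not in dont_cares), None)
        after = next((care_outputs[v] for v in sequence[i + 1:]
                      if v not in dont_cares), None)
        # Vote only when both neighbours exist and agree; otherwise no vote is cast.
        if before is not None and before == after:
            votes[vec][before] += 1
    return votes


def decide(votes):
    """Majority vote per don't-care; ties are left undecided (assumption)."""
    return {vec: (1 if ones > zeros else 0)
            for vec, (zeros, ones) in votes.items() if zeros != ones}


care_outputs = {(0, 0): 0, (0, 1): 1, (1, 1): 1}
dont_cares = {(1, 0)}
sequence = [(0, 1), (1, 0), (1, 1), (1, 0), (0, 0), (1, 1), (1, 0), (0, 1)]
votes = scan_votes(sequence, care_outputs, dont_cares)
choice = decide(votes)
```

Setting the don't-care to the value its neighbouring care states share keeps the LUT output steady while the inputs pass through the don't-care pattern.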

18/48 Algorithm Walkthrough

19/48 Iterative Flow  To verify modified don’t-care values, algorithm iterates until convergence  Placement and routing are not run again  Runtime on order of minutes  No modifications to timing characteristics

20/48 Experimental Study  Same experimental setup as motivational experiments  20 MCNC benchmark circuits  Altera Quartus 10.1  65nm Stratix III family  ModelSim 6.3e  Combinational equivalence checking used to ensure circuit functionality unchanged  Three passes of optimization loop  Negligible change after three passes  Worst-case sets don’t-cares to the opposite value of that obtained by algorithm

21/48 Experimental Results  Dynamic power reduction  Average: 4.0%  Peak : 12.5%  Glitch power reduction  Average: 13.7%  Peak: 49.0%  Optimized vs. worst- case dynamic power reduction  Average: 9.8%  Peak: 30.8%

22/48 Power & Don’t-care Ratio vs. Fanout  Average signal power increases with fanout due to increase in capacitance  Average don’t-care ratio shows decreasing trend with respect to fanout  Signals consuming most power are poor targets for glitch reducing algorithm based on don’t-cares

23/48 Average Vote Bias  Vote bias is percentage of votes that were cast for the more popular setting  For all circuits tested, highly preferable setting existed for all don’t-cares  Suggests don’t-care values can be picked with high degree of confidence
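Vote bias as defined on this slide is a one-line calculation. In the sketch below the function name and the handling of a don't-care that received no votes are assumptions.

```python
def vote_bias(zeros, ones):
    """Percentage of votes cast for the more popular setting of one don't-care."""
    total = zeros + ones
    if total == 0:
        return None  # no votes were cast for this don't-care (assumed convention)
    return 100.0 * max(zeros, ones) / total


bias = vote_bias(zeros=1, ones=3)
```

The value ranges from 50% for an even split to 100% for a unanimous vote.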

24/48 Conclusion  Future Work  Integrate algorithm into power-aware FPGA CAD flow  Investigate whether other stages of CAD flow could improve algorithm effectiveness  Reduce runtime by integrating algorithm with incremental timing simulation  Shortcomings  Algorithm seems to only address satisfiability don’t-cares (SDC)

25/48 GLITCHLESS: DYNAMIC POWER MINIMIZATION IN FPGAS THROUGH EDGE ALIGNMENT AND GLITCH FILTERING Julien Lamoureux, Guy G. Lemieux, Steven J.E. Wilton University of British Columbia Department of Electrical and Computer Engineering Published in TVLSI 2008

26/48 GlitchLess Overview  Adds programmable delay elements  To align arrival times  Act as filter to eliminate off-chip glitches  Applied after routing  Can be combined with other power-saving methods Original circuit with glitch Glitch removed by delaying input c

27/48 Trade-Offs  Save glitch power  Delay elements  Area overhead (modest increase)  Speed overhead (very minimal since only early- arriving signals are delayed)  Power overhead for driving additional circuit elements

28/48 How Long Can Delays Be?  Actual range varies between benchmarks, but they all have similar shape  Most pulse widths < 10ns

29/48 How Small Can Delays Be?  Longer pulse widths (over 200ps) are the ones that need to be aligned

30/48 Potential Power Savings

31/48 Programmable Delay Elements  Minimum delay  Small: Align edges more precisely  Large: Less overhead  Maximum delay  Small: Less overhead  Large: Able to suppress glitch from longer pulse  Number of delay elements (on input vector)  Small: Less adaptable  Large: More overhead
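How the minimum and maximum delay parameters limit edge alignment can be sketched as below. This assumes, for illustration only, that a delay element offers integer multiples of its minimum delay up to its maximum; the function name and the picosecond values are hypothetical.

```python
def assign_delays(arrival_ps, min_delay_ps, max_delay_ps):
    """Pick a delay setting per input so that early edges move toward the latest arrival."""
    latest = max(arrival_ps)
    delays = []
    for t in arrival_ps:
        needed = latest - t
        # Round down to the delay resolution so no input ends up later than the latest one.
        steps = needed // min_delay_ps
        delays.append(min(steps * min_delay_ps, max_delay_ps))
    return delays


arrivals = [1000, 1450, 2000]  # input arrival times in ps
delays = assign_delays(arrivals, min_delay_ps=200, max_delay_ps=800)
aligned = [t + d for t, d in zip(arrivals, delays)]
```

In this example the first input needs 1000 ps but is capped at the 800 ps maximum, and the second needs 550 ps but gets 400 ps because of the 200 ps resolution. The remaining skew shows the trade-off the slide describes.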

32/48 Programmable Delay Elements  Each delay stage has slow and fast mode; mode controlled by value in SRAM  Bypass stages for very small delay  Number of stages determined by delay element parameters

33/48 Placement of Delay Elements  Original  Scheme 1: LUT Inputs  BLE – Lookup Table and Flip-flop pair

34/48 Scheme 1: LUT Inputs  Each input delayed individually  Independently determine delay  Delay element optional for each input  Same minimum and maximum delay for all elements  Overhead increases exponentially with Number of delay elements

35/48 Placement of Delay Elements  Original  Scheme 2: Gradual LUT Inputs  BLE – Lookup Table and Flip-flop pair

36/48 Scheme 2: Gradual LUT Inputs  Delay elements in same location as Scheme 1  Maximum delay decreases by 50% for each input of an input vector  Works due to variation of input arrival times  Reduces area overhead for large Number of delay elements without loss of effectiveness
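The halving rule of Scheme 2 can be written out directly. The function name and the 8000 ps starting value below are illustrative assumptions.

```python
def gradual_max_delays(max_delay_ps, num_elements):
    """Maximum delay of each successive input delay element, halving each time."""
    return [max_delay_ps / (2 ** i) for i in range(num_elements)]


schedule = gradual_max_delays(8000, 4)
```

Only the first element needs the full delay range. Each later element is half the size of the previous one, which is where the area saving comes from.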

37/48 Placement of Delay Elements  Original  Scheme 3: LUT Inputs + Outputs  BLE – Lookup Table and Flip-flop pair

38/48 Scheme 3: LUT Inputs + Outputs  Scheme 1, add delay elements to BLE output  Output delay elements ignore parameter for Number of delay elements  1 output delay element eliminates multiple input delay elements  Reduces overhead

39/48 Placement of Delay Elements  Original  Scheme 4: CLB and LUT Inputs  BLE – Lookup Table and Flip-flop pair

40/48 Scheme 4: CLB and LUT Inputs  Same concept from Scheme 3  Delay elements closer to CLB input (than to output of LUT)  Every CLB input has a delay element

41/48 Placement of Delay Elements  Original  Scheme 5: LUT Inputs + Bank  BLE – Lookup Table and Flip-flop pair

42/48 Scheme 5: LUT Inputs + Bank  Scheme 1, add bank of delay elements  Any signal can use bank  Reduce number, size of input delay elements  Long delays use bank  Short delays use small input delay elements  Minimum bank delay = maximum input delay

43/48 Experimental Setup  Area, power, and delay estimations  VPR (Versatile Place and Route) simulations  Models original FPGA circuit  Inertial Delay Model  HSPICE simulations  Models delay elements  10 largest benchmarks each from MCNC, ISCAS89 benchmark suites  Manually set delay element parameters

44/48 Delay Element Overhead

45/48 Select Results Table 10: Overall power savings. (Abbreviated)

46/48 Conclusions and Future Work  Scheme 1 saves 18.2% of power  Scheme 2 saves 16.8% with less area and power overhead  Investigate newer technology  Tends to have higher leakage power  Circuit-level implementation  Reduce area overhead, increase PVT tolerance

47/48 Shortcomings  No physical experiments (all simulation-based)  Misuse of data cited from another paper  “dynamic power still accounts for 62% of total power”  Tuan, Tim, et al. "A 90nm low-power FPGA for battery-powered applications." Proceedings of the 2006 ACM/SIGDA 14th International Symposium on Field Programmable Gate Arrays. ACM, 2006.

48/48 QUESTIONS?