June 6, 2007. Using Negative Edge Triggered FFs to Reduce Glitching Power in FPGA Circuits. Tomasz S. Czajkowski and Stephen D. Brown, Department of Electrical and Computer Engineering, University of Toronto.

Presentation transcript:

1 Using Negative Edge Triggered FFs to Reduce Glitching Power in FPGA Circuits
Tomasz S. Czajkowski and Stephen D. Brown
Department of Electrical and Computer Engineering, University of Toronto, Ontario, Canada
June 6, 2007

2 Motivation
- Glitches: undesirable logic transitions that occur due to delay imbalance in the logic circuit
  - Waste power and do not provide any useful functionality
  - Can increase the average toggle rate of a net by as much as a factor of 2
  - Not well defined until post placement and routing
- Glitches can be filtered out by strategically inserting negative edge triggered FFs

3 Glitches in FPGAs
- Due to unequal arrival time of signals at the inputs of LUTs
- Glitches can be propagated through LUTs
[Figure labels: 4LUT, Generated, Propagated]

4 Reducing Glitches
- Insert a negative edge triggered FF after a LUT that produces or propagates glitches
[Figure labels: 4LUT, Generated, clock, No glitches]
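The filtering effect described on this slide can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' tooling: the 10 ns period, the event times, and the helper names `count_transitions` and `sample_at` are all hypothetical, and the LUT output is assumed to settle before the falling clock edge.

```python
def count_transitions(initial, events):
    """Count value changes in a list of (time, value) events, starting from `initial`."""
    count, current = 0, initial
    for _, value in sorted(events):
        if value != current:
            count += 1
            current = value
    return count


def sample_at(initial, events, t):
    """Return the signal value at time t (events at or before t have taken effect)."""
    current = initial
    for time, value in sorted(events):
        if time <= t:
            current = value
    return current


# Hypothetical LUT output over one 10 ns clock cycle: it starts at 0,
# glitches 0 -> 1 -> 0 -> 1, and has settled at 1 before the falling edge at 5 ns.
PERIOD = 10.0
lut_events = [(1.2, 1), (2.0, 0), (3.1, 1)]

# Transitions seen by everything driven directly by the LUT.
lut_toggles = count_transitions(0, lut_events)

# A negative edge triggered FF captures the settled value at the falling edge,
# so its output changes at most once per cycle.
ff_value = sample_at(0, lut_events, PERIOD / 2)
ff_events = [(PERIOD / 2, ff_value)]
ff_toggles = count_transitions(0, ff_events)

print(lut_toggles, ff_toggles)  # 3 1
```

In this toy cycle the net downstream of the FF sees one transition instead of three; the two glitch transitions stay confined to the short LUT-to-FF connection.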

5 Alternatives
- Gated D-latch
  - Implement a gated D-latch in a LUT
  - Input signal is transparent during the latter half of the clock period
- Gated LUT
  - Gate the output of a LUT with the clock input using an AND or an OR gate
  - Similar effect as gated D-latch
  - Can generate glitches too
- When implemented
  - Gated D-latch consumes 50% more power than a FF and double that of a gated LUT
  - Neither alternative is very effective

6 Background on Dynamic Power
- Average Net Dynamic Power Dissipation
  - P_avg is average power
  - V is supply voltage
  - f_clock is the clock frequency
  - s_i is the average per cycle toggle rate of a net
  - C_i is the capacitance of a net
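The equation itself is not present in the transcript. As an assumption, the standard expression for average dynamic power that uses exactly the symbols listed on this slide (with s_i counting transitions per cycle, each dissipating half of C_i V^2) would be:

```latex
P_{avg} = \frac{1}{2} \sum_{i \in \text{nets}} C_i \, V^2 \, f_{clock} \, s_i
```

Under this form, a net's contribution scales linearly with its toggle rate, so removing glitches that double s_i on a net would halve that net's dynamic power.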

7 Power Model
- Goal
  - To be able to compute the change in dynamic power dissipation in the logic elements affected by a negative edge triggered FF insertion
- Power dissipated by a LUT and a FF
- Toggle rate of logic signals (s_i)
- Net capacitance (C_i)

8 LUT Power
- The LUT itself dissipates a non-trivial amount of power when its inputs toggle
- We look at how the power dissipated by a LUT relates to the frequency of its output transitions

9 LUT Power Model

10 FF Power
- How much power would it cost to insert a FF into a circuit?
- What about the power cost of alternatives to a FF?
  - Gated LUT
  - Gated D-latch

11 Clocked Element Power Comparison

12 Toggle Rate of Logic Signals
- Topic is covered extensively in the literature
- Toggle rate model based on the concept of Transition Density [Najm’94] and the work of Anderson and Najm [AN’03]
  - The latter work decomposes transition density into transitions generated by a LUT and those propagated through a LUT.
- Modified to include delay information in order to account for glitches

13 Examples of Wires
[Figure: waveforms for Clock, A, B, C, D]
Table columns: P[y]; P_t(y); P[y’=1 | y=0]; P[y’=0 | y=1]; D(y); D(y) - P_t(y)
Table values (flattened): ½11110 ½½≈0.4 ½0 1/8¼ 1¼0 ¼ 1½¼

14 Wire Properties
- Static Probability, P[y]: probability that a wire assumes the logic value 1 in any given clock cycle.
- Transition Probability, P_t(y): the average number of state transitions, excluding glitches.
- Low to High Transition Probability, P[y’=1 | y=0]: probability that a wire will change state to logic value 1, given that it is at a logic value 0 at present.
- High to Low Transition Probability, P[y’=0 | y=1]: probability that a wire will change state to logic value 0, given that it is at a logic value 1 at present.
- Transition Density, D(y): the average number of logic value transitions per cycle. Includes glitches.
- Average Number of Glitches per Cycle, D(y) - P_t(y): the average number of useless transitions per clock cycle.
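To make the definitions concrete, the sketch below computes each property from a simulated trace. The trace format is an assumption made for illustration: each clock cycle is a list of the values the wire takes in order, starting from the value held at the beginning of the cycle and ending at the settled value. The function name `wire_properties` and the example data are hypothetical.

```python
def wire_properties(cycles):
    """Compute the slide's wire properties from per-cycle value sequences.

    ASSUMPTION: each entry of `cycles` lists the successive values of the wire
    within one clock cycle (first = value at cycle start, last = settled value),
    and consecutive entries always differ.
    """
    n = len(cycles)
    starts = [c[0] for c in cycles]
    ends = [c[-1] for c in cycles]

    static_p = sum(ends) / n                                  # P[y]
    p_t = sum(s != e for s, e in zip(starts, ends)) / n       # P_t(y), glitches excluded
    density = sum(len(c) - 1 for c in cycles) / n             # D(y), glitches included

    from_zero = [e for s, e in zip(starts, ends) if s == 0]
    from_one = [e for s, e in zip(starts, ends) if s == 1]
    p_low_to_high = sum(from_zero) / len(from_zero) if from_zero else 0.0
    p_high_to_low = sum(1 - e for e in from_one) / len(from_one) if from_one else 0.0

    return {
        "P[y]": static_p,
        "P_t(y)": p_t,
        "P[y'=1|y=0]": p_low_to_high,
        "P[y'=0|y=1]": p_high_to_low,
        "D(y)": density,
        "glitches": density - p_t,                            # D(y) - P_t(y)
    }


# Hypothetical four-cycle trace: cycle 1 glitches on its way to 1,
# cycle 4 glitches but returns to 0.
trace = [[0, 1, 0, 1], [1], [1, 0], [0, 1, 0]]
props = wire_properties(trace)
print(props)
```

For this trace the wire makes 1.5 transitions per cycle on average, of which 0.5 are functional, leaving 1.0 glitch transitions per cycle.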

15 Propagating Glitches Through a LUT
- Increase D(z) to account for glitches that occur on wire y (D(y) - P_t(y)). Do so only when x remains at constant 1 for the duration of the clock cycle.
[Figure labels: y, x, z]
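A rough sketch of this adjustment follows. Several things here are assumptions rather than content from the slide: the gate is taken to be z = x AND y, x is assumed to be glitch-free, and the probability that x holds at 1 for a whole cycle is approximated as P[x] * (1 - P[x’=0 | x=1]). The function name `propagated_glitches` is hypothetical.

```python
def propagated_glitches(d_y, pt_y, p_x, p_x_fall):
    """Estimate the extra transitions per cycle on z caused by glitches on y.

    d_y      : transition density D(y)
    pt_y     : transition probability P_t(y)
    p_x      : static probability P[x]
    p_x_fall : P[x'=0 | x=1]

    ASSUMPTION: z = x AND y, so glitches on y reach z only while x stays at 1,
    and P[x stays at 1] is approximated as P[x] * (1 - P[x'=0 | x=1]).
    """
    glitches_on_y = d_y - pt_y
    p_x_holds_one = p_x * (1.0 - p_x_fall)
    return glitches_on_y * p_x_holds_one


# Hypothetical numbers: y carries 1.0 glitch transitions per cycle,
# and x holds at 1 for a full cycle a quarter of the time.
extra = propagated_glitches(d_y=1.5, pt_y=0.5, p_x=0.5, p_x_fall=0.5)
print(extra)  # 0.25
```

The returned value would be added to the glitch-free estimate of D(z); how that base estimate is obtained is outside the scope of this sketch.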

16 Estimate Error

17 Net Capacitance
- We need to be able to estimate net capacitance to figure out the difference in dynamic power dissipation due to a change in the transition density of a net
- Relate net capacitance (unavailable directly) to net delay (available through timing report)
  - Distinguish between nets of different fanout

18 Fanout 1 Net Capacitance

19 Fanout 2 Net Capacitance

20 Fanout 3 Net Capacitance

21 Fanout 4 Net Capacitance

22 Higher Fanout Net Capacitance
- In our benchmark set fewer than 5% of the nets had fanout greater than 4
  - Clock net is excluded from calculation
- Approximate capacitance of net with fanout n>4 as:
- Not exact, but supports the fact that glitches on nets with high fanout are bad
  - Average estimate error of +22%

23 Negative Edge Triggered FF Insertion Algorithm
1. Scan all nets in a logic circuit to determine if negative edge FF insertion can be applied
2. Analyze the resulting set of nets to determine the benefit of applying the optimization to each net (determined by the cost function)
3. Apply the optimization to a net on which the most power could be saved
4. Repeat until no beneficial choices are found
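The four steps above describe a greedy loop, which can be sketched as follows. This is an illustrative skeleton only: the callbacks `is_candidate`, `cost` and `apply_insertion` are hypothetical placeholders for the legality check, the cost function and the netlist modification, none of which are specified in enough detail in the transcript to reproduce.

```python
def greedy_ff_insertion(nets, is_candidate, cost, apply_insertion):
    """Repeatedly insert a negative edge FF on the most beneficial net.

    nets            : iterable of net identifiers
    is_candidate    : net -> bool, True if insertion is legal on that net (step 1)
    cost            : net -> float, change in cost if a FF is inserted; < 0 is a win (step 2)
    apply_insertion : net -> None, modifies the circuit (step 3)
    All three callbacks are hypothetical stand-ins.
    """
    nets = list(nets)
    inserted = []
    while True:
        # Steps 1 and 2: rescan and rescore, because an insertion changes the
        # glitch counts seen by nets in its transitive fanout.
        candidates = [n for n in nets if n not in inserted and is_candidate(n)]
        beneficial = [(cost(n), n) for n in candidates]
        beneficial = [(c, n) for c, n in beneficial if c < 0]
        if not beneficial:
            return inserted          # Step 4: stop when nothing helps
        _, best_net = min(beneficial, key=lambda item: item[0])
        apply_insertion(best_net)    # Step 3: take the largest saving
        inserted.append(best_net)


# Toy usage with made-up costs: inserting on "a" removes the glitches that
# made "b" attractive, so "b" stops being beneficial afterwards.
costs = {"a": -3.0, "b": -1.0, "c": 0.5, "d": -0.5}

def apply(net):
    if net == "a":
        costs["b"] = 0.2

result = greedy_ff_insertion(costs.keys(), lambda n: True, lambda n: costs[n], apply)
print(result)  # ['a', 'd']
```

Rescoring every remaining net on each iteration is the simplest way to honor the dependency between insertions; a real implementation would more likely update only the nets in the affected fanout cone.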

24 Cost Function
- Compute change in power (∆P)
  - ∆P = cost of adding a FF - power saved on the modified net - power saved on nets and LUTs in the transitive fanout of the added FF
- Compute the change in the minimum clock period (∆T)
  - Specify ∆T allowed (∆T_a)
- where u(x) is the step function
- Accept change when ∆C < 0
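The expression for ∆C is not present in the transcript, so the sketch below should be read as a stand-in rather than the authors' formula. It computes ∆P from the three terms listed on the slide and then, as an assumption, adds a large penalty through the step function u(x) whenever ∆T exceeds ∆T_a. The penalty form, the `penalty` constant, the function names and the example numbers are all hypothetical.

```python
def step(x):
    """Unit step function u(x): 1 when x is positive, otherwise 0."""
    return 1.0 if x > 0 else 0.0


def delta_power(ff_cost, saved_on_net, saved_in_fanout):
    """Change in power: cost of the added FF minus the two savings terms."""
    return ff_cost - saved_on_net - saved_in_fanout


def delta_cost(delta_p, delta_t, delta_t_allowed, penalty=1e9):
    """Stand-in cost: the power change plus a penalty if the timing budget is exceeded.

    ASSUMPTION: the real cost function from the slide is unknown; this form only
    reproduces the stated behavior that a change is accepted when the cost is negative.
    """
    return delta_p + penalty * step(delta_t - delta_t_allowed)


# Hypothetical numbers in mW and ns.
dp = delta_power(ff_cost=0.14, saved_on_net=0.30, saved_in_fanout=0.10)

within_budget = delta_cost(dp, delta_t=0.2, delta_t_allowed=0.5)
over_budget = delta_cost(dp, delta_t=0.8, delta_t_allowed=0.5)

print(within_budget < 0, over_budget < 0)  # True False
```

With these numbers the insertion saves power in both cases, but only the first stays inside the allowed change in clock period and is accepted.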

25 Example
[Figure labels: LUT, Some logic network, LUT, FF]

26 Example: Inserted FF
[Figure labels: LUT, Some logic network, LUT, FF, Neg FF]

27 Example: Compute change in the # of glitches
[Figure labels: LUT, Some logic network, LUT, FF, Neg FF]

28 Example: Compute change in the # of glitches
[Figure labels: LUT, Some logic network, LUT, FF, Neg FF]

29 Example: Compute change in LUT power dissipation
[Figure labels: LUT, Some logic network, LUT, FF, Neg FF]

30 Experimental Results
- 8 benchmark circuits taken from QUIP package
- Synthesize, place, route and analyze timing of a circuit using Quartus II 5.1
- Apply algorithm to reduce glitches in a circuit
  - Aim to decrease the minimum clock period by no more than 5%
- Perform timing analysis once the circuit has been modified
- Use ModelSIM-Altera 6.0c for simulation
  - Simulate a circuit both pre- and post-modification using the same clock frequency
- Use PowerPlay Power Analyzer to estimate the average dynamic power dissipation of each circuit

31 Experimental Results
Table columns: Circuit name; Simulation Clock Frequency (MHz); Minimum Clock Period: Initial (ns), Final (ns), Change (%); Dynamic Power Dissipation: Initial (mW), Final (mW), Change (%)
Table rows: Barrel64*, mux64_16bit, fip_cordic_rca, oc_des_perf_opt, oc_video_compression_systems_huffman_enc, cf_fir_24_8_8, aes128_fast, rsacypher, Average

32 Observations (1)
- oc_des_perf_opt
  - Large number of XOR gates present
  - Removing glitches from one node removes a lot of glitches on the nodes in its transitive fanout (up to the next FF)
- mux64_16bit
  - The cost function determined that no net was a good candidate for optimization
  - Very few glitches were present in the circuit and the power they dissipate was not large enough to warrant the insertion of FFs

33 Observations (2)
- cf_fir_24_8_8
  - Overestimated toggle rate caused the algorithm to apply negative edge triggered FF insertion too aggressively
  - Need to include spatial correlation in the toggle rate model
- aes128_fast
  - Toggle rate is 50% higher than in oc_des_perf_opt
  - Most nets use local LAB connections, causing little power dissipation
  - Insertion of 173 FFs only achieved 1% power reduction
    - Saved mW in routing alone, because toggle rate on all affected wires was reduced by 50-70%
    - Added 24.6 mW due to FF insertion
    - Added 1.86 mW to the power dissipated by the clock network, because new LABs were connected to the clock network
    - Net win of 8.68 mW

34 Conclusion
- Negative edge triggered FF insertion can work well to reduce glitches in a circuit
  - Computing glitches propagated to the transitive fanout of a net is important, especially when XOR gates are present
  - When inserting a lot of negative edge triggered FFs, be mindful where they go. Do target LABs have a clock signal already routed to them?
- Unlike retiming, our approach only needs to ensure that exactly one negative edge triggered FF is on any given combinational path
  - Retiming may require the translation of more than a single FF to be valid

35 Future Work
- Better toggle rate prediction algorithm that includes spatial correlation
- Having FFs that can be negative edge triggered without using an additional LAB clock line would make the cost of this optimization lower
  - Silicon area cost vs. frequency of use trade-off

36 Acknowledgement
- We’d like to express our gratitude to Altera for funding this research
- We’d like to thank Altera Toronto in particular for dedicating some of their time to answer our questions and provide insight throughout the course of this work

Questions?