Glitch Reduction for Altera Stratix II devices
Tomasz S. Czajkowski, PhD Candidate, University of Toronto
Supervisor: Professor Stephen D. Brown
March 28, 2007

Slide 2: Outline
- Motivation
- Power Model
- Glitch Reduction Algorithm
- Results
- Conclusion

Slide 3: Motivation
- Glitches: undesirable logic transitions that occur due to delay imbalance in the logic circuit
  - Waste power and do not provide any useful functionality
  - Can increase the average toggle rate of a net by as much as a factor of 2
- Glitches can be filtered out by strategically inserting negative edge triggered FFs

Slide 4: Glitches in FPGAs
- Due to unequal arrival times of signals at the inputs of LUTs
- Glitches can be propagated through LUTs
[Figure: 4-LUT; labels: Generated, Propagated]

Slide 5: Reducing Glitches
- Insert a negative edge triggered FF after a LUT that produces or propagates glitches
[Figure: 4-LUT followed by a clocked FF; labels: Generated, clock, No glitches]

Slide 6: Alternatives
- Gated D-latch
  - Implement a gated D-latch in a LUT
  - Input signal is transparent during the latter half of the clock period
- Gated LUT
  - Gate the output of a LUT with the clock input using an AND or an OR gate
  - Similar effect to a gated D-latch
  - Can generate glitches too
- When implemented, a gated D-latch consumes 50% more power than a FF and double that of a gated LUT
  - Neither alternative is very effective

Slide 7: Background on Dynamic Power
- Average Net Dynamic Power Dissipation
  - P_avg is the average power
  - V is the supply voltage
  - f_clock is the clock frequency
  - s_i is the average per-cycle toggle rate of a net
  - C_i is the capacitance of a net
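The equation on this slide did not survive in the transcript. As a minimal sketch, assuming the commonly used form P_avg = 1/2 · V² · f_clock · Σ s_i · C_i (the 1/2 factor and the sample values below are assumptions, not taken from the slide), the quantities listed above combine as follows:

```python
def average_dynamic_power(nets, v_supply, f_clock):
    """Return 0.5 * V^2 * f_clock * sum_i(s_i * C_i), in watts.

    nets: iterable of (toggle_rate_per_cycle, capacitance_in_farads) pairs.
    The 0.5 factor is an assumed convention, not read from the slide.
    """
    switched_capacitance = sum(s * c for s, c in nets)
    return 0.5 * v_supply ** 2 * f_clock * switched_capacitance


# Hypothetical nets, chosen only for illustration.
nets = [(0.5, 100e-15), (0.25, 200e-15)]
p_avg = average_dynamic_power(nets, v_supply=1.2, f_clock=200e6)

# Doubling every toggle rate (the worst case quoted for glitches) doubles P_avg.
glitchy = [(2 * s, c) for s, c in nets]
p_glitchy = average_dynamic_power(glitchy, v_supply=1.2, f_clock=200e6)
```

Because power is linear in s_i, any reduction in toggle rate on a net translates directly into a proportional saving on that net.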

Slide 8: Power Model
- Goal
  - To be able to compute the change in dynamic power dissipation in the logic elements affected by a negative edge triggered FF insertion
- Power dissipated by a LUT and a FF
- Toggle rate of logic signals (s_i)
- Net capacitance (C_i)

Slide 9: LUT Power
- The LUT itself dissipates a non-trivial amount of power when its inputs toggle
- We look at how the power dissipated by a LUT relates to the frequency of its output transitions

Slide 10: LUT Power Model

Slide 11: FF Power
- How much power would it cost to insert a FF into a circuit?
- What about the power cost of alternatives to a FF?
  - Gated LUT
  - Gated D-latch

Slide 12: Clocked Element Power Comparison

Slide 13: Wire Properties
- Static Probability, P[y]: probability that a wire assumes the logic value 1 in any given clock cycle.
- Transition Probability, P_t(y): the average number of state transitions, excluding glitches.
- Low to High Transition Probability, P[y'=1 | y=0]: probability that a wire will change state to logic value 1, given that it is at logic value 0 at present.
- High to Low Transition Probability, P[y'=0 | y=1]: probability that a wire will change state to logic value 0, given that it is at logic value 1 at present.
- Transition Density, D(y): the average number of logic value transitions per cycle, including glitches.
- Average Number of Glitches per Cycle, D(y) - P_t(y): the average number of useless transitions per clock cycle.

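Slide 14: Examples of Wires
[Figure: waveforms for Clock and wires A, B, C, D]
Columns: P[y], P_t(y), P[y'=1 | y=0], P[y'=0 | y=1], D(y), D(y) - P_t(y)
Clock: ½, 1, 1, 1, 1, 0
Rows for A, B, C and D are only partly recoverable: ½ ½ ≈0.4 ½ 0; 1/8 ¼ 1 ¼ 0; ¼ 1 ½ ¼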

Slide 15: Example 1
[Figure: LUT with inputs x1, x2 and output y; delay annotations 1, 0, 1, 2]

Name  P[y]  P_t(y)  P[y'=1 | y=0]  P[y'=0 | y=1]  D(y)
x1    ½     ½       ½              ½              ½
x2    ½     ½       ½              ½              ½

Trans(x1x2, x1'x2'): number of transitions on y (rows: initial state x1x2; columns: final state x1'x2')
      00  01  10  11
00     0   0   0   1
01     0   0   2   1
10     0   0   0   1
11     1   1   1   0

Slide 16: Static Probability
- Let y = f(x1, x2) = x1 · x2

Slide 17: Probability of a Specific Transition
- Compute the probability of a specific transition from each wire's static probability and its 1 → 0 and 0 → 1 transition probabilities
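As a worked instance, and assuming the inputs are statistically independent of one another (an assumption made here for illustration; the slide's own equation was not recovered), the per-wire terms and their product can be written as:

```latex
\begin{aligned}
P[x_i: 0 \to 1] &= (1 - P[x_i])\,P[x_i'=1 \mid x_i=0] \\
P[x_i: 0 \to 0] &= (1 - P[x_i])\,\bigl(1 - P[x_i'=1 \mid x_i=0]\bigr) \\
P[x_i: 1 \to 0] &= P[x_i]\,P[x_i'=0 \mid x_i=1] \\
P[x_i: 1 \to 1] &= P[x_i]\,\bigl(1 - P[x_i'=0 \mid x_i=1]\bigr) \\
P[x_1 x_2 \to x_1' x_2'] &= P[x_1: x_1 \to x_1'] \; P[x_2: x_2 \to x_2'] \\
P[01 \to 10] &= \left(\tfrac12 \cdot \tfrac12\right)\left(\tfrac12 \cdot \tfrac12\right) = \tfrac{1}{16}
\end{aligned}
```

The last line uses the Example 1 inputs, where every probability for x1 and x2 is ½, so each of the 16 initial/final state pairs is equally likely.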

Slide 18: Transition Probability
[The slide repeats the Example 1 table Trans(x1x2, x1'x2').]

Slide 19: Transition Density
[The slide repeats the Example 1 table Trans(x1x2, x1'x2').]

Slide 20: 0→1 Transition Probability
[The slide repeats the Example 1 table Trans(x1x2, x1'x2').]

Slide 21: 1→0 Transition Probability
[The slide repeats the Example 1 table Trans(x1x2, x1'x2').]

Slide 22: Properties of Wire y in Example 1
[Figure: LUT with inputs x1, x2 and output y; delay annotations 1, 0, 1, 2]

Name  P[y]  P_t(y)  P[y'=1 | y=0]  P[y'=0 | y=1]  D(y)
y     ¼     3/8     ¼              ¾              ½
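The values for wire y can be reproduced from the Example 1 transition table. The sketch below assumes y = x1 · x2 and independent inputs; the function names and the dictionary layout are illustrative choices, not part of the original slides.

```python
from fractions import Fraction
from itertools import product

# TRANS[initial][final] = number of transitions on y, states written as (x1, x2).
TRANS = {
    (0, 0): {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1},
    (0, 1): {(0, 0): 0, (0, 1): 0, (1, 0): 2, (1, 1): 1},
    (1, 0): {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1},
    (1, 1): {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0},
}


def wire_step(p_one, p_rise, p_fall, old, new):
    """Probability that one input is `old` now and `new` in the next cycle."""
    if old == 0:
        return (1 - p_one) * (p_rise if new == 1 else 1 - p_rise)
    return p_one * (p_fall if new == 0 else 1 - p_fall)


def properties_of_y(inputs):
    """inputs: [(P[x], P[x'=1|x=0], P[x'=0|x=1])] for x1 and x2."""
    states = list(product((0, 1), repeat=2))
    p_y = p_t = rise = fall = density = Fraction(0)
    for init, final in product(states, repeat=2):
        p = Fraction(1)
        for (p_one, p_rise, p_fall), old, new in zip(inputs, init, final):
            p *= wire_step(p_one, p_rise, p_fall, old, new)
        y_old, y_new = init[0] & init[1], final[0] & final[1]
        if y_old:
            p_y += p                      # static probability
        if y_old != y_new:
            p_t += p                      # functional transitions only
        if (y_old, y_new) == (0, 1):
            rise += p
        if (y_old, y_new) == (1, 0):
            fall += p
        density += p * TRANS[init][final]  # includes the glitch on 01 -> 10
    return {
        "P[y]": p_y,
        "P_t(y)": p_t,
        "P[y'=1|y=0]": rise / (1 - p_y),
        "P[y'=0|y=1]": fall / p_y,
        "D(y)": density,
    }


HALF = Fraction(1, 2)
props = properties_of_y([(HALF, HALF, HALF)] * 2)
```

With all input probabilities set to ½, `props` matches the row for y: ¼, 3/8, ¼, ¾ and ½. The difference D(y) - P_t(y) = 1/8 comes entirely from the two-transition entry for 01 → 10.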

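Slide 23: Example 2
[Figure: LUT with inputs x1, x2 and output y (delay annotations 1, 0, 1, 2) feeding a second LUT with inputs y, x3 and output z (delay annotations 3, 1, 4)]

Name  P[y]  P_t(y)  P[y'=1 | y=0]  P[y'=0 | y=1]  D(y)
x3    ½     ½       ½              ½              ½
y     ¼     3/8     ¼              ¾              ½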

Slide 24: Computing Properties of Wire z
- Same computations as in Example 1.
- Increase D(z) to account for glitches that occur on wire y (D_glitch(z)).
  - Do so only when x3 remains at constant 1 for the duration of the clock cycle.

Slide 25: Minimum Pulse Width
- When using the table to compute the number of transitions on a wire, given the initial and final states of the LUT inputs, we can also compute the intermediate transitions and their durations
- Some intermediate pulses will be too short to cause a full logic change at the logic output
  - The minimum pulse width depends on the target device used
- We remove those pulses from the computation
  - Any pulse with a duration of less than 0.25 ns is removed
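One way to picture the pulse removal is as a filter over the transition times of a single wire within a clock cycle. This is a minimal sketch under an assumed definition (a pulse is two consecutive edges, and both are dropped when they are closer than the minimum width); the function name and the sample times are hypothetical.

```python
def filter_short_pulses(times_ns, min_width_ns=0.25):
    """Drop pairs of consecutive transitions closer than min_width_ns.

    times_ns: sorted transition times (ns) of one wire within a clock cycle.
    Returns the transition times that remain after filtering.
    """
    kept = []
    for t in times_ns:
        if kept and t - kept[-1] < min_width_ns:
            kept.pop()       # the two edges form a pulse that is too short
        else:
            kept.append(t)
    return kept


# Hypothetical edge times: a 0.1 ns pulse followed by a real transition.
remaining = filter_short_pulses([1.0, 1.1, 2.0])
```

Removing edges in pairs keeps the parity of the transition count, so the final logic value of the wire is unchanged by the filter.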

Slide 26: Estimate Error

Slide 27: Particular Example: mux64_16bit

Slide 28: Particular Example: des_perf_opt

Slide 29: Particular Example: cf_fir_24_8_8

Slide 30: Particular Example: huffman

Slide 31: Net Capacitance
- We need to be able to estimate net capacitance to figure out the difference in dynamic power dissipation due to a change in the transition density of a net
- Relate net capacitance (unavailable directly) to net delay (available through the timing report)
  - Distinguish between nets of different fanout

Slide 32: Fanout 1 Net Capacitance

Slide 36: Higher Fanout Net Capacitance
- In our benchmark set fewer than 5% of the nets had fanout greater than 4
  - Clock net is excluded from the calculation
- Approximate the capacitance of a net with fanout n > 4 as: [equation not recovered]
- Not exact, but supports the fact that glitches on nets with high fanout are bad
  - Average estimate error of +22%

Slide 37: Algorithm
1. Scan all nets in a logic circuit to determine if negative edge FF insertion can be applied
2. Analyze the resulting set of nets to determine the benefit of applying the optimization to each net (determined by the cost function)
3. Apply the optimization to the net on which the most power could be saved
4. Repeat until no beneficial choices are found
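The four steps describe a greedy loop. The sketch below shows only that control flow; the three callables stand in for circuit analysis that the slides do not spell out, and all names here are hypothetical.

```python
def greedy_insertion(candidates, cost, apply_insertion):
    """Repeatedly apply the most beneficial insertion until none helps.

    candidates: callable returning the nets where insertion is currently legal.
    cost: callable mapping a net to its cost change (negative = beneficial).
    apply_insertion: callable that modifies the circuit for the chosen net.
    Returns the nets that were modified, in the order they were chosen.
    """
    chosen = []
    while True:
        nets = list(candidates())          # step 1: scan
        if not nets:
            break
        best = min(nets, key=cost)         # step 2: rank by cost function
        if cost(best) >= 0:                # step 4: stop when nothing helps
            break
        apply_insertion(best)              # step 3: apply the best choice
        chosen.append(best)
    return chosen


# Toy stand-in for a circuit: fixed costs, and a net leaves the
# candidate set once a FF has been inserted on it.
costs = {"n1": -3.0, "n2": -5.0, "n3": 1.0}
remaining = set(costs)
order = greedy_insertion(lambda: sorted(remaining), costs.get, remaining.discard)
```

In a real flow the cost of the remaining nets would change after each insertion, which is why candidates and costs are re-evaluated on every pass rather than sorted once.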

Slide 38: Cost Function
- Compute the change in power: ∆P = (cost of adding a FF) - (power saved on the modified net) - (power saved on nets and LUTs in the transitive fanout of the added FF)
- Compute the change in the minimum clock period (∆T)
  - Specify the ∆T allowed (∆T_a)
- [Equation for ∆C not recovered], where u(x) is the step function
- Accept the change when ∆C < 0
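The exact expression for ∆C is missing from the transcript, so the sketch below is a stand-in rather than the authors' cost function: it computes ∆P as listed above and uses the step function to reject any change whose ∆T exceeds ∆T_a. The choice u(0) = 0 and the hard-reject rule are assumptions.

```python
def delta_power(ff_cost, saved_on_net, saved_in_fanout):
    """Change in power from one insertion; negative means power is saved."""
    return ff_cost - saved_on_net - saved_in_fanout


def step(x):
    """Unit step u(x); the value at x == 0 is an assumed convention."""
    return 1.0 if x > 0 else 0.0


def accept(delta_p, delta_t, delta_t_allowed):
    """Assumed rule: accept only if power drops and timing stays in budget."""
    over_budget = step(delta_t - delta_t_allowed) == 1.0
    return delta_p < 0 and not over_budget


# Hypothetical numbers in mW and ns, for illustration only.
dp = delta_power(ff_cost=0.15, saved_on_net=0.30, saved_in_fanout=0.10)
ok = accept(dp, delta_t=0.1, delta_t_allowed=0.2)
```

Any monotone penalty on ∆T - ∆T_a would fit the same interface; the hard cutoff is simply the easiest variant to reason about.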

Slide 39: Example
[Figure: LUT, some logic network, LUT, FF]

Slide 40: Example: Inserted FF
[Figure: LUT, some logic network, LUT, FF, Neg FF]

Slide 41: Example: Compute change in the number of glitches
[Figure: LUT, some logic network, LUT, FF, Neg FF]

Slide 42: Example: Compute change in the number of glitches
[Figure: LUT, some logic network, LUT, FF, Neg FF]

Slide 43: Example: Compute change in LUT power dissipation
[Figure: LUT, some logic network, LUT, FF, Neg FF]

Slide 44: Experimental Results
- 8 benchmark circuits taken from the QUIP package
- Synthesize, place, route and analyze the timing of a circuit using Quartus II 5.1
- Apply the algorithm to reduce glitches in a circuit
  - Aim to increase the minimum clock period by no more than 5%
- Perform timing analysis once the circuit has been modified
- Use ModelSim-Altera 6.0c for simulation
  - Simulate a circuit both pre- and post-modification using the same clock frequency
- Use the PowerPlay Power Analyzer to estimate the average dynamic power dissipation of each circuit

Slide 45: Experimental Results
f_sim = simulation clock frequency, T = minimum clock period, P = dynamic power dissipation

Circuit name                              f_sim (MHz)  T initial (ns)  T final (ns)  T change (%)  P initial (mW)  P final (mW)  P change (%)
Barrel64*                                 200          4.386           4.806         8.74          229.94          189.7         -17.50
mux64_16bit                               275          3.052           3.052         0             389.24          389.24        0.00
fip_cordic_rca                            125          7.551           7.851         3.82          43.28           39.49         -8.76
oc_des_perf_opt                           290          2.989           3.07          2.64          1058.8          796.7         -24.75
oc_video_compression_systems_huffman_enc  260          3.626           3.626         0             94.88           95.19         0.33
cf_fir_24_8_8                             170          5.375           5.71          5.87          290.41          292.9         0.84
aes128_fast                               140          6.251           6.569         4.84          879.24          870.6         -0.99
rsacypher                                 140          6.376           6.563         2.85          50.73           48.22         -4.95

Average: minimum clock period +3.6%, dynamic power dissipation -7.0%
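The averages in the last row can be checked directly against the per-circuit change columns. This is only an arithmetic sketch over the figures in the table; the variable names are illustrative.

```python
# Per-circuit changes, in table order (Barrel64 ... rsacypher).
period_change_pct = [8.74, 0.0, 3.82, 2.64, 0.0, 5.87, 4.84, 2.85]
power_change_pct = [-17.50, 0.00, -8.76, -24.75, 0.33, 0.84, -0.99, -4.95]


def mean(values):
    """Plain arithmetic mean, with every circuit weighted equally."""
    return sum(values) / len(values)


avg_period = round(mean(period_change_pct), 1)
avg_power = round(mean(power_change_pct), 1)
```

Both results agree with the reported +3.6% and -7.0%, so the averages are unweighted means over the eight circuits rather than totals weighted by circuit power.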

Slide 46: Observations (1)
- oc_des_perf_opt
  - Large number of XOR gates present
  - Removing glitches from one node removes a lot of glitches on the nodes in its transitive fanout (up to the next FF)
- mux64_16bit
  - The cost function determined that no net was a good candidate for optimization
  - Very few glitches were present in the circuit, and the power they dissipate was not large enough to warrant the insertion of FFs

Slide 47: Observations (2)
- cf_fir_24_8_8
  - Overestimated toggle rate caused the algorithm to apply negative edge triggered FF insertion too aggressively
  - Need to include spatial correlation in the toggle rate model
- aes128_fast
  - Toggle rate is 50% higher than in oc_des_perf_opt
  - Most nets use local LAB connections, causing little power dissipation
  - Insertion of 173 FFs only achieved a 1% power reduction
    - Saved 35.14 mW in routing alone, because the toggle rate on all affected wires was reduced by 50-70%
    - Added 24.6 mW due to FF insertion
    - Added 1.86 mW to the power dissipated by the clock network, because new LABs were connected to the clock network
    - Net win of 8.68 mW
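The aes128_fast budget can be tallied as a quick check. The sketch below uses only the figures quoted on this slide plus the 879.24 mW initial power from the results table; the function name is hypothetical.

```python
def net_saving(saved_mw, added_mw):
    """Power saved minus all power added by the optimization, in mW."""
    return round(saved_mw - sum(added_mw), 2)


# Routing saving vs. FF insertion and extra clock network power.
aes128_saving_mw = net_saving(35.14, [24.6, 1.86])

# Relative to the initial dynamic power of aes128_fast (879.24 mW).
aes128_saving_pct = 100 * aes128_saving_mw / 879.24
```

The added FFs and clock connections consume about three quarters of the routing saving, which is why 173 insertions yield only about a 1% reduction.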

Slide 48: Conclusion
- Negative edge triggered FF insertion can work well to reduce glitches in a circuit
- Unlike retiming, our approach only needs to ensure that exactly one negative edge triggered FF is on any given combinational path
  - Retiming may require the translation of more than a single FF to be valid

Slide 49: Future Work
- Better toggle rate prediction algorithm that includes spatial correlation
- Having FFs that can be negative edge triggered without using an additional LAB clock line would make the cost of this optimization lower
  - Silicon area cost vs. frequency of use trade-off

Slide 50: Acknowledgement
- We'd like to express our gratitude to Altera for funding this research
- We'd like to thank Altera Toronto in particular for dedicating some of their time to answer our questions and provide insight throughout the course of this work

Slide 51: Questions?
