1
Power Reduction for FPGA using Multiple Vdd/Vth
Cecille Freeman Monday April 3, 2006
2
References
Fei Li; Yan Lin; Lei He. “Vdd programmability to reduce FPGA interconnect power” in ICCAD International Conference on Computer Aided Design, 2004, p
Fei Li; Yan Lin; Lei He. “FPGA power reduction using configurable dual-Vdd” in Proceedings Design Automation Conference, 2004, p
Fei Li; Yan Lin; Lei He; Jason Cong. “Low-power FPGA using pre-defined dual-Vdd/dual-Vt fabrics” in ACM/SIGDA International Symposium on Field Programmable Gate Arrays - FPGA, v 12, 2004, p
3
Outline
Introduction
Pre-defined dual Vdd
  Dual Vt and dual Vdd structures
  CAD tool flow
  Results
Configurable Dual Vdd
  Structure
  CAD
Interconnect Dual Vdd
4
Introduction
Power consumption
  FPGAs are less power efficient than ASICs.
  Reducing power loss is important if FPGAs are going to be used in embedded systems.
  Previous approaches mostly focus on changing the design implementation.
  This is the first “in-depth study” of dual Vdd/Vt techniques for FPGAs.
  The technique is fairly common in ASICs, where each designer can adjust the circuit as they see fit.
  In an FPGA, the design is constrained by what is available from the vendor.
5
Introduction
Power consumption
  Power loss comes from switching and leakage.
  Leakage is dominant at sub-100nm process nodes.
  Both leakage and switching power are reduced by reducing Vdd.
  Leakage is reduced by increasing Vt.
  Programmable Vdd/Vt: 40-45% power reduction in ASICs.
  ASICs do better because they do not carry the overhead of programmability.
6
Introduction
Dynamic Power
  f = clock frequency
  E = effective transition density
  C = load capacitance
  Vdd = supply voltage
  Switching power is proportional to the square of the supply voltage.
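The quadratic dependence can be checked with a few lines of Python. This is a minimal sketch assuming the usual CMOS switching-power model, 0.5 * f * E * C * Vdd^2; the factor of 0.5, the function name and all numeric values are assumptions for illustration, not figures from the slides.

```python
def switching_power(f_hz, transition_density, load_cap_f, vdd):
    """Switching power under the assumed CMOS model: 0.5 * f * E * C * Vdd^2."""
    return 0.5 * f_hz * transition_density * load_cap_f * vdd ** 2

# Illustrative values only (not taken from the slides or the papers).
p_high = switching_power(100e6, 0.2, 10e-15, 1.3)
p_low = switching_power(100e6, 0.2, 10e-15, 1.0)

# Dropping Vdd from 1.3 V to 1.0 V scales switching power by (1.0 / 1.3)^2,
# independent of f, E and C.
ratio = p_low / p_high
print(f"low/high switching power ratio: {ratio:.3f}")
```

Because f, E and C cancel in the ratio, the saving from a lower supply depends only on the two voltage levels.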
7
Introduction
Leakage Power
  Ilkg = leakage current
  Vdd = supply voltage
  Ilkg increases as Vt decreases.
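A standard way to combine the two symbols above, assumed here rather than quoted from the slide, is the product of leakage current and supply voltage:

```latex
P_{lkg} = I_{lkg} \cdot V_{dd}
```

Under this form, lowering Vdd reduces leakage power directly, while raising Vt reduces it through a smaller Ilkg.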
8
Introduction
Dual Vdd theory
  A lower supply voltage is slower, but results in less power loss.
  Not all paths in the circuit need to be equally fast.
  Critical paths use high Vdd for speed.
  Non-critical paths use low Vdd to save power.
  This makes use of timing slack.
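The idea of spending timing slack on a lower supply can be sketched as follows. The path names, delays, clock period and the low-Vdd slowdown factor are all hypothetical, and treating paths as independent is a simplification (real paths share gates).

```python
# Hypothetical numbers for illustration; none come from the slides.
CLOCK_PERIOD_NS = 10.0
LOW_VDD_SLOWDOWN = 1.5  # assumed delay multiplier when a path runs at low Vdd

paths_ns = {"a": 9.8, "b": 6.0, "c": 4.5, "d": 7.2}  # path delays at high Vdd

def slack(delay_ns, period_ns=CLOCK_PERIOD_NS):
    """Timing slack: how much later the path could finish and still meet the clock."""
    return period_ns - delay_ns

def can_use_low_vdd(delay_ns, slowdown=LOW_VDD_SLOWDOWN, period_ns=CLOCK_PERIOD_NS):
    """A path tolerates low Vdd if its slowed-down delay still fits in the period."""
    return delay_ns * slowdown <= period_ns

assignment = {
    name: ("VddL" if can_use_low_vdd(delay) else "VddH")
    for name, delay in paths_ns.items()
}

for name, delay in paths_ns.items():
    print(name, f"slack={slack(delay):.1f}ns", assignment[name])
```

Paths with little slack (such as "a" here) stay on the high supply, while paths with enough slack absorb the slowdown and move to the low supply.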
9
Predefined Dual Vdd/Vt
Design in 3 stages
  Determine a good Vdd/Vt scaling from a normal LUT design
  Dual Vt within each LUT
  Dual Vdd across the chip
10
Predefined Dual Vdd/Vt
Single Vdd/Vt LUT (normal)
  SRAM cells, MUX tree
  SRAM holds the configuration bits
  The MUX tree is driven by the true and inverted versions of the inputs
  Bits in SRAM determine which minterms are OR’d
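A behavioural sketch of such a LUT, assuming a simple binary MUX tree in which each level is steered by one input. The function name and the bit ordering (first input is the least significant select) are illustrative choices, not details from the slides.

```python
def lut_eval(sram_bits, inputs):
    """Evaluate a k-input LUT: the inputs select one of 2**k SRAM bits.

    sram_bits: list of 2**k configuration bits (0 or 1), indexed by input pattern.
    inputs:    list of k input bits; inputs[0] is the least significant select.
    """
    candidates = list(sram_bits)
    # Each MUX level halves the candidates, steered by one input.
    for select in inputs:
        candidates = [
            candidates[i + 1] if select else candidates[i]
            for i in range(0, len(candidates), 2)
        ]
    return candidates[0]

# A 2-input XOR: the SRAM bits set to 1 mark minterms 1 and 2.
xor_bits = [0, 1, 1, 0]
print([lut_eval(xor_bits, [a, b]) for a in (0, 1) for b in (0, 1)])
```

The SRAM contents are the truth table itself, so reprogramming the SRAM changes the function without touching the MUX tree.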
11
Predefined Dual Vdd/Vt
Single Vdd/Vt scaling
  Scaling across all LUTs
  Reduction in switching power (quadratic as the supply voltage is reduced)
  Large delay penalties as supply is reduced
  Examined 3 scaling schemes
    Constant Vt
    Fixed Vdd/Vt ratio
    Constant leakage power
  What is currently done
12
Predefined Dual Vdd/Vt
Scaling Vdd to constant leakage is best
13
Predefined Dual Vdd/Vt
Dual Vt within a single LUT
  SRAM cells can have a high Vt because they are configured at the start and are only read during operation (i.e., no switching delay)
  Increasing Vt increases the time taken to program the FPGA
14
Predefined Dual Vdd/Vt
15
Predefined Dual Vdd/Vt
Vt of SRAM set to get 15X SRAM leakage reduction
  Increases configuration time by 13%
MUX (region II) Vdd set using constant leakage scaling
Vdd of SRAM set to be same as MUX (constant in LUT)
16
Predefined Dual Vdd/Vt
High and Low Vdd LUTs
  Need a level converter
  Need to determine how the high and low voltage LUTs will be placed on the chip
  Need a tool to determine
    What should be in low and what should be in high
    How the placement and routing should be done
17
Predefined Dual Vdd/Vt
Level Converter
  Basically 2 inverters with a level restore
18
Predefined Dual Vdd/Vt
FPGA Fabric – 2 choices
19
Predefined Dual Vdd/Vt
CAD tool
  Assignment of high/low LUTs based on “power sensitivity”
    The LUT that will give the largest power reduction when moved to low Vdd is changed
    If timing constraints are still met, keep the change; otherwise change it back
  Placement is done using simulated annealing, with an extra cost term for matching the high and low LUT assignment
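The try-and-revert assignment step can be sketched as a greedy loop. This is a simplified model, not the papers' tool: each LUT carries a fixed, hypothetical power saving and two delays, timing is checked by summing delays along listed paths, and level-converter cost is ignored.

```python
def assign_vdd(luts, paths, period):
    """Greedy dual-Vdd assignment sketch.

    luts:   {name: (power_saving, delay_high, delay_low)}  (hypothetical model)
    paths:  list of paths, each a list of LUT names
    period: clock period that every path delay must fit within
    """
    level = {name: "H" for name in luts}

    def delay(name):
        _, d_high, d_low = luts[name]
        return d_low if level[name] == "L" else d_high

    def timing_met():
        return all(sum(delay(n) for n in path) <= period for path in paths)

    # Visit LUTs in order of decreasing power saving ("power sensitivity").
    for name in sorted(luts, key=lambda n: luts[n][0], reverse=True):
        level[name] = "L"        # tentatively move to low Vdd
        if not timing_met():
            level[name] = "H"    # timing violated: change it back
    return level

# Illustrative input: u1 and u2 share a path, u3 is on its own.
example_luts = {"u1": (5.0, 2.0, 3.0), "u2": (3.0, 2.0, 3.0), "u3": (1.0, 2.0, 3.0)}
example_paths = [["u1", "u2"], ["u3"]]
result = assign_vdd(example_luts, example_paths, period=5.0)
print(result)
```

In this example u1 takes the shared path's slack first because it saves the most power, so u2 has to stay on high Vdd.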
20
Predefined Dual Vdd/Vt
Tested on 20 MCNC benchmarks
  Dual Vt
    11.6% power reduction for combinational
    14.6% power reduction for sequential
  Dual Vdd/Vt
    13.6% combinational, 14.1% sequential
    Not as much as expected: routing and placement issues because of the predefined layout
    On average, 75% assigned to low Vdd LUTs
    No significant difference between fabric layouts
21
Configurable Dual Vdd/Vt
Pre-defined dual Vdd did not achieve a good power reduction because of routing and placement issues
Solution: make each LUT configurable as either a high or a low Vdd LUT, which removes the extra constraint
22
Configurable Dual Vdd/Vt
Configurable LUT
  Attached by PMOS transistors to both rails
  SRAM configuration bits determine which rail supplies power
  3 possible configurations: VddL, VddH, power gated (both off)
  Configuration bits also determine if the output goes through a level converter
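The three supply states can be modelled with two enable bits, one per PMOS switch. The bit names and the treatment of "both rails on" as invalid are assumptions made for this sketch; the slides only list the three valid configurations.

```python
from enum import Enum

class Supply(Enum):
    VDD_H = "VddH"
    VDD_L = "VddL"
    GATED = "power gated"

def decode_supply(en_high, en_low):
    """Map two hypothetical enable bits (one per PMOS switch) to a supply state."""
    if en_high and en_low:
        # Assumed invalid: both switches on would tie the two rails together.
        raise ValueError("both rails enabled: not a valid configuration")
    if en_high:
        return Supply.VDD_H
    if en_low:
        return Supply.VDD_L
    return Supply.GATED  # both switches off

for bits in [(1, 0), (0, 1), (0, 0)]:
    print(bits, decode_supply(*bits).value)
```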
23
Configurable Dual Vdd/Vt
24
Configurable Dual Vdd/Vt
Problem: AREA
  Normally sleep transistors have high Vt, but this means they are larger
  Instead use normal Vt transistors for switches
  Normal Vt gives higher leakage
  Gate boosting
    When a switch is off, apply a gate voltage one Vt higher than Vdd at the source
    Gate boosting is already used in Xilinx devices
25
Configurable Dual Vdd/Vt
Problem: AREA
  Apply switches with a larger granularity
  Clusters of 10 logic blocks for one switch configuration
Problem: Leakage from extra SRAM
  SRAM can have high Vt because it is not written during operation
  Vt set for a 15X leakage reduction over normal, with a 13% increase in configuration time
26
Configurable Dual Vdd/Vt
FPGA fabric
  Compared a fully programmable fabric to one that mixes VddH, VddL and programmable blocks
27
Configurable Dual Vdd/Vt
CAD tools
  Same as for predefined, except that the matching cost in the placement algorithm now treats programmable blocks as assignable to either high or low Vdd LUTs
28
Configurable Dual Vdd/Vt
Results
  Compared to single Vdd FPGAs with Vdd optimized for the same target clock frequency
  Full supply programmability
    Logic power reduction of 35.5%
    Logic block area increased by 24%
  Partial supply programmability (1/1/3 H/L/P)
    Logic power reduction of 28.62%
    Logic block area increased by 14%
  Logic area increase is not very significant when compared to the area of routing
29
Configurable Interconnect
Global interconnect power is very high
Becomes more dominant as power reduction is applied to logic blocks
Solution: make the interconnect programmable as well
30
Configurable Interconnect
Only a small portion of the interconnect is ever in use (avg 11.9% on their tests)
Would be good to power gate the unused portion
1 configuration bit: VddH, VddL
2 configuration bits: VddH, VddL, power gated
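A short arithmetic sketch of the two points above: how many configuration bits each option needs, and how much of the interconnect is a candidate for gating at the quoted average utilization. The helper name is hypothetical; the 11.9% figure is the one from the slide.

```python
def config_bits(num_states):
    """Minimum number of SRAM bits needed to select one of num_states options."""
    return (num_states - 1).bit_length()

# Two supply choices (VddH, VddL) fit in one bit; adding "power gated" needs two.
bits_without_gating = config_bits(2)
bits_with_gating = config_bits(3)

# With 11.9% of the interconnect in use on average, the rest could be gated.
utilization = 0.119
gateable_fraction = 1.0 - utilization

print(bits_without_gating, bits_with_gating, f"{gateable_fraction:.1%}")
```

The second bit per switch is the SRAM cost paid for being able to gate the unused routing resources.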
31
Configurable Interconnect
Configuration for routing switches and connection to logic block
32
Configurable Interconnect
Power considerations for SRAM
  Additional SRAM means additional leakage power
  Only program SRAM once before use
  Use same high-Vt SRAM as for configurable logic blocks
Delay considerations
  Longer delay through the routing switch
  Bound delay increase to 6% by properly sizing the tri-state buffer
33
Configurable Interconnect
CAD tools
  Similar to the tools for configurable Vdd/Vt
  Use only the fully programmable block fabric
  No placement and routing constraints
34
Configurable Interconnect
Results
  One bit configuration (no power gating) = % power reduction
  Two bit configuration (power gating) = 50.55% power reduction
    56.1% reduction to interconnect power
  Power gating reduces FPGA interconnect power by 32%, since many unused routing resources can be gated
35
Summary
  Using a dual Vt LUT decreases power by ~13%
  Predefined dual Vdd has very little effect on power because of routing
  Programmable Vdd logic cells reduce logic power by 35.5% (full programmability) or 28.6% (partial programmability)
  Fully configurable Vdd logic cells and interconnects with power gating reduce power by 50.55%
  Tradeoffs: increase in area, increase in delay, increase in configuration time
36
Future Work
  Reduction of SRAM cells required for programmability
  Design of a good power supply network for the chip
37
Conclusions
  Excellent power reduction overall
  Excellent design if power reduction is a concern: no changes required to the design itself
  Might introduce some timing issues because of extra delay through the chip
  Might be expensive due to extra area required on the chip
38
Thanks, Questions?