Deepa Soman, HyunSuk Nam, Rekha Srinivasaraghavan, Shashank Sivakumar


1 Optimization of Power Reduction in FPGA Interconnect by Charge Recycling

2 Agenda
Day 1: Introduction, Power Consumption, Power Reduction Techniques, Discussion
Day 2: Power Reduction Techniques (continued), Charge Recycling, Our Project, Discussion

3 Introduction
Motivation: logic flexibility and reprogrammability
Achilles' heel: longer wires, with power (7-14X) higher than ASICs

4 Power Consumption
Dynamic power: power consumed while the inputs are active
Static power: power consumed even when there is no circuit activity
Dynamic power consumption is affected by switching activity, transistor capacitance, supply voltage, and operating frequency
Static power consumption is a thermal concern that grows with shrinking transistor size
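The four factors listed for dynamic power combine in the standard CMOS switching-power relation P_dyn = alpha * C * Vdd^2 * f, while static power is roughly Vdd * I_leak. The sketch below evaluates both; the activity, capacitance, frequency, and leakage values are hypothetical and only illustrate why supply voltage, entering squared, is the strongest lever.

```python
def dynamic_power(alpha, c_eff, vdd, freq):
    """Switching power P_dyn = alpha * C * Vdd^2 * f, in watts."""
    return alpha * c_eff * vdd ** 2 * freq


def static_power(vdd, i_leak):
    """Leakage power P_static = Vdd * I_leak, in watts, drawn even when idle."""
    return vdd * i_leak


# Hypothetical values: switching activity, switched capacitance (F), clock (Hz).
alpha, c_eff, freq = 0.15, 2e-9, 100e6
p_10 = dynamic_power(alpha, c_eff, 1.0, freq)
p_08 = dynamic_power(alpha, c_eff, 0.8, freq)

print(f"1.0 V: {p_10 * 1e3:.1f} mW, 0.8 V: {p_08 * 1e3:.1f} mW")
print(f"dynamic power saved by the voltage change alone: {1 - p_08 / p_10:.0%}")
```

Lowering Vdd from 1.0 V to 0.8 V keeps 0.8^2 = 64% of the dynamic power, independent of the other factors.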

5 Why Panic about Power?

6 Why Static Power??

7 Low Power Opportunities

8 Hardware Techniques
Voltage Scaling, Dual Vdd, Frequency Scaling, Clock Gating

9 Voltage Scaling
Selecting the core voltage based on performance requirements
How to choose? From timing analysis
Types: 1) Static Voltage Scaling 2) Dynamic Voltage Scaling

10 1. Static Voltage Scaling
Supplies only the selected core voltage
Realized using an on-chip low-dropout (LDO) regulator
Voltage controlled by the configuration bitstream
0.8 V: minimum dynamic and leakage power
1.0 V: highest overall performance
[1] "A FPGA Prototype Design Emphasis on Low Power Technique", Xu, Jian
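Choosing the static core voltage "from timing analysis" amounts to picking the lowest supported supply whose critical-path delay still fits the clock period. The sketch below assumes hypothetical post-route delays at the two LDO settings from the slide; the function name and numbers are illustrative only.

```python
def select_core_voltage(critical_delay_ns, clock_period_ns):
    """Return the lowest supply voltage whose critical-path delay meets timing.

    critical_delay_ns maps each supported LDO output voltage to the
    critical-path delay reported by timing analysis at that voltage.
    """
    feasible = [v for v, d in critical_delay_ns.items() if d <= clock_period_ns]
    if not feasible:
        raise ValueError("no supported voltage meets the timing constraint")
    return min(feasible)  # lowest voltage means lowest dynamic and leakage power


# Hypothetical timing-analysis results for the 0.8 V and 1.0 V settings.
delays = {0.8: 9.5, 1.0: 6.8}
print(select_core_voltage(delays, clock_period_ns=10.0))
print(select_core_voltage(delays, clock_period_ns=8.0))
```

Because the choice is made once and written into the bitstream, the voltage stays fixed at run time, which is the "voltage is fixed" demerit noted for SVS.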

11 2. Dynamic Voltage Scaling
Provides different voltage levels
Realized using a voltage-controlling unit, which can be a level shifter or a DC-DC converter
DVS implementation: a Logic Delay Measurement Circuit (LDMC), built from FPGA resources, measures the delay error
To first order, the LDMC reading tracks the critical path delay of the circuit to be operated under DVS; the authors show experimentally that a closed-loop DVS system which keeps the LDMC reading above a threshold produces no errors
"Dynamic Voltage Scaling for Commercial FPGAs", C.T. Chow, L.S.M. Tsui, P.H.W.
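A closed-loop DVS controller can be sketched as a simple bang-bang loop: each control cycle it reads the delay monitor, raises Vdd when the reading drops below the threshold, lowers Vdd when there is ample margin, and otherwise holds. The LDMC model below (reading grows linearly with supply voltage) and the step, hysteresis, and voltage limits are hypothetical assumptions, not the cited paper's circuit or parameters.

```python
def ldmc_reading(vdd_mv):
    """Hypothetical LDMC model: higher supply gives more timing slack, so a higher reading."""
    return (vdd_mv - 700) // 5


def closed_loop_dvs(read_ldmc, start_mv, threshold, hysteresis=10,
                    step_mv=25, min_mv=600, max_mv=1200, cycles=100):
    """Adjust Vdd each control cycle so the LDMC reading stays at or above threshold."""
    vdd = start_mv
    for _ in range(cycles):
        reading = read_ldmc(vdd)
        if reading < threshold:
            vdd = min(max_mv, vdd + step_mv)   # too little slack: raise Vdd
        elif reading > threshold + hysteresis:
            vdd = max(min_mv, vdd - step_mv)   # plenty of slack: lower Vdd
        # otherwise hold: the reading is inside the hysteresis band
    return vdd


print(closed_loop_dvs(ldmc_reading, start_mv=1000, threshold=20))
print(closed_loop_dvs(ldmc_reading, start_mv=750, threshold=20))
```

Starting high, the loop walks down until the margin just exceeds the band; starting too low, it walks up until the reading reaches the threshold. The hysteresis band keeps the supply from oscillating between two steps.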

12 Dual Supply Voltage (Vdd)
Separate voltage supplies for the configuration SRAM and the other elements
Purpose: to support a sleep mode
Shut down most logic, except the SRAM, using an LDO
"A Dual-VDD Low Power FPGA Architecture", A. Gayasen, K. Lee, N. Vijaykrishnan, M. Kandemir, M.J. Irwin, and T. Tuan

13 Performance
Power reduction: static voltage scaling nearly 53%, dynamic voltage scaling up to 54%, dual Vdd 14%
Merits: SVS (simple hardware); DVS (self-adaptive); dual Vdd (eliminates the speed penalty)
Demerits: SVS (voltage is fixed); DVS (design complexity); dual Vdd (area overhead)
[1] "A FPGA Prototype Design Emphasis on Low Power Technique", Xu, Jian
[2] "A 90-nm Low-Power FPGA for Battery-Powered Applications", Tuan, Das, Steve, Sean

14 Frequency Scaling
f: switching frequency
Dynamic clock management implementations:
(a) Simple dynamic clock management circuit: an open-loop implementation with a clock divider inserted into the desired paths
(b) Using feedback: skew can be compensated by introducing a phase-locked loop (PLL) into the circuitry; the simplest dynamically scaled structure takes its feedback from a point that does not change frequency
(c) Dynamic clock division: this scheme applies dynamic clock division successfully; for dynamic multiplication, the signal in the feedback path must be divided. After a large change in input frequency, the PLL output may take a long time to settle and regain lock on the input signal (lock time)
Merits: the voltage can subsequently be reduced
Demerits: increased latency
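The open-loop scheme in (a) is, at heart, a counter that toggles the output clock every N/2 input edges. The cycle-level Python model below (not HDL) shows a divider whose ratio can change at run time, as in dynamic clock division; the schedule function and its values are hypothetical, only even divide ratios are modelled, and glitch-free switching between ratios is not addressed.

```python
def divided_clock(n_input_edges, divisor_schedule):
    """Return the divided clock level after each rising edge of the input clock.

    divisor_schedule(i) gives the (even) divide ratio in effect at input edge i,
    so the ratio can be changed at run time.
    """
    out, count, levels = 0, 0, []
    for i in range(n_input_edges):
        half = divisor_schedule(i) // 2
        count += 1
        if count >= half:      # half a divided period has elapsed
            out ^= 1           # toggle the output clock
            count = 0
        levels.append(out)
    return levels


def rising_edges(levels):
    """Count 0 -> 1 transitions, i.e. output clock cycles delivered."""
    return sum(1 for prev, cur in zip([0] + levels[:-1], levels) if cur and not prev)


steady = divided_clock(16, lambda i: 4)                      # fixed divide-by-4
dynamic = divided_clock(16, lambda i: 2 if i < 8 else 8)     # switch from /2 to /8
print(rising_edges(steady), rising_edges(dynamic))
```

Every output edge that is not generated is a clock-network transition that does not consume dynamic power.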

15 Benefits of Frequency Scaling
As frequency decreases, power consumption also decreases
"Dynamic Clock Management for Low Power Applications in FPGAs", Lan, Zilic
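Lowering f alone reduces power proportionally, but a fixed task then runs longer, so the energy per task (P x runtime = alpha * C * Vdd^2 * cycles) is unchanged. The real gain is the merit noted on the previous slide: the slower clock leaves timing slack that allows a lower voltage. The sketch assumes, hypothetically, that 0.8 V still meets timing at half the frequency.

```python
def power(alpha, c, vdd, freq):
    """Dynamic power alpha * C * Vdd^2 * f, in watts."""
    return alpha * c * vdd ** 2 * freq


def energy_for_task(alpha, c, vdd, freq, cycles):
    """Energy for a fixed amount of work: power * runtime = alpha * C * Vdd^2 * cycles."""
    return power(alpha, c, vdd, freq) * (cycles / freq)


alpha, c, cycles = 0.1, 1e-9, 1e6                      # hypothetical design values
p_full = power(alpha, c, 1.0, 100e6)
p_slow = power(alpha, c, 1.0, 50e6)                    # frequency halved only
p_slow_lowv = power(alpha, c, 0.8, 50e6)               # assumed: 0.8 V meets timing at 50 MHz

e_full = energy_for_task(alpha, c, 1.0, 100e6, cycles)
e_slow = energy_for_task(alpha, c, 1.0, 50e6, cycles)
e_slow_lowv = energy_for_task(alpha, c, 0.8, 50e6, cycles)

print(f"power (W): {p_full:.4f}, {p_slow:.4f}, {p_slow_lowv:.4f}")
print(f"energy per task (J): {e_full:.2e}, {e_slow:.2e}, {e_slow_lowv:.2e}")
```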

16 Clock Gating
Controlling the clock flow
Purpose: to temporarily disable blocks
Can be realized in hardware using clock enable signals
Minimizes power dissipation in the clock circuits and network
(a) A clock, driven by a global clock buffer, feeds a number of flip-flops. The top two rows of flip-flops are connected to a clock enable signal, clkEnable, whereas the bottom row is not connected to any clock enable signal
(b) The new global clock buffer, BUFGCE, also takes clk as its input, but its clock enable is connected to the flip-flops' enable signal clkEnable, and clkEnable is disconnected from the flip-flops it previously fed
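The difference between (a) and (b) can be counted directly: with clock-enable pins, the flip-flops hold their data but their clock pins still toggle every cycle; with a BUFGCE in front of the gated rows, the clock stops whenever clkEnable is low. The sketch counts clock cycles delivered to flip-flop clock pins; the 4 flip-flops per row and the 25% enable duty cycle are hypothetical.

```python
def clock_pin_toggles(enable_trace, n_gated_ffs, n_free_ffs, use_bufgce):
    """Count clock cycles delivered to flip-flop clock pins over a trace.

    enable_trace holds the clkEnable value for each clock cycle.
    """
    cycles = len(enable_trace)
    active = sum(1 for e in enable_trace if e)
    free_running = n_free_ffs * cycles        # bottom row: never gated
    if use_bufgce:
        gated = n_gated_ffs * active          # BUFGCE stops the clock while clkEnable is low
    else:
        gated = n_gated_ffs * cycles          # CE pins hold data, but the clock still toggles
    return free_running + gated


enable = [i % 4 == 0 for i in range(100)]     # hypothetical: enabled 25% of cycles
rows_gated, row_free = 2 * 4, 4               # two gated rows, one free row, 4 FFs each
print(clock_pin_toggles(enable, rows_gated, row_free, use_bufgce=False))
print(clock_pin_toggles(enable, rows_gated, row_free, use_bufgce=True))
```

The clock network itself (buffer output and the wires it drives) sees the same reduction, which is where most of the clock power goes.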

17 Clock Gating: Performance
Power reductions of over 20% are observed for the DSP circuits
Eliminates unnecessary toggling on outputs, FF gates, and clock signals
Circuits industry-a, b, c, d are DSP circuits; the remaining circuits were collected from customers and are of unknown function
Demerits: clock skew
"Clock Power Reduction for Virtex-5 FPGAs", Wang, Gupta, Anderson

18 Software Techniques
System level: algorithm modification
CAD tools: logic partitioning, mapping, clustering, placement & routing

19 Low Power FFT Implementation
Architecture: for matrix multiplication, a 1D array has lower power dissipation than a 2D array
Module disabling: clock gating disables modules, e.g. twiddle factor calculation
Dynamic memory activation reduces memory power (cache-based approach)
Multiple time-multiplexed pipeline uP, parallel processing
Algorithm: block matrix multiplication
Time multiplexers, instead of a routing network, shuffle the intermediate data, reducing the interconnection power burden for large FFT problem sizes
As the number of pipeline stages increases, energy decreases: pipelining reduces the number of spurious glitches, which in turn reduces dynamic power

20 FFT Implementation Results
17% to 26% power reduction
"High throughput energy efficient multi-FFT architecture on FPGAs", Chen, Park, Prasanna

21 Energy Reduction Contributions of CAD Stages
Clustering contributes the major share!
"On the interaction between power aware FPGA CAD algorithms", Julien, Steven

22 Power Aware Clustering
Power-aware T-VPack
How? Modify the cost function to include power
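One way to "include power" in a T-VPack-style cost is to weight each net shared between a candidate block and the cluster by its switching activity, so high-activity nets are absorbed inside the cluster, where wires are short. The sketch below is a hypothetical formulation in that spirit (lam weights timing criticality, beta weights the power term); it is not the exact cost function of the cited clustering work.

```python
def attraction(candidate_nets, cluster_nets, criticality, activity,
               lam=0.75, beta=0.5):
    """Hypothetical power-aware attraction of a candidate block to a cluster.

    Combines timing criticality with connectivity, where shared nets are
    additionally weighted by their switching activity.
    """
    shared = candidate_nets & cluster_nets
    g = max(len(candidate_nets | cluster_nets), 1)          # normalization
    connectivity = len(shared) / g
    activity_term = sum(activity[n] for n in shared) / g
    return lam * criticality + (1 - lam) * ((1 - beta) * connectivity
                                            + beta * activity_term)


cluster_nets = {"n1", "n2", "n3"}
candidates = {"A": {"n1", "n4"}, "B": {"n2", "n5"}}
activity = {"n1": 0.05, "n2": 0.6, "n3": 0.1, "n4": 0.2, "n5": 0.3}  # hypothetical
crit = {"A": 0.4, "B": 0.4}

scores = {name: attraction(nets, cluster_nets, crit[name], activity)
          for name, nets in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Both candidates share one net with the cluster and have equal criticality, so a purely connectivity-based cost ties; the activity term breaks the tie toward B, whose shared net switches more often.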

23 Results: Power Aware clustering
“Netlength Based Routability Driven Power Aware Clustering" , Akoglu, Easwaran

24 Power Aware Placement
Problem addressed: power analysis of configurable switches is usually performed during the routing and mapping stages and has been largely ignored during placement, because power estimation at a high level of the design process is inaccurate
Proposed idea: a power-aware algorithm for the design of reconfigurable hardware during high-level placement; it models the number of switches used in the circuit and employs a simulated annealing algorithm to reduce the overall routing power
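A minimal simulated annealing placer illustrates the idea. As an assumption, the number of routing switches a net uses is approximated by its half-perimeter bounding box, weighted by the net's switching activity; the grid size, netlist, activities, and cooling schedule are hypothetical, and moves only swap blocks between occupied sites.

```python
import math
import random


def total_cost(nets, activity, pos):
    """Activity-weighted half-perimeter wirelength, a proxy for switches used."""
    cost = 0.0
    for name, blocks in nets.items():
        xs = [pos[b][0] for b in blocks]
        ys = [pos[b][1] for b in blocks]
        cost += activity[name] * ((max(xs) - min(xs)) + (max(ys) - min(ys)))
    return cost


def anneal(blocks, nets, activity, grid, t0=5.0, cooling=0.95,
           moves_per_t=200, t_min=0.01, seed=0):
    """Return (best placement, its cost, initial cost) found by swap-based annealing."""
    rng = random.Random(seed)
    sites = [(x, y) for x in range(grid) for y in range(grid)]
    rng.shuffle(sites)
    pos = {b: sites[i] for i, b in enumerate(blocks)}   # random initial placement
    cost = initial_cost = total_cost(nets, activity, pos)
    best_pos, best_cost = dict(pos), cost
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            a, b = rng.sample(blocks, 2)
            pos[a], pos[b] = pos[b], pos[a]              # propose a swap
            new_cost = total_cost(nets, activity, pos)
            delta = new_cost - cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost                          # accept the move
                if cost < best_cost:
                    best_pos, best_cost = dict(pos), cost
            else:
                pos[a], pos[b] = pos[b], pos[a]          # reject: undo the swap
        t *= cooling                                     # cool down
    return best_pos, best_cost, initial_cost


blocks = [f"b{i}" for i in range(9)]
nets = {f"net{i}": (f"b{i}", f"b{i + 1}") for i in range(8)}          # a chain
activity = {name: 0.1 + 0.1 * i for i, name in enumerate(nets)}       # hypothetical
pos, best_cost, initial_cost = anneal(blocks, nets, activity, grid=3)
print(initial_cost, best_cost)
```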

25 Results "On the interaction between power aware FPGA CAD algorithms" , Julien , Steven

26 Temperature Aware Routing
Leakage current increases exponentially with temperature
Switching capacitance
Requires knowledge of the spatial distribution of parameters

27 Algorithm
Discourages the routing algorithm from forming connections that cross hotspot regions
Cost function modification
Power savings range between 30% and 63%
"A Temperature-Aware Placement and Routing targeting 3D FPGAs", Kostas, Soudris
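Discouraging connections that cross hotspots can be expressed as an extra term in each routing node's cost. The sketch runs Dijkstra's algorithm on a small grid where a node costs 1 plus a penalty proportional to how far its temperature exceeds a hotspot threshold; the penalty form, threshold, and temperature map are hypothetical, not the cited paper's cost function.

```python
import heapq


def route(grid_temp, src, dst, hot_threshold, penalty_weight):
    """Lowest-cost path on a grid whose node cost grows inside hotspots."""
    rows, cols = len(grid_temp), len(grid_temp[0])

    def node_cost(r, c):
        excess = max(0.0, grid_temp[r][c] - hot_threshold)
        return 1.0 + penalty_weight * excess          # base cost + thermal penalty

    dist = {src: node_cost(*src)}
    prev = {}
    heap = [(dist[src], src)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == dst:
            break
        if d > dist.get((r, c), float("inf")):
            continue                                  # stale heap entry
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + node_cost(nr, nc)
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


temps = [[40, 40, 40], [40, 90, 40], [40, 40, 40]]   # hypothetical: hotspot in the centre
print(route(temps, (1, 0), (1, 2), hot_threshold=60, penalty_weight=0.0))
print(route(temps, (1, 0), (1, 2), hot_threshold=60, penalty_weight=1.0))
```

With no penalty the net goes straight through the hot centre; with the penalty it detours around it at the cost of two extra wire segments.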

28 Power-Aware FPGA Design Flow
Step 1: power-based architecture (high-level modelling, RTL): voltage scaling, dual Vdd, frequency scaling, clock gating
Step 2: power-aware CAD tools: power-aware packing or clustering, power-aware placement, power-aware routing

29 Main/Baseline Paper
Problem addressed: power consumption in FPGAs is dominated by the interconnect (62%)
Proposed idea: charge recycling for power reduction in FPGA interconnect

30 Charge Recycling (CR)

31 Charge Recycling in FPGAs
How? Use unused routing resources as charge reservoirs
Reduces the charge drawn from Vdd
25% reduction in energy
(Figure: unused wires acting as reservoirs, and an unused wire with no partner)
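The charge arithmetic behind using an unused wire as a reservoir can be sketched with ideal charge sharing: before each transition the load wire is briefly connected to the reservoir, so their voltages equalize at (C_L*V_L + C_R*V_R)/(C_L + C_R), and only then does the driver complete the swing. This idealized model (complete sharing, no switch resistance, no delay-line timing, a hypothetical toggling pattern) is an assumption; it is not the cited paper's circuit, and its limiting savings are not the 25% measured there.

```python
def simulate_charge_recycling(c_load, c_res, vdd, n_cycles):
    """Charge drawn from Vdd over n_cycles full toggle cycles with ideal charge sharing."""
    v_load, v_res = 0.0, 0.0
    charge_from_vdd = 0.0
    for _ in range(n_cycles):
        # Rising transition: share with the reservoir first, then the driver tops up to Vdd.
        v_share = (c_load * v_load + c_res * v_res) / (c_load + c_res)
        v_load = v_res = v_share
        charge_from_vdd += c_load * (vdd - v_load)
        v_load = vdd
        # Falling transition: park part of the load's charge in the reservoir,
        # then the driver discharges the rest to ground.
        v_share = (c_load * v_load + c_res * v_res) / (c_load + c_res)
        v_res = v_share
        v_load = 0.0
    return charge_from_vdd


n = 1000
baseline = n * 1.0 * 1.0                      # no recycling: C_L * Vdd per rising edge
recycled = simulate_charge_recycling(1.0, 1.0, 1.0, n)
print(f"charge drawn from Vdd saved: {1 - recycled / baseline:.1%}")
```

With equal load and reservoir capacitance, the reservoir settles at Vdd/3 after each rising share, so each rising edge draws about two thirds of the baseline charge; real savings are lower once switch resistance, limited sharing time, and the added CR circuitry are accounted for.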

32 CR-Capable FPGA Interconnect
Analysis of the four components:
SRAM cell: produces signals CR and TS, which control the switch mode (normal, CR, tri-state)
Delay line: transition between VIN and DLOUT
CR circuit: performs the charge sharing between the load and the reservoir
Input stage

33 Experiments/Methodology
VPR 6.0
Baseline: island style, unidirectional routing, Wilton switch box (K=6, N=4)
Router: PathFinder, with a cost function modification
Post-routing: CR mode assignment
The VPR place/route tool is used to find the % increase in area

34 VPR Cost Function
PathFinder cost function and the modified cost function
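For reference, the baseline PathFinder negotiated-congestion cost of a routing node is commonly written c_n = (b_n + h_n) * p_n: base cost b_n, historical congestion h_n accumulated across routing iterations, and present congestion p_n, which grows with the overuse the node would have if one more net took it. The specific term the main paper adds is not recoverable from this slide, so the sketch shows only the baseline; the class, factor names, and values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class RRNode:
    """A routing-resource node (wire segment or pin) in the routing graph."""
    base_cost: float
    capacity: int = 1
    occupancy: int = 0
    history: float = 0.0


def present_congestion(node, pres_fac):
    """p_n: penalty for the overuse that would result if one more net used the node."""
    overuse_if_taken = max(0, node.occupancy + 1 - node.capacity)
    return 1.0 + pres_fac * overuse_if_taken


def pathfinder_cost(node, pres_fac):
    """c_n = (b_n + h_n) * p_n."""
    return (node.base_cost + node.history) * present_congestion(node, pres_fac)


def update_history(node, hist_fac):
    """After each routing iteration, accumulate h_n on overused nodes."""
    node.history += hist_fac * max(0, node.occupancy - node.capacity)


free = RRNode(base_cost=1.0)
busy = RRNode(base_cost=1.0, occupancy=1)          # already used by one net
print(pathfinder_cost(free, pres_fac=0.5), pathfinder_cost(busy, pres_fac=0.5))

busy.occupancy = 2                                 # two nets shared it last iteration
update_history(busy, hist_fac=1.0)
busy.occupancy = 1                                 # one net ripped up
print(pathfinder_cost(busy, pres_fac=1.0))         # pres_fac grows each iteration
```

Repeated iterations raise both factors on contested nodes until nets negotiate their way onto uncongested alternatives.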

35 Post-Routing Mixed Integer Linear Program
Maximizes the number of nodes put into CR mode
Constraint: the critical path delay of the circuit
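The optimization problem can be made concrete on a toy instance: choose the largest set of routing nodes to put in CR mode, where CR mode adds a delay penalty to a node, subject to every timing path staying within the original critical delay. The sketch solves it by exhaustive search as a stand-in for an MILP solver (feasible only for tiny instances); the node delays, penalties, and paths are hypothetical.

```python
from itertools import combinations


def max_cr_nodes(base_delay, cr_penalty, paths, delay_budget):
    """Largest set of nodes that can enter CR mode without violating the delay budget.

    Exhaustive search, largest subsets first; an MILP solver would handle
    realistic sizes with the same objective and constraints.
    """
    nodes = sorted(base_delay)
    for k in range(len(nodes), -1, -1):
        for subset in combinations(nodes, k):
            chosen = set(subset)
            if all(sum(base_delay[n] + (cr_penalty[n] if n in chosen else 0.0)
                       for n in path) <= delay_budget
                   for path in paths):
                return chosen
    return set()


base = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
penalty = {n: 0.5 for n in base}                    # hypothetical CR-mode delay penalty
paths = [["a", "b", "c"], ["a", "d"]]               # a-b-c is the critical path (3.0)
print(max_cr_nodes(base, penalty, paths, delay_budget=3.0))
```

With no slack on a-b-c, only d (on the non-critical path) can be put into CR mode; relaxing the budget to 3.5 allows a and d together.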

36 Results Dynamic power in the FPGA interconnect is reduced by up to ∼ %

37 Results Continued…
Area metric: number of minimum-width transistors
Power savings do not shrink in direct proportion to the reduction in CR-capable switches (area)

38 What do we propose that is new? Not all unused wires become friends
Unused wires are connected to a constant voltage
"URekha": tri-state the unused wires for further power savings (~6%)

39 Thank you

