Lecture 16: Power Reduction Techniques November 5, 2013 ECE 636 Reconfigurable Computing Lecture 16 Power Reduction Techniques for FPGAs.

Presentation transcript:

Overview
° FPGAs generally considered power hungry compared to ASIC and processor counterparts
- Mostly due to unused interconnect
° Recent area of extensive research
° Device techniques
- Voltage scaling
- Sleep mode
° Software techniques
- Reduced switching
- Reduced capacitance

Dynamic Power
° Dynamic power is required to charge and discharge load capacitances when transistors switch.
° One cycle involves a rising and falling output.
° On rising output, charge $Q = C V_{DD}$ is required.
° On falling output, charge is dumped to GND.
[Figure: short-circuit current and charge/discharge current. Courtesy: Harris]

Dynamic Power
° Short-circuit power is less than 10% of dynamic power

FPGA Static Power Consumption
° Junction leakage
° Gate oxide leakage
° Subthreshold leakage

FPGA Static Power Consumption
° Junction leakage
- Small fraction of leakage
° Gate oxide leakage
° Subthreshold leakage
- When Vgs < Vt there is still some source-drain current
- Increases exponentially as Vt decreases
- Increases exponentially as Vgs increases (equivalently, decreases exponentially as Vgs decreases)
[Figure: technology trend. Courtesy: Nowak]
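The exponential dependence of subthreshold leakage on Vgs and Vt can be illustrated with the standard first-order model I_sub = I0 * exp((Vgs - Vt) / (n * vT)). This is a minimal sketch, not a device model from the lecture: the prefactor `i0`, the slope factor `n`, and the example voltages are assumed values chosen only to show the trend.

```python
import math

def subthreshold_current(vgs, vt, i0=1e-7, n=1.5, thermal_voltage=0.026):
    """First-order subthreshold current in amps.

    i0, n and thermal_voltage are illustrative assumptions, not values
    taken from the lecture or from any specific process.
    """
    return i0 * math.exp((vgs - vt) / (n * thermal_voltage))

# Transistor nominally off (Vgs = 0 V) with two different thresholds.
high_vt = subthreshold_current(vgs=0.0, vt=0.4)
low_vt = subthreshold_current(vgs=0.0, vt=0.3)

# Lowering Vt by 100 mV raises leakage by exp(0.1 / (n * vT)).
print(f"leakage ratio (low Vt / high Vt): {low_vt / high_vt:.1f}x")
```

Under these assumptions a 100 mV threshold reduction costs roughly an order of magnitude in leakage, which is why high-Vt transistors are attractive wherever speed is not critical.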

FPGA Power Reduction Goals
° Dynamic power goals
- Reduce Vdd along non-critical paths
- Low-swing signalling
- Use CAD approaches to limit long high-toggle paths
- $P_{dynamic} = 0.5 \cdot C \cdot V_{dd}^{2} \cdot f$
° Static power goals
- Cut off Vdd for unused transistors
- Use high-Vt transistors for SRAM cells
- Various other voltage biasing techniques
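Because dynamic power depends on the square of the supply voltage, reducing Vdd on non-critical paths pays off more than a proportional reduction in capacitance or frequency. The sketch below evaluates the slide's dynamic power formula for two supply voltages; the capacitance, frequency, and voltage values are assumed for illustration only.

```python
def dynamic_power(capacitance_f, vdd_v, freq_hz):
    """Dynamic power in watts, using the slide's formula 0.5 * C * Vdd^2 * f."""
    return 0.5 * capacitance_f * vdd_v ** 2 * freq_hz

# Illustrative values (assumptions): 1 nF switched capacitance at 100 MHz.
nominal = dynamic_power(1e-9, 1.5, 100e6)
scaled = dynamic_power(1e-9, 1.2, 100e6)

print(f"nominal: {nominal * 1e3:.1f} mW")
print(f"scaled:  {scaled * 1e3:.1f} mW")
print(f"saving from a 20% Vdd reduction: {(1 - scaled / nominal) * 100:.0f}%")
```

A 20% reduction in Vdd gives a 36% reduction in dynamic power, since (1.2 / 1.5)^2 = 0.64.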

Traditional Routing Switch
[Figure: traditional routing switch with level-restoring buffer. Courtesy: Anderson]

Proposed Switch Designs: Anderson
° Based on 3 observations:
- Routing switch inputs are tolerant to weak-1 signals (level-restoring buffers).
- There is considerable slack in FPGA designs, so many switches can be slowed down.
- Most routing switches feed other routing switches, so they can produce weak-1 logic signals.

"Basic" Switch Design
MODE OPERATION:
- high-speed: MNX & MPX ON
- low-power: MNX ON, MPX OFF
- sleep: MNX OFF, MPX OFF
[Figure: basic switch circuit with node V_VD]

High-Speed Mode
MODE OPERATION:
- high-speed: MNX & MPX ON
- low-power: MNX ON, MPX OFF
- sleep: MNX OFF, MPX OFF
° V_VD = V_DD
° Output swing: rail-to-rail.

Low-Power Mode
MODE OPERATION:
- high-speed: MNX & MPX ON
- low-power: MNX ON, MPX OFF
- sleep: MNX OFF, MPX OFF
° V_VD = V_DD - V_TH
° Output swing: GND to (V_DD - V_TH).

Sleep Mode
MODE OPERATION:
- high-speed: MNX & MPX ON
- low-power: MNX ON, MPX OFF
- sleep: MNX OFF, MPX OFF
[Figure: basic switch circuit in sleep mode, node V_VD]
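The three operating modes of the basic switch can be summarized as a small lookup from mode to transistor states and the resulting V_VD level. This is a behavioural sketch only: the names `Mode` and `switch_state` and the example voltages are assumptions, and sleep mode is modelled simply as V_VD not being driven because the slides give no value for it.

```python
from enum import Enum

class Mode(Enum):
    HIGH_SPEED = "high-speed"
    LOW_POWER = "low-power"
    SLEEP = "sleep"

def switch_state(mode, vdd=1.2, vth=0.3):
    """Return the MNX/MPX states and V_VD for a mode.

    vdd and vth are illustrative values. v_vd is None in sleep mode,
    meaning neither transistor drives the node (an assumption of this sketch).
    """
    if mode is Mode.HIGH_SPEED:
        # Both transistors on: V_VD = V_DD, rail-to-rail output swing.
        return {"MNX": True, "MPX": True, "v_vd": vdd}
    if mode is Mode.LOW_POWER:
        # Only the NMOS is on: V_VD = V_DD - V_TH, reduced output swing.
        return {"MNX": True, "MPX": False, "v_vd": vdd - vth}
    # Sleep: both transistors off.
    return {"MNX": False, "MPX": False, "v_vd": None}

for mode in Mode:
    print(mode.value, switch_state(mode))
```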

Leakage Power Results: Anderson
[Figure: % leakage power reduction vs. high-speed mode. Labels: LP mode, sleep mode, LP mode (+unused fanout), LP mode (+used fanout), traditional switch, basic.]

Region Constrained Placement
° Rather than just focusing on routing, consider constraining logic
° Most circuits exhibit locality
Gayasen: FPGA’2004

Region Constrained Placement
° Several issues to consider
° Size of sleep transistor
- Too large: increases leakage, area
- Too small: affects logic performance
° Size of region
- Too large: possibly unused resources, complicates placement
- Too small: sleep transistors take up too much room

Experimental Flow: RCP
° Different region sizes considered for flow
° Area constraints for portions of design determined by hand
° May encourage designers to create granular designs

Power Savings: RCP
° Note significant reduction in leakage power savings as region size increases
° Bottom curve primarily due to luck

Performance Limitation: RCP
° Performance limited by use of regions
° Nearly 10% clock frequency reduction for many designs

Low-swing Signalling
° Techniques we have examined so far look at tinkering with supply voltage
° Also possible to modify wire signalling to reduce voltage swing
° Most of FPGA is made up of interconnect
° Approach targets dynamic power consumption
George and Rabaey: 1997

Low-swing Signalling
° Interconnect swing is at 0.8V while rest of circuit operates at 1.5V
° Cascode circuitry used at sink to overcome slow speed issues
° 50% energy savings at cost of 25% delay

Alternate approach: Modifying FPGA CAD
° FPGA architecture modifications impact all designs, even those that don’t care about power
° Can placement and routing be modified to consider dynamic power?
- Need to know which signals are high toggle
- Attempt to minimize length of high-toggle wires
- Minimize impact on performance and area
° Techniques fit well into our previous work on placement and routing
Lamoureux and Wilton

Modifying FPGA CAD Placement
° Previous cost metrics for annealing considered bounding box wire length and timing costs
° Include additional term which considers signal switching activity

FPGA Placement for Power
° Previous cost metrics for annealing considered bounding box wire length and timing costs
° Include additional term which considers signal switching activity
° Post-route energy reduced by 3.0%.
° Power decreased by 7% but delay increases by 4%
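The idea of adding a switching-activity term to the annealing cost can be sketched as follows. This is a simplified illustration, not the published formulation: the timing term is omitted, and the function names, the `power_weight` parameter, and the example nets are all assumptions.

```python
def bounding_box(pins):
    """Half-perimeter bounding-box length of a net given (x, y) pin locations."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def wire_cost(nets):
    """Plain bounding-box wirelength summed over all nets."""
    return sum(bounding_box(pins) for pins, _ in nets)

def power_cost(nets):
    """Wirelength weighted by each net's switching activity."""
    return sum(bounding_box(pins) * activity for pins, activity in nets)

def placement_cost(nets, power_weight=1.0):
    """Wirelength cost plus an activity-weighted term (timing term omitted)."""
    return wire_cost(nets) + power_weight * power_cost(nets)

# Two placements with identical total wirelength. Each net is (pins, activity).
long_active = [([(0, 0), (4, 0)], 0.9), ([(0, 0), (1, 0)], 0.1)]
short_active = [([(0, 0), (1, 0)], 0.9), ([(0, 0), (4, 0)], 0.1)]

print(wire_cost(long_active), wire_cost(short_active))
print(placement_cost(long_active), placement_cost(short_active))
```

A wirelength-only cost cannot distinguish the two placements, while the activity-weighted term prefers the one that keeps the high-toggle net short.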

FPGA Routing Modifications for Power
° Original routing cost function takes congestion b(n) and delay(n) into account
° Augment with factor that takes net activity into account
° Minimize length of most active nets, even in the presence of congestion.
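One way to picture the augmented routing cost is a per-node cost that blends delay, congestion, and an activity-weighted capacitance term. The sketch below is a hypothetical form, not the exact function from the lecture: the congestion product `b * h * p` follows the usual negotiated-congestion style, and `act_factor`, `capacitance`, and all numeric values are assumptions.

```python
def routing_node_cost(crit, delay, b, h, p, activity, capacitance, max_activity=1.0):
    """Hypothetical power-aware cost of using routing node n for one net.

    crit        : timing criticality of the connection, in [0, 1]
    delay       : delay(n) of the node
    b, h, p     : base, historical and present congestion terms (assumed form)
    activity    : switching activity of the net being routed
    capacitance : capacitance of the node (assumed to be known to the router)
    """
    act_factor = min(activity / max_activity, 1.0)
    congestion = b * h * p
    # Critical connections optimize delay. Non-critical ones trade congestion
    # against capacitance, weighted by how active the net is.
    return crit * delay + (1.0 - crit) * (
        act_factor * capacitance + (1.0 - act_factor) * congestion
    )

# A non-critical, highly active net is steered by capacitance ...
print(routing_node_cost(0.0, 1.0, 1.0, 1.0, 1.0, activity=1.0, capacitance=3.0))
# ... while a non-critical, idle net only sees congestion.
print(routing_node_cost(0.0, 1.0, 1.0, 1.0, 1.0, activity=0.0, capacitance=3.0))
```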

FPGA Routing for Power Results
° Potential benefits somewhat limited by placement
° Note that most nets have low activity
° Power is decreased by 6% but delay increased by 4%.
° Energy savings of about 3%
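The reported energy savings are smaller than the power savings because the delay increase offsets part of the gain. A rough consistency check, assuming energy per operation scales as power times delay (an assumption of this sketch, not a statement from the lecture):

```python
def energy_change(power_change, delay_change):
    """Relative energy change when energy is modelled as power * delay."""
    return (1.0 + power_change) * (1.0 + delay_change) - 1.0

# Placement results: power -7%, delay +4%.
print(f"placement: {energy_change(-0.07, 0.04) * 100:.1f}%")
# Routing results: power -6%, delay +4%.
print(f"routing:   {energy_change(-0.06, 0.04) * 100:.1f}%")
```

Under this simple model the placement figures give an energy change of about -3.3% and the routing figures about -2.2%, the same order as the roughly 3% savings quoted on the slides.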

FPGA Embedded Memory Blocks
° Embedded memory blocks (EMBs) are important parts of FPGAs
° Consume roughly 14% of Altera Stratix II dynamic power*
° Increasing in recent designs
* Stratix II Low Power Applications Note, 2005

Embedded Memory Block Port Internal View
[Figure: internal view of an EMB port. Labels: Write Data, Write Enable, Write Buffers, Column Mux, Sense Amps, Row Decode, Read Data, Read Enable, Latch, Address, Clk, Clk Enable, MClk, RAM cell, BIT, Bit Line Pre-charge.]
° Reducing clocking saves dynamic power

Power Optimization #1
° Convert EMB read enable/write enable signals to associated read/write clock enable signals
° Limitations
- Each port has read or write enable control signal
- Embedded memory block has read enable input
[Figure: before and after. Labels: Clock, Wren, Rden, Data, Write Address, Read Address, Q, Write enable, Read enable, Wr clk enable, Rd clk enable, Vcc.]

Implementation
° Conversion mode
- Ties off R/W enable to RAM clock enables
- Doesn’t make transform if CE already present on port
° Combining mode
- AND user RAM clock enables with derived R/W clock
- Could impact performance
[Figure: Write Enable and User-defined Write Clk Enable combined into Combined Write Clk Enable]
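The conversion and combining modes can be modelled on per-cycle enable traces to see how many cycles the RAM port is actually clocked. This is a behavioural sketch with hypothetical names (`convert_port` and the example traces are assumptions); it does not model the vendor tool's netlist transformation.

```python
def convert_port(rw_enable, user_clk_enable=None):
    """Derive the clock enable for one RAM port.

    rw_enable       : per-cycle read or write enable values (booleans)
    user_clk_enable : per-cycle user clock enable, or None if the port has none

    Returns (enable_port, clk_enable). The original enable port is tied high,
    and the enable now gates the port clock instead.
    """
    if user_clk_enable is None:
        # Conversion mode: the R/W enable becomes the clock enable.
        clk_enable = list(rw_enable)
    else:
        # Combining mode: AND the user clock enable with the R/W enable.
        clk_enable = [rw and ce for rw, ce in zip(rw_enable, user_clk_enable)]
    enable_port = [True] * len(clk_enable)  # tied to Vcc
    return enable_port, clk_enable

write_enable = [True, False, False, True]
_, clk_en = convert_port(write_enable)
print("clocked cycles after conversion:", sum(clk_en), "of", len(clk_en))

_, combined = convert_port(write_enable, [True, True, False, False])
print("clocked cycles after combining:", sum(combined), "of", len(combined))
```

Before the transform the port is clocked every cycle; afterwards it is clocked only when an access actually occurs.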

FPGA RAM Processing
° FIFOs and shift registers converted into logical RAMs
° Logical RAMs mapped to RAM blocks
[Flow: FIFO, shift register, RAM specification -> Create Logical Memory -> logical RAMs/logic -> Logical-to-physical RAM processing -> RAM blocks/logic -> Memory/logic placement -> placed memory]

Mapping RAM to EMBs
° Implementation choice can impact design area, performance, and power.
° Some mappings may require multiple EMBs
[Figure: user-defined (logical) memory, 4K deep x 4 wide (16K bits), and physical (EMB) memories: M4K (4K bits) and MRAM (512K).]

Memory Organization
° Each EMB can be configured to have different depth and width (e.g. Stratix II M4K)
° All hold 4K bits
° Slightly lower power consumption for wider EMB configurations (not including routing)
[Figure: example configurations: 4K words deep x 1 bit wide; 512 words deep x 8 bits wide; 128 words deep x 32 bits wide.]

Area and Delay Optimal Mapping
° Configure each EMB to be as deep as possible
° Number of address bits on each EMB same as on logical memory
° Area and performance efficient: no external logic needed
° Power inefficient: all EMBs must be active during each logical RAM access
[Figure: vertical slicing. Logical memory 4K words deep and 4 bits wide, Addr[0:11], Data[0:3], built from four EMBs, each 4K words deep and 1 bit wide; 4 EMBs active during access.]

Alternative Mapping
° Configure EMB to have width of logical RAM (e.g. 1Kx4)
- Allows shutdown of some RAMs each cycle
- But adds some logic
° Saves RAM power, adds combinational logic and register power
[Figure: horizontal slicing, more power efficient. Logical memory 4K words deep and 4 bits wide, Data[0:3], built from four EMBs, each 1K deep x 4 wide; Addr[0:9] addresses the EMBs and Addr[10:11] feeds an address decoder; 1 EMB active during access.]
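The difference between the two mappings of a 4K x 4 logical memory can be made concrete by listing which EMBs are touched by one access. This is a minimal sketch with hypothetical function names; it assumes Addr[10:11] are the two most significant address bits, which the figure does not state explicitly.

```python
def vertical_access(addr, num_embs=4):
    """Vertical slicing: four 4K x 1 EMBs, each holding one data bit.

    Every EMB sees the full address, so all of them are active.
    Returns a list of (emb_index, emb_address) pairs.
    """
    return [(emb, addr) for emb in range(num_embs)]

def horizontal_access(addr, emb_depth=1024):
    """Horizontal slicing: four 1K x 4 EMBs, each holding 1K full words.

    The upper address bits select one EMB through the decoder and the
    lower bits address the word inside it, so only one EMB is active.
    """
    return [(addr // emb_depth, addr % emb_depth)]

addr = 3000
print("vertical:  ", vertical_access(addr))
print("horizontal:", horizontal_access(addr))
```

Address 3000 activates all four EMBs in the vertical mapping but only EMB 2 (at local address 952) in the horizontal mapping, at the cost of the decoder and an output multiplexer.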

RAM Slicing - Example
° Power reduction available with different slicing
[Figure: 4Kx32 dynamic power (mW) vs. maximum depth. Annotations: best range, multiplexer power increasing, EMB power increasing.]

Power Optimization #2: Power-aware RAM Partitioning
° Algorithm considers possible logical to physical RAM mappings
[Flow: FIFO, shift register -> Create Logical Memory -> Power-aware physical RAM processing (with power library) -> Insert decode and mux logic -> Memory/logic placement -> completed placement]
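A power-aware partitioner of this kind can be pictured as a search over EMB configurations using a power estimate for each candidate. The sketch below is a toy version under stated assumptions: only power-of-two M4K widths from 1 to 32 are considered, and the cost model (`emb_power` per active EMB, `mux_power_per_bit` per extra row of output multiplexing) is invented for illustration and stands in for the tool's power library.

```python
import math

M4K_BITS = 4096
M4K_WIDTHS = (1, 2, 4, 8, 16, 32)  # simplified list of configurations (assumption)

def candidate_mappings(depth, width, emb_power=1.0, mux_power_per_bit=0.01):
    """Enumerate EMB configurations for a logical depth x width memory."""
    results = []
    for emb_width in M4K_WIDTHS:
        emb_depth = M4K_BITS // emb_width
        cols = math.ceil(width / emb_width)   # EMBs active on every access
        rows = math.ceil(depth / emb_depth)   # address ranges needing decode/mux
        # Hypothetical power model: active EMBs plus output mux overhead.
        power = cols * emb_power + (rows - 1) * width * mux_power_per_bit
        results.append({
            "emb_depth": emb_depth,
            "emb_width": emb_width,
            "embs": rows * cols,
            "active": cols,
            "power": power,
        })
    return results

def best_mapping(depth, width, **model):
    """Pick the candidate with the lowest estimated power."""
    return min(candidate_mappings(depth, width, **model), key=lambda m: m["power"])

for m in candidate_mappings(4096, 32):
    print(m)
print("best:", best_mapping(4096, 32))
```

With these made-up coefficients the 4K x 32 example is cheapest at an intermediate depth (512 x 8 EMBs): deeper configurations keep too many EMBs active, while shallower ones pay too much for multiplexing.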

Experimental Approach
° 40 designs evaluated
° Quartus 5.1
° Mapped to smallest possible device and target max frequency
° Simulation with test vectors
° Power analysis with PowerPlay

Memory Power
° 21.0% average reduction for all techniques (9.7% with convert/combine)

Overall Core Dynamic Power
° 6.8% average power reduction for all techniques (2.6% with convert/combine)
[Figure: % dynamic power reduction per design. Series: enable convert/combine; enable convert/combine + mem partition.]

Design Performance
° 1.0% average performance loss for all techniques (0.1% for enable convert/combine)
[Figure: average design clock frequency, % frequency improvement per design. Series: enable convert/combine; enable convert/combine + mem partition.]

Results Summary
° Almost 7% core dynamic power reduction across all designs
- Some designs benefit more than others
° Minimal clock frequency hit for most designs

| | Enable convert | Enable convert/combine | Enable convert/combine + Mem partition |
| Core dynamic power | -1.8% | -2.6% | -6.8% |
| Memory dynamic power | -6.3% | -9.7% | -21.0% |
| Max clk freq | -0.1% | -0.2% | -1.0% |
| LUT count | 0.0% | 0.1% | 0.7% |

Impact of Multiple Embedded Memory Blocks
° Rerun 40 designs but only allow one type of target EMB for each mapping
° All designs targeted to Stratix II EP2S180
° Significant power impact for most designs versus EP2S180 target with no restrictions

| | M512 | M4K | M-RAM |
| Designs completed | 23 | 38 | 4 |
| Core dynamic power | 40.4% | 6.6% | 47.3% |
| Memory power | 279.5% | 33.3% | 754.0% |
| Max clk freq. | -2.2% | 0.6% | -1.0% |
| LUT count | 0.4% | -0.5% | 0.0% |

Summary
° Key to reducing RAM power is keeping clocks disabled.
° Movement of read/write enables to clock enables limits dynamic activity
° Power-aware RAM partitioner attempts to select power-optimal mapping, combined with clock enable enhancement
° Overall
- About 21% average memory power reduction (10% with enable convert/combine)
- About 7% average dynamic power reduction (3% with enable convert/combine)
- Diversity of EMBs reduces power by 33%

Summary
° FPGA power consumption is under consideration at numerous levels: architecture, circuit, CAD, and physical
° FPGA companies are just now embracing power-aware CAD; power-aware architectures are on the way
° Many circuit-level techniques still possible
° RTL CAD synthesis techniques provide a promising area for exploration