ECE Department, University of California, Davis

Slides:



Advertisements
Similar presentations
1 EE5900 Advanced Embedded System For Smart Infrastructure Energy Efficient Scheduling.
Advertisements

Keeping Hot Chips Cool Ruchir Puri, Leon Stok, Subhrajit Bhattacharya IBM T.J. Watson Research Center Yorktown Heights, NY Circuits R-US.
High-Performance Microprocessor Design. Outline Introduction Technology scaling Power Clock Verification.
Power Reduction Techniques For Microprocessor Systems
Houshmand Shirani-mehr 1,2, Tinoosh Mohsenin 3, Bevan Baas 1 1 VCL Computation Lab, ECE Department, UC Davis 2 Intel Corporation, Folsom, CA 3 University.
Trends and Perspectives in deep-submicron IC design Bram Nauta MESA + Research Institute University of Twente, Enschede, The Netherlands University of.
S. Reda EN160 SP’08 Design and Implementation of VLSI Systems (EN1600) Lecture 14: Power Dissipation Prof. Sherief Reda Division of Engineering, Brown.
CS 7810 Lecture 12 Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors D. Brooks et al. IEEE Micro, Nov/Dec.
S. Reda EN160 SP’07 Design and Implementation of VLSI Systems (EN0160) Lecture 13: Power Dissipation Prof. Sherief Reda Division of Engineering, Brown.
Lecture 7: Power.
Integrated Regulation for Energy- Efficient Digital Circuits Elad Alon 1 and Mark Horowitz 2 1 UC Berkeley 2 Stanford University.
Power-aware Computing n Dramatic increases in computer power consumption: » Some processors now draw more than 100 watts » Memory power consumption is.
Power Reduction for FPGA using Multiple Vdd/Vth
1 Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction Rakesh Kumar, Keith I. Farkas, Norman P. Jouppi, Parthasarathy.
An Efficient Algorithm for Dual-Voltage Design Without Need for Level-Conversion SSST 2012 Mridula Allani Intel Corporation, Austin, TX (Formerly.
Sub-threshold Design of Ultra Low Power CMOS Circuits Students: Dmitry Vaysman Alexander Gertsman Supervisors: Prof. Natan Kopeika Prof. Orly Yadid-Pecht.
MS108 Computer System I Lecture 2 Metrics Prof. Xiaoyao Liang 2014/2/28 1.
Using Cycle Efficiency as a System Designer Metric to Characterize an Embedded DSP and Compare Hard Core vs. Soft Core Advisor Dr. Vishwani D. Agrawal.
A 167-processor 65 nm Computational Platform with Per-Processor Dynamic Supply Voltage and Dynamic Clock Frequency Scaling Dean Truong, Wayne Cheng, Tinoosh.
Guy Lemieux, Mehdi Alimadadi, Samad Sheikhaei, Shahriar Mirabbasi University of British Columbia, Canada Patrick Palmer University of Cambridge, UK SoC.
Performance and Power Analysis of Globally Asynchronous Locally Synchronous Multiprocessor Systems Zhiyi Yu, Bevan M. Baas VLSI Computation Lab, ECE department,
A 167-processor Computational Array for Highly-Efficient DSP and Embedded Application Processing Dean Truong, Wayne Cheng, Tinoosh Mohsenin, Zhiyi Yu,
© Digital Integrated Circuits 2nd Inverter Digital Integrated Circuits A Design Perspective The Inverter Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.
Multi-Split-Row Threshold Decoding Implementations for LDPC Codes
Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 6.1 EE4800 CMOS Digital IC Design & Analysis Lecture 6 Power Zhuo Feng.
Seok-jae, Lee VLSI Signal Processing Lab. Korea University
1 Dual-V cc SRAM Class presentation for Advanced VLSIPresenter:A.Sammak Adopted from: M. Khellah,A 4.2GHz 0.3mm 2 256kb Dual-V CC SRAM Building Block in.
Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles Zhiyi Yu, Bevan Baas VLSI Computation Lab, ECE Department University of California,
A Low-Area Interconnect Architecture for Chip Multiprocessors Zhiyi Yu and Bevan Baas VLSI Computation Lab ECE Department, UC Davis.
FaridehShiran Department of Electronics Carleton University, Ottawa, ON, Canada SmartReflex Power and Performance Management Technologies.
CS203 – Advanced Computer Architecture
Tinoosh Mohsenin 2, Houshmand Shirani-mehr 1, Bevan Baas 1 1 University of California, Davis 2 University of Maryland Baltimore County Low Power LDPC Decoder.
Rapid Power MOSFET Switching US Patent # ComLSI, Inc.
Copyright © 2009, Intel Corporation. All rights reserved. Power Gate Design Optimization and Analysis with Silicon Correlation Results Yong Lee-Kee, Intel.
COMP541 Transistors and all that… a brief overview
Power-Optimal Pipelining in Deep Submicron Technology
Smruti R. Sarangi IIT Delhi
CS203 – Advanced Computer Architecture
Dynamic Frequency Scaling using on-chip Thermal Sensors in ASAP7 7nm Predictive PDK Vaibhav Verma Mandi Das Wole Jaiyeoba.
Temperature and Power Management
COMP541 Transistors and all that… a brief overview
Dynamic Frequency Scaling using on-chip Thermal Sensors in ASAP7 7nm Predictive PDK Vaibhav Verma Mandi Das Wole Jaiyeoba.
The Inverter EE4271 VLSI Design Professor Shiyan Hu Office: EERC 518
Inc. 32 nm fabrication process and Intel SpeedStep.
LOW POWER DESIGN METHODS V.ANANDI ASST.PROF,E&C MSRIT,BANGALORE.
SECTIONS 1-7 By Astha Chawla
Basics of Energy & Power Dissipation
Estimate power saving by clock slowdown for s5378 in 180nm and 32nm CMOS Chao Han ELEC 6270.
October 10, 2017 Kjell Jeppson, Lena Peterson Chapter 5 Power in W & H
Ali Fard M¨alardalen University, Dept
Reading: Hambley Ch. 7; Rabaey et al. Sec. 5.2
Ultra-Low-Voltage UWB Baseband Processor
Chapter #13: CMOS Digital Logic Circuits
An Improved Split-Row Threshold Decoding Algorithm for LDPC Codes
Lecture No. 7 Logic Gates Asalam O Aleikum students. I am Waseem Ikram. This is the seventh lecture in a series of 45 lectures on Digital Logic Design.
332:479 Concepts in VLSI Design Lecture 24 Power Estimation
CSV881: Low-Power Design Multicore Design for Low Power
High Throughput LDPC Decoders Using a Multiple Split-Row Method
Circuit Design Techniques for Low Power DSPs
The Inverter EE4271 VLSI Design Dr. Shiyan Hu Office: EERC 731
Energy Efficient Power Distribution on Many-Core SoC
Lecture 7: Power.
Power and Heat Power Power dissipation in CMOS logic arises from the following sources: Dynamic power due to switching current from charging and discharging.
Implementing Low-Power CRC-Half for RFID Circuits
The University of Adelaide, School of Computer Science
Lecture 7: Power.
Technology scaling Currently, technology scaling has a threefold objective: Reduce the gate delay by 30% (43% increase in frequency) Double the transistor.
Low Power Digital Design
COMP541 Transistors and all that… a brief overview
Lecture 1: Logic Gates & Analog Behavior of Digital Systems
Presentation transcript:

Dynamic Voltage and Frequency Scaling Circuits with Two Supply Voltages ECE Department, University of California, Davis Wayne H. Cheng and Bevan M. Baas

Outline Background and Motivation Implementation Results

DVFS Background Pdyn = αCVdd2f E = CVdd2 td ≈ CVdd/(Vdd−Vt)α Vdd ↓ f ↓ => Pdyn,leak ↓ Vdd ↓ => E ↓ Vdd ↓ => td ↑ fmax ↓ Reducing supply voltage: Reduces power and energy dissipation Reduces maximum clock frequency due to increased gate delay

Other DVFS Schemes Scheme1: Off-chip DC-DC Converter Scheme 2: On-chip DC-DC Converter

Presented DVFS Scheme Fine grain voltage scaling Maximum power/energy reduction with minimum performance overhead Small area overhead by using an off-chip DC-DC converter, and switching between voltages on-chip

Outline Background and Motivation Implementation Results

Implementation Schematic Voltage scaling with 2 discrete voltage levels Supply switching with PMOS power gates Automated DVFS based on workload Wrapper powered by always on power supply

Power Gate Sizing VPG = IPGRPG RPG ~ L/W W/L ↑ VPG ↓ td ↓ fmax ↑ td ≈ CVdd/(Vdd −VPG −Vt)α W/L ↑ VPG ↓ td ↓ fmax ↑ Voltage drop can be reduced by making W/L as large as possible → done by adding parallel power gates

Supply Switching Scheme Supply grid noise: Gradually switch between power supplies Shorting between power supplies: Shut both power supplies off and wait for some time before switching Data corruption: Stall the processor core before switching between power supplies

Dynamic Run-time Supply Switching Circuit Wait for request Stall core Shut off power Delay Supply switch Release stall

Physical Implementation Power gates are positioned along vertical power stripes Core power is supplied with horizontal power stripes

Outline Background and Motivation Implementation Results

Implementation Results Implemented in 65nm CMOS technology DVFS circuit area is 12% of AsAP processor’s core area 66% of the DVFS circuit area is power gates and decoupling capacitors Maximum power consumption of DVFS logic is 4% of AsAP processor core’s power

Energy Consumption Metric Pdyn = αCVdd2f E = CVdd2 Energy reduction is possible only with voltage scaling EDP = E * td Energy delay product measures the effect of increased delay with DVFS

Measurement of Relative Energy Delay Product β is the fraction of time operating on the lower voltage tdvfs is the total run time with DVFS torig is the total run time without DVFS.

9 Processor JPEG Application Vddhigh = 1.3V, Vddlow = 0.8V Maximum Frequency = 1.05 GHz Lower minimum frequency → Increase in time on lower supply EDP decreases as the minimum frequency decreases up until 13 MHz EDP increases as the performance overhead outweighs the energy savings

Various Applications Vddhigh = 1.3V, Vddlow = 0.8V Maximum Frequency = 1.05 GHz EDP dependent on workload variations Increase in workload variation → Increase in switching between supplies → Increase in performance overhead → Increase in EDP

Summary DVFS with 2 supply voltages Power gates sized to reduce perf. loss Robust supply switching circuit EDP is reduced by 48% on a 9 processor JPEG application Functional in silicon at 65nm node

Acknowledgements Funding Special thanks Intel Corporation UC Micro NSF Grant No. 0430090 CAREER award 0546907 SRC GRC Grant 1598, IntellaSys Corporation S Machines Uniquify Inc. Special thanks Members of the VCL, R. Krishnamurthy M. Anders, and S. Mathew ST Microelectronics and Artisan