SECTIONS 1-7 By Astha Chawla

Presentation transcript:

CHAPTER 4: Optimizing Capacitance and Switching Activity to Reduce Dynamic Power
Sections 1-7, by Astha Chawla

Introduction
C and A are intertwined: P = V^2 x f x C_effective.
ILP plus frequency increases lead to a power problem.
Factors affecting A (switching activity):
  Complexity of the processor
  Exploitation of parallelism
  Bit-width of its structures, etc.
  Optimized at the architectural and microarchitectural level
  Can be changed by run-time optimizations
Factors affecting C (capacitance):
  Size of a processor's structures
  Organization to exploit locality
  Manipulated at the circuit and process technology level
  Fixed at design time
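As a worked illustration of the power equation on this slide, the sketch below evaluates P = V^2 x f x C_effective for two activity levels. The split C_effective = A x C and all numeric values are assumptions chosen for illustration, not figures from the presentation.

```python
def effective_capacitance(activity, capacitance):
    """Assumed split: C_effective = A * C (activity factor times capacitance)."""
    return activity * capacitance


def dynamic_power(voltage, frequency, c_effective):
    """P = V^2 * f * C_effective (watts for volts, hertz, and farads)."""
    return voltage ** 2 * frequency * c_effective


# Hypothetical operating point: 1.2 V, 2 GHz, 1 nF of switched capacitance.
baseline = dynamic_power(1.2, 2.0e9, effective_capacitance(0.5, 1.0e-9))

# Halving the activity factor A halves dynamic power at the same V and f.
halved_activity = dynamic_power(1.2, 2.0e9, effective_capacitance(0.25, 1.0e-9))
```

Because V and f are unchanged between the two calls, the ratio of the two results depends only on A, which is the lever the rest of the chapter works on.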

Excess Switching Activity
Idle-unit switching activity: triggered by clock transitions in unused portions of hardware.
Idle-width switching activity: caused by a mismatch between the implemented width and the actual width needed in processor structures.
Idle-capacity switching activity: occurs when a program does not use the provided hardware structures in their entirety.
Parallel switching activity: activity expended in parallel for performance.
Cacheable switching activity: repetitive switching activity; computing activity can be converted to cache lookups.
Speculative switching activity: speculatively executing incorrect instructions is wasted activity.
Value-dependent switching activity: the power consumed depends on the actual data values.

Capacitance
Does not change dynamically.
Total capacitance = capacitance of transistors + capacitance of wires.
Burd and Brodersen: C_L = C_W + C_fixed
Low-power architectural techniques require partitioning:
  Wire partitioning
  Bit-line segmentation

IDLE-UNIT SWITCHING ACTIVITY
Static logic: to eliminate switching, it is enough to prevent the inputs from changing.
Dynamic logic: power can be consumed even if the inputs to the circuit do not change.
This switching has no effect on the computation.
Remedy: clock gating.

Precomputation: aims to derive a precomputation circuit for a logic block.
Multiplexed precomputation architecture: F(x=0), F(x=1).
Guarded evaluation: aims to shut down part of the original circuit.
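The F(x=0) and F(x=1) terms on this slide can be read as the two cofactors of F with respect to one input x (a Shannon expansion); that reading is an assumption here. In the hedged sketch below, x selects which cofactor is evaluated, so the other block could be left idle. The helper names and the example function are hypothetical.

```python
from itertools import product


def cofactor(f, x_index, x_value):
    """Return F with input x_index fixed to x_value."""
    def fixed_f(bits):
        forced = list(bits)
        forced[x_index] = x_value
        return f(tuple(forced))
    return fixed_f


def multiplexed_eval(f, bits, x_index):
    """Evaluate only the cofactor selected by x; the other one stays idle."""
    f0 = cofactor(f, x_index, 0)
    f1 = cofactor(f, x_index, 1)
    return f1(bits) if bits[x_index] else f0(bits)


def example_logic(bits):
    """Hypothetical 3-input logic block: (a AND b) OR c."""
    a, b, c = bits
    return (a & b) | c


def matches_everywhere(f, n_inputs, x_index):
    """Check the multiplexed form against f on every input combination."""
    return all(
        multiplexed_eval(f, bits, x_index) == f(bits)
        for bits in product((0, 1), repeat=n_inputs)
    )
```

The software model only shows functional equivalence; the power benefit in hardware comes from holding the inputs of the unselected block steady.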

Deterministic clock gating
Gates the clock to processor structures when they are known to be idle.
Saves power and improves EDP without performance loss.
Clock gating examples:
  IBM's POWER5
    Reduces switching power by more than 25%.
    Implements fine-grained gating domains.
  Intel's XScale processors
    Implement three power-saving modes: Idle, Standby, Sleep.
    Cut power consumption by 30%.

Idle-width switching activity: Core
Arises from a mismatch between the designed bit-width of a processor and the actual bit-width needed in frequently occurring operations.
Hardware dynamically detects narrow-width operands (16 bits wide or less).
Such operands are abundant in integer and multimedia applications.
Approaches:
  Value gating: disables the unused width.
    Disables switching in unused parts of the ALU if both operands are narrow.
    Significant power savings.
  Operation packing: packs more than one narrow-width operation into the full width of the hardware.
    Improves performance without significant power overhead.
    Speculative operation packing.
  Significance compression: compresses non-significant bits.
    Byte-serial pipeline.
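A minimal sketch of the value-gating decision follows. It assumes a 32-bit datapath and treats an operand as narrow when its upper 16 bits are all zero; sign-extended negative values are ignored for simplicity. The function names and the "active bits" bookkeeping are hypothetical, intended only to show when the upper half of the ALU could be gated.

```python
WORD_BITS = 32    # assumed datapath width
NARROW_BITS = 16  # narrow-width threshold from the slide


def is_narrow(value):
    """True when the operand fits in the low 16 bits (upper bits all zero)."""
    return 0 <= value < (1 << NARROW_BITS)


def gated_add(a, b):
    """Add two operands and report how many ALU bits had to switch.

    The upper half is treated as gated only when both operands are narrow.
    A real design must also handle the carry out of bit 15; this model
    simply keeps it in the result.
    """
    active_bits = NARROW_BITS if is_narrow(a) and is_narrow(b) else WORD_BITS
    result = (a + b) & ((1 << WORD_BITS) - 1)
    return result, active_bits
```

For example, `gated_add(3, 4)` reports 16 active bits, while `gated_add(70000, 1)` needs the full 32 because one operand is wide.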

Idle-width switching activity: Caches
Dynamic zero compression: accesses only significant bits.
  Compresses only zero bytes, each marked by a zero indicator bit.
Frequent value compression: a dictionary is loaded with the frequent values of a program.
  Simple.
  Most efficient compression mechanism.
Frequent value cache: a cache line contains compressed and uncompressed words.
  First array: holds the 8 low-order bits.
  Second array: holds the remaining 24 high-order bits.
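The zero indicator bit idea can be sketched in software as follows. This is a simplified model, not the circuit: it keeps one indicator bit per byte and counts only the non-zero bytes as needing a data-array access. The function names are hypothetical.

```python
def dzc_encode(word_bytes):
    """Split a word into per-byte zero indicator bits and the non-zero bytes."""
    zero_bits = [b == 0 for b in word_bytes]
    stored = bytes(b for b in word_bytes if b != 0)
    return zero_bits, stored


def dzc_decode(zero_bits, stored):
    """Rebuild the original word from indicator bits and non-zero bytes."""
    remaining = iter(stored)
    return bytes(0 if is_zero else next(remaining) for is_zero in zero_bits)


def bytes_accessed(word_bytes):
    """Number of bytes whose data bits must actually be read or written."""
    return sum(1 for b in word_bytes if b != 0)
```

A word such as `bytes([0, 0, 0x12, 0])` would touch a single data byte in this model, with the other three bytes represented by their indicator bits alone.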

Packing compressed cache lines
Without packing, the space freed by compression remains empty.
Packing increases cache utilization, which saves power indirectly.
Packing techniques:
  Variable packing: packs a variable number of cache lines into cache frames. Expensive.
  Fixed packing: a preset number of cache lines are packed. Fewer opportunities for compression.
Compression cache:
  Uses frequent value compression.
  Does not attempt to pack cache lines into frames.
  A frame holds either two compressed lines or one uncompressed line.
Significance compression cache: lines are compressed using significance compression.
Instruction compression.

IDLE-CAPACITY SWITCHING ACTIVITY
Wasted activity related to out-of-order execution.
Processor resources are over-provisioned to support high instruction throughput.
Power inefficiency of out-of-order processors: energy per instruction grows as E_i ~ (IW)^γ, where IW is the issue width.

Resource partitioning
Cannot afford the latency of very long wires.
Wires are partitioned by placing buffers.
Aimed at the size vs. speed trade-off.
Wire partitioning:
  Wire delay is proportional to R x C.
  Breaking a wire into k segments improves the delay of each segment by a factor of k^2, since each segment has 1/k of the resistance and 1/k of the capacitance.
  Total energy increases exponentially with k.
  Replacing buffers with tristate devices.
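The delay argument for wire partitioning can be checked with a small model. The sketch assumes delay is simply the product R x C, that resistance and capacitance divide evenly across segments, and that k segments need k - 1 buffers with a fixed delay each; the function names and numbers are illustrative assumptions.

```python
def segment_delay(r_total, c_total, k):
    """R x C delay of one of k equal segments: (R/k) * (C/k) = R*C / k^2."""
    return (r_total / k) * (c_total / k)


def wire_delay(r_total, c_total, k, buffer_delay=0.0):
    """End-to-end delay: k segments in series plus k - 1 buffers between them."""
    return k * segment_delay(r_total, c_total, k) + (k - 1) * buffer_delay


# Hypothetical wire in arbitrary units.
unsplit = wire_delay(8.0, 2.0, 1)
per_segment = segment_delay(8.0, 2.0, 4)
split_no_buffers = wire_delay(8.0, 2.0, 4)
split_with_buffers = wire_delay(8.0, 2.0, 4, buffer_delay=0.5)
```

In this model each segment is k^2 times faster than the whole wire, the segments in series are k times faster overall, and the buffer delay eats into that gain, which is the trade-off partitioning has to balance.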

IDLE-CAPACITY SWITCHING ACTIVITY: INSTRUCTION QUEUE
Resizable IQ: a mix of CAM and SRAM.
Readiness feedback control:
  Adjusts the IQ size based on the activity of its entries.
  The decision-making scheme has a safety mechanism.
Occupancy feedback control:
  Applies to the IQ, LSQ, and ROB.
  The occupancy of a structure is the appropriate feedback control metric.
Logical resizing without partitioning:
  The IQ is organized as a circular FIFO buffer.
  The size is limited logically by limiting the part that can be allocated to new entries.
ILP-contribution feedback control.
Instruction queue collapsing.
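One way occupancy feedback could drive logical resizing is sketched below. The thresholds, step size, and size bounds are hypothetical parameters, not values from the presentation; the sketch only shows the shape of a grow/shrink decision taken once per sampling window.

```python
def next_limit(limit, avg_occupancy, step=8, min_size=8, max_size=64,
               high_water=0.9):
    """Pick the allocation limit for the next sampling window.

    Grow when the queue ran close to full, shrink when a whole step of
    entries went unused on average, and otherwise keep the current limit.
    """
    if avg_occupancy >= high_water * limit:
        return min(limit + step, max_size)
    if avg_occupancy <= limit - step:
        return max(limit - step, min_size)
    return limit
```

For a 32-entry limit, an average occupancy of 30 would grow the queue to 40 entries and an average of 10 would shrink it to 24, while 26 leaves it unchanged.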

IDLE-CAPACITY SWITCHING ACTIVITY: CORE
Dynamically changes the width of an 8-issue processor to 6-issue or 4-issue.
  6-issue processor: half of a cluster is disabled.
  4-issue processor: one whole cluster is disabled.
The appropriate functional units are clock gated.
Decisions are made at the end of the sampling window.

THANK YOU!