Presented by Rania Kilany

• Energy consumption is a major concern in many embedded computing systems.
• Cache memories consume 50% of the total energy.
• Desktop systems run a very wide range of applications; their cache architecture is set to work well with the given applications, technology, and cost.
• Embedded systems are designed to run a small range of well-defined applications, so the cache architecture can be tuned for both increased performance and lower energy consumption.

Power dissipation in CMOS circuits:
1. Static power dissipation, due to leakage current.
2. Dynamic power dissipation, due to logic switching current and the charging and discharging of the load capacitance.
3. Energy consumption of memory accesses:
• Fetching instructions and data from off-chip memory is costly because of the high off-chip capacitance and large off-chip memory storage.
• The microprocessor stalls while waiting for the instructions and/or data.

The total energy due to memory accesses is as follows:

Energy_mem = Energy_dynamic + Energy_static ... (1)
Energy_dynamic = cache_hits * energy_hit + cache_misses * energy_miss
Energy_miss = energy_offchip_access + energy_uP_stall + energy_cache_block_fill
Energy_static = cycles * energy_static_per_cycle
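As a rough worked example of equation (1), the sketch below plugs hypothetical per-event energies and access counts into the model. Only the formulas come from the slide; every numeric value is invented for illustration.

```c
#include <stdio.h>

/* Hypothetical per-event energies in nanojoules -- illustrative only. */
#define ENERGY_HIT               0.5
#define ENERGY_OFFCHIP_ACCESS   20.0
#define ENERGY_UP_STALL          5.0
#define ENERGY_CACHE_BLOCK_FILL  2.0
#define ENERGY_STATIC_PER_CYCLE  0.01

int main(void) {
    long cache_hits = 950000, cache_misses = 50000, cycles = 2000000;

    /* energy_miss = energy_offchip_access + energy_uP_stall + energy_cache_block_fill */
    double energy_miss = ENERGY_OFFCHIP_ACCESS + ENERGY_UP_STALL + ENERGY_CACHE_BLOCK_FILL;
    double energy_dynamic = cache_hits * ENERGY_HIT + cache_misses * energy_miss;
    double energy_static = cycles * ENERGY_STATIC_PER_CYCLE;
    double energy_mem = energy_dynamic + energy_static;   /* equation (1) */

    printf("energy_mem = %.0f nJ (dynamic %.0f + static %.0f)\n",
           energy_mem, energy_dynamic, energy_static);
    return 0;
}
```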

Baseline cache parameters (the address breakdown is derived below):
• Cache size = 8 Kbytes
• Block size = 32 bytes
• 32-bit address
• Four-way set-associative
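These parameters fix how an address splits into offset, index, and tag. The quick check below is standard cache arithmetic, not code from the paper:

```c
#include <stdio.h>

/* Integer log base 2 for power-of-two inputs. */
static unsigned log2u(unsigned x) {
    unsigned bits = 0;
    while (x >>= 1) bits++;
    return bits;
}

int main(void) {
    unsigned cache_bytes = 8 * 1024, block_bytes = 32, ways = 4, addr_bits = 32;

    unsigned sets        = cache_bytes / (block_bytes * ways);   /* 64 */
    unsigned offset_bits = log2u(block_bytes);                   /* 5  */
    unsigned index_bits  = log2u(sets);                          /* 6  */
    unsigned tag_bits    = addr_bits - index_bits - offset_bits; /* 21 */

    printf("sets=%u, offset=%u bits, index=%u bits, tag=%u bits\n",
           sets, offset_bits, index_bits, tag_bits);
    return 0;
}
```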

Direct-mapped cache: less power per access, since only one tag and one data array are read.
Four-way set-associative cache: more power per access, since four tags and four data arrays are read, but a low miss rate.

• The direct-mapped cache reads one tag and one data array, but its high miss rate leads to higher energy: misses take longer and draw high power to access the next level of memory.
• The four-way set-associative cache reads four tags and four data arrays, but its lower miss rate avoids the time and power those misses would have caused.
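This tradeoff can be captured in one expression: reading more ways costs more energy per access, while a lower miss rate avoids expensive next-level accesses. The sketch below uses invented energies and miss rates to show that either design can win, depending on the application:

```c
/* Average energy per cache access (sketch; all constants hypothetical).
 * Reading W ways costs W * (tag + data) array energy; each miss adds
 * the cost of going to the next level of memory. */
double avg_access_energy(unsigned ways, double miss_rate) {
    const double tag_array_energy  = 0.05; /* nJ, hypothetical */
    const double data_array_energy = 0.10; /* nJ, hypothetical */
    const double next_level_energy = 25.0; /* nJ, hypothetical */
    return ways * (tag_array_energy + data_array_energy)
         + miss_rate * next_level_energy;
}
/* e.g. avg_access_energy(1, 0.10) = 2.65 nJ vs
 *      avg_access_energy(4, 0.02) = 1.10 nJ: here four-way wins;
 * with miss rates of 0.02 vs 0.015 instead, direct-mapped wins
 * (0.65 nJ vs 0.975 nJ). */
```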

Tuning the associativity to a particular application is extremely important to minimize energy, motivating the need for a cache with configurable associativity.

• The cache can be reconfigured by software to be direct-mapped, two-way, or four-way set-associative, using a technique called way-concatenation.
• The reconfigured cache has very little size and performance overhead.
• Way-concatenation reduces energy caused by dynamic power.
• Way-shutdown reduces energy caused by static power when combined with way-concatenation.

• Develop a cache architecture whose associativity can be configured as one, two, or four ways, while still utilizing the full capacity of the cache:
• 6 index bits for a four-way cache
• 7 index bits for a two-way cache
• 8 index bits for a one-way (direct-mapped) cache
• The scheme could be extended to 8 or more ways (a sketch of the index selection follows).
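A minimal sketch of the index computation under each configuration, assuming the 8 KB, 32-byte-block cache from the earlier slide; the function name and structure are ours, not the paper's circuit:

```c
/* Way-concatenation index selection (sketch).
 * The cache is built as four banks of 64 lines. Concatenating banks
 * into fewer, larger ways means more address bits select the set. */
unsigned set_index(unsigned addr, unsigned ways) {
    const unsigned offset_bits = 5;     /* 32-byte blocks          */
    unsigned index_bits;
    switch (ways) {
    case 4:  index_bits = 6; break;     /* 64 sets, 4 ways each    */
    case 2:  index_bits = 7; break;     /* 128 sets, 2 ways each   */
    default: index_bits = 8; break;     /* 256 sets, direct-mapped */
    }
    return (addr >> offset_bits) & ((1u << index_bits) - 1u);
}
```

In one-way mode the two extra index bits effectively select a single bank, so only one tag and one data array are read per access; that is where the dynamic energy savings come from.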

How much does the configurability of such a cache increase access time compared to a conventional four-way cache?
• The configuration circuit is not on the cache access critical path: it executes concurrently with the index decoding.
• The transistors in the configure circuit are sized so that it is faster than the decoder.
• The configure circuit's area is negligible.

• Two inverters on the critical path are changed into NAND gates: one inverter after the decoder, and one after the comparator.
• The NAND gates are sized to three times their original size to bring the critical path delay back to the original time.
• Replacing the inverters with larger NAND gates resulted in less than a 1% area increase for the entire cache.

• First observation: a way-concatenation cache results in average energy savings of 37% compared to a conventional four-way cache, with savings over 60% for several examples. Compared to a conventional direct-mapped cache the average savings are more modest, but the direct-mapped cache suffers large penalties for some examples, up to 284%.
• Second observation: way-concatenation is better than way-shutdown for reducing dynamic power, sometimes saving considerably more energy.

way-shutdown  Although way-shutdown increases the miss rate for some benchmarks, for other benchmarks, way shutdown has negligible impact. gated-Vdd.  To save static energy, involving a circuit level technique called gated-Vdd. gated-Vdd  When the gated-Vdd transistor is turned off, the stacking effect of the extra transistor reduces the leakage energy dissipation.
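A sketch of why way-shutdown attacks the static term of equation (1): gating Vdd to a way removes (nearly all of) its leakage, so static energy scales with the number of ways left powered. The constant is hypothetical:

```c
/* Static energy with way-shutdown (sketch).
 * Ways whose gated-Vdd transistor is off contribute almost no leakage;
 * this simplified model treats their leakage as zero. */
double static_energy(long cycles, unsigned active_ways) {
    const double leak_per_way_per_cycle = 0.0025; /* nJ, hypothetical */
    return cycles * leak_per_way_per_cycle * active_ways;
}
```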

• To save dynamic power: a configurable cache design method called way-concatenation was developed, saving an average of 37% compared to a conventional four-way set-associative cache.
• To save static power: the configurable cache was extended with a way-shutdown method, for average savings of 40%.

Thank You