Increasing Effective Cache Capacity Through the Use of Critical Words

Mentor: Amirhossein Mirhosseininiri
Mentees: Ben Schreiber, Pooja Welling

Motivation

Caches are an integral part of modern computing. They act as a buffer between the CPU and main memory, storing recently used data for quick access. Caching reduces the number of main-memory accesses, and preventing these costly accesses saves both time and energy. This project seeks to increase the effective capacity of the cache. The goal of this research is to develop a scheme in which only the first-needed part of a memory block is stored, with the rest of the block fetched while that first part is being processed.

Background

Main memory is built from Dynamic RAM (DRAM), which is slow.
Caches are high-speed storage built from fast Static RAM (SRAM).
Caching is possible because of the principle of locality:
  Spatial locality: data close to a recent access is likely to be accessed.
  Temporal locality: data that has been accessed is likely to be accessed again.
Caching data according to this principle helps avoid main-memory accesses.
Data is pulled from memory in blocks of words, and blocks are grouped into sets.

Critical Words Cache

Sometimes the same word in a block is always accessed first: the same index in an array, the same method in a class, and so on.
Store only the 2 words needed first, the critical words.
By storing only 2 words of 8, the cache effectively grows by a factor of 4.
When requested, the critical words are sent to the processor.
While the critical words are processed, the rest of the block is fetched.
Ideally, the rest of the block arrives before it is needed.
A penalty is incurred if the critical words are mispredicted, with the same effect as a normal cache miss.
(This lookup flow is sketched in code below, after the next section.)

Dynamic Critical Words Cache

Critical words are not always predictable, so the L2 is split into two parts:
  a critical-word cache for predictable data, and
  a traditional, whole-block cache for unpredictable data.
New data is assumed predictable (predictable bit set to 0).
If the prediction is successful, the predictable bit stays 0.
If not, the predictable bit is set to 1 and the block moves to the whole-block side.
The predictable bit is stored with the data in all higher levels of the memory hierarchy.
(This policy is sketched in the second code block below.)
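To make the critical-words lookup concrete, here is a minimal Python sketch of the idea described above. It is an illustration under stated assumptions, not the project's actual simulator: the names (CriticalWordsCache, access), the direct-mapped organization, and the return values are choices made here to mirror the bullets; only the 2-of-8 word split comes from the poster.

```python
# Minimal sketch of a critical-words-only cache lookup. Illustrative only:
# the direct-mapped organization and all names here are assumptions; the
# 2-of-8 word split comes from the poster.

WORDS_PER_BLOCK = 8   # full block size (8 words)
CRITICAL_WORDS = 2    # words actually stored, so capacity grows ~4x

class CriticalWordsCache:
    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.entries = {}  # index -> (tag, offset of first critical word)

    def access(self, block_addr, word_offset):
        """Classify one access as 'hit', 'partial-miss', or 'miss'."""
        assert 0 <= word_offset < WORDS_PER_BLOCK
        index = block_addr % self.num_entries
        tag = block_addr // self.num_entries
        entry = self.entries.get(index)
        if entry is not None and entry[0] == tag:
            crit_start = entry[1]
            if crit_start <= word_offset < crit_start + CRITICAL_WORDS:
                # Critical words are resident: serve them immediately while
                # the rest of the block is fetched in the background.
                return 'hit'
            # Tag matches but the prediction was wrong: this costs the
            # same as a normal cache miss.
            return 'partial-miss'
        # Not resident: fetch the block and predict that the first word
        # touched now will also be the first word touched next time.
        self.entries[index] = (tag, word_offset)
        return 'miss'
```

For example, after cache.access(7, 0) returns 'miss', a second cache.access(7, 0) returns 'hit', while cache.access(7, 5) on the same resident block returns 'partial-miss', the mispredict penalty described above.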
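The dynamic scheme can then be sketched as a thin wrapper around that structure. Again this is a hedged illustration that reuses the hypothetical CriticalWordsCache above; only the policy itself, assume predictable and demote on a failed prediction, is taken from the poster.

```python
# Sketch of the split L2 with a predictable bit per block. Illustrative:
# the structure and names are assumptions, not the authors' design.

class DynamicCriticalWordsCache:
    def __init__(self, num_entries):
        self.cw_side = CriticalWordsCache(num_entries)  # predictable blocks
        self.wb_side = set()   # whole-block side: resident block addresses
        self.predictable = {}  # block_addr -> predictable bit (default 0)

    def access(self, block_addr, word_offset):
        if self.predictable.get(block_addr, 0) == 1:
            # Block is marked unpredictable: serve it whole-block style.
            if block_addr in self.wb_side:
                return 'hit'
            self.wb_side.add(block_addr)
            return 'miss'
        # New or so-far-predictable data goes to the critical-words side.
        result = self.cw_side.access(block_addr, word_offset)
        if result == 'partial-miss':
            # Prediction failed: set the predictable bit to 1 and move the
            # block to the whole-block side. In hardware the bit would be
            # stored with the data in higher levels of the hierarchy.
            self.predictable[block_addr] = 1
            self.wb_side.add(block_addr)
        return result
```

Note that the sketch keeps the predictable bits in a side table for simplicity; the poster instead stores the bit alongside the data throughout the memory hierarchy.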
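The results below compare hit rate and average memory access time (AMAT). The poster does not restate the formula, but AMAT is conventionally computed as

AMAT = hit time + (miss rate × miss penalty)

so the hit-rate gains reported here translate directly into lower AMAT.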
Results

The hit rate of the dynamic critical words cache is significantly higher than that of both the baseline L2 cache and the critical words cache. Specifically, the dynamic critical words cache showed a 50.1% improvement in hit rate over the L2 cache, while the critical words cache showed only a 28.4% improvement. This indicates that the dynamic critical words method of moving blocks into the cache is an effective way to improve cache performance. The average memory access time of the dynamic critical words cache is also lower than that of the standard and critical words caches: a 22.7% improvement over the L2 cache, compared with a 14.3% improvement for the critical words cache. Overall, the dynamic critical words cache comes closest to the ideal L2 cache results.

Further Plans

The results indicate that this is an effective approach. The simulation will be extended to include dynamic resizing of the critical-words portion relative to the whole-block portion, which will allow greater flexibility for workloads that are overwhelmingly predictable or unpredictable. Other factors must also be measured, most importantly energy consumption: in the critical-words scheme, memory is accessed on every read, which causes a substantial increase in power consumption that must be evaluated. To collect this data, a model of the cache will be simulated in Verilog.

Acknowledgements

Thank you to our mentor, Amirhossein Mirhosseininiri, and to the PURE Committee for providing us with this opportunity.

References

Huang, Cheng-Chieh, and Vijay Nagarajan. "Increasing cache capacity via critical-words-only cache." 2014 32nd IEEE International Conference on Computer Design (ICCD), IEEE, 2014.