Caches – basic idea
- Small, fast memory
- Stores frequently accessed blocks of memory
- When it fills up, discard some blocks and replace them with others
- Works well if we reuse data blocks. Examples:
  - Incrementing a variable
  - Loops
  - Function calls

Why do caches work? Locality principles:
- Temporal locality: a memory reference is likely to target the same location as another recent reference.
  - Variables are reused within a program
  - Loops, function calls, etc.
- Spatial locality: a memory reference is likely to target a location near another recent reference.
  - Matrices, arrays
  - Stack accesses
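To make the two principles concrete, here is a minimal sketch (my own illustration, not from the slides; Python's array module is used so the elements really are contiguous in memory):

```python
from array import array

data = array('i', range(1024))   # flat, contiguous buffer of 32-bit ints

total = 0
for i in range(len(data)):
    total += data[i]
# Temporal locality: 'total' and 'i' are touched on every iteration,
# so they stay cached.
# Spatial locality: data[i] and data[i+1] occupy adjacent bytes, so one
# cache block load (e.g., 32 bytes = 8 ints) serves several iterations.
```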

Cache performance example
Problem (assume a single-cycle CPU):
- 500 MHz CPU, so cycle time = 2 ns
- Instruction mix: arithmetic 50%, load/store 30%, branch 20%
- Cache: hit rate 95%, miss penalty 60 ns (30 cycles), hit time 2 ns (1 cycle)
MIPS CPI without a cache for load/store: 0.5 × 1 + 0.2 × 1 + 0.3 × 30 = 9.7
MIPS CPI with a cache for load/store: 0.5 × 1 + 0.2 × 1 + 0.3 × (0.95 × 1 + 0.05 × 30) = 1.435
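The same arithmetic as a small script (variable names are mine; all numbers come from the problem statement above):

```python
MIX = {"arith": 0.50, "loadstore": 0.30, "branch": 0.20}
HIT_RATE, HIT_CYCLES, MISS_CYCLES = 0.95, 1, 30   # 2 ns hit, 60 ns miss

# Without a cache, every load/store pays the full 30-cycle memory latency:
cpi_no_cache = MIX["arith"] * 1 + MIX["branch"] * 1 + MIX["loadstore"] * MISS_CYCLES
print(round(cpi_no_cache, 3))          # 9.7

# With a cache, a load/store averages the hit and miss costs:
mem_cycles = HIT_RATE * HIT_CYCLES + (1 - HIT_RATE) * MISS_CYCLES   # 2.45
cpi_cache = MIX["arith"] * 1 + MIX["branch"] * 1 + MIX["loadstore"] * mem_cycles
print(round(cpi_cache, 3))             # 1.435
```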

Cache types
- Direct-mapped: a memory location maps to a single specific cache line (block). What if two locations map to the same block? That is a conflict, and it forces a miss.
- Set-associative: a memory location maps to a set containing several blocks. Each block still has a tag and data, and a set can have 2, 4, 8, etc. blocks; blocks per set = associativity. Why? It resolves the conflicts of direct-mapped caches: if two locations map to the same set, one can be stored in the first block of the set and the other in the second. (The sketch after this list shows the set-mapping arithmetic.)
- Fully-associative: the cache has only one set, and all memory locations map to it. That one set holds all the blocks, and a given location can be in any of them. There are no conflict misses, but the hardware is costly, so this is only used in very small caches.
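A minimal sketch of the placement rule (function name and framing are mine, not from the slides): all three types use the same formula, differing only in how many blocks share a set.

```python
def set_index(block_addr, num_blocks, assoc):
    """Which set a memory block maps to, given the cache's total number
    of blocks and its associativity (blocks per set)."""
    num_sets = num_blocks // assoc
    return block_addr % num_sets

# For an 8-block cache, memory block 13 maps to...
print(set_index(13, 8, assoc=1))  # 5: direct-mapped, exactly one candidate block
print(set_index(13, 8, assoc=2))  # 1: 2-way, either of the 2 blocks in set 1
print(set_index(13, 8, assoc=8))  # 0: fully associative, the one and only set
```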

Direct-mapped cache example
4 KB cache, each block is 32 bytes.
- How many blocks?
- How long is the index that selects a block?
- How long is the offset (displacement) that selects a byte within a block?
- How many bits are left over if we assume a 32-bit address? (Those are the tag bits.)

Direct-mapped cache example
4 KB cache, each block is 32 bytes: 4 KB = 2^12 bytes, 32 = 2^5 bytes.
- How many blocks? 2^12 bytes / 2^5 bytes per block = 2^7 = 128 blocks
- How long is the index that selects a block? log2(128) = 7 bits
- How long is the offset that selects a byte within a block? 5 bits
- How many bits are left over with a 32-bit address (the tag bits)? 32 − 7 − 5 = 20 bits
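The same breakdown in code (a sketch; the constants match the answers above, the example address is mine):

```python
OFFSET_BITS = 5                               # 32-byte blocks = 2^5
INDEX_BITS = 7                                # 128 blocks     = 2^7
TAG_BITS = 32 - INDEX_BITS - OFFSET_BITS      # 20 bits left over

def split_address(addr):
    """Split a 32-bit byte address into (tag, index, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print(split_address(0x1234))   # (1, 17, 20): tag 1, block index 17, byte 20
```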

Example continued
Address and cache layout (the complete picture is not reproduced in this transcript):
| 20-bit tag | 7-bit index | 5-bit offset |

Cache size
4 KB is the visible size; let's look at total space and overhead.
Each block (line) contains:
- 1 valid bit
- 20-bit tag
- 32 bytes of data = 256 bits
Total line size: 1 + 20 + 256 = 277 bits
Total cache size in hardware, including overhead storage: 277 bits × 128 blocks = 35,456 bits = 4432 bytes ≈ 4.33 KB
Overhead: 336 bytes (≈ 0.33 KB) for the valid bits and tags
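Recomputing those totals (a sketch; all sizes come from the example above):

```python
BLOCKS, TAG_BITS, VALID_BITS = 128, 20, 1
DATA_BITS = 32 * 8                               # 32 bytes of data = 256 bits

line_bits = VALID_BITS + TAG_BITS + DATA_BITS    # 277 bits per line
total_bytes = line_bits * BLOCKS // 8            # 35,456 bits = 4432 bytes
print(total_bytes)                               # 4432
print(total_bytes - BLOCKS * 32)                 # 336 bytes of overhead
```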

Cache access examples
Consider a direct-mapped cache with 8 blocks and 2-byte blocks. Total size = 8 × 2 = 16 bytes.
Address breakdown: 1 bit for the byte offset (displacement), 3 bits for the index, the rest for the tag.
Consider a stream of reads to these byte addresses: 3, 13, 1, 0, 5, 1, 4, 32, 33, 1
- Corresponding cache block indices ((byteaddr / 2) % 8): 1, 6, 0, 0, 2, 0, 2, 0 (= 16 % 8), 0, 0
- Tags ((byteaddr / 2) / 8): 2 for addresses 32 and 33, 0 for all the others
How many misses? What if we increase the associativity to 2? We would have 4 sets with 2 blocks each, still 2 bytes per block, so total size is still 16 bytes. How does the behavior change? (The lecture's drawings are not reproduced in this transcript; the simulator sketch below traces both configurations.)
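To answer the two questions without the drawings, here is a minimal trace simulator (entirely my own sketch, assuming LRU replacement for the set-associative case):

```python
def simulate(addrs, num_blocks=8, assoc=1, block_bytes=2):
    """Count misses for a read trace on a small cache. Each set keeps its
    resident tags in LRU order (least recently used first)."""
    num_sets = num_blocks // assoc
    sets = [[] for _ in range(num_sets)]
    misses = 0
    for addr in addrs:
        block = addr // block_bytes
        idx, tag = block % num_sets, block // num_sets
        if tag in sets[idx]:
            sets[idx].remove(tag)        # hit: refresh this tag's LRU position
        else:
            misses += 1
            if len(sets[idx]) == assoc:  # set full: evict the least recent tag
                sets[idx].pop(0)
        sets[idx].append(tag)            # most recently used goes to the end
    return misses

trace = [3, 13, 1, 0, 5, 1, 4, 32, 33, 1]
print(simulate(trace, assoc=1))   # 6 misses: addresses 32 and 1 conflict at index 0
print(simulate(trace, assoc=2))   # 5 misses: 2-way associativity absorbs that conflict
```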