EE108B Review Session #6 Daxia Ge Friday February 23rd, 2007

Today's Menu:
- Announcements
- Cache Review Intro
- Cache organizations
- Mechanics (Index, Tag, etc.)
- Design Choices
- Examples
- HW Hints

Review: The Memory Problem
We need: big, fast, cheap memory. But:
- Big memories are slow, even when built from fast components
- Fast memories are expensive and small

We Are Lucky: Programs Have Locality!
Principle of Locality: programs access a relatively small portion of the address space at any given time, so we can predict which memory locations a program will reference in the future by looking at what it has referenced recently. Two types of locality:
- Temporal locality: if an item has been referenced recently, it will tend to be referenced again soon
- Spatial locality: if an item has been referenced recently, nearby items will tend to be referenced soon ("nearby" refers to memory addresses)
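
For intuition, here is a tiny illustrative loop (a hypothetical example, not from the slides; the list stands in for a contiguous array):

    # Summing an array exhibits both kinds of locality.
    data = list(range(1024))     # conceptually, consecutive memory locations
    total = 0
    for i in range(len(data)):
        total += data[i]         # 'total' and 'i' are reused every iteration
                                 # -> temporal locality; data[0], data[1], ...
                                 # are adjacent addresses -> spatial locality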

The Solution
Memory can be arranged as a hierarchy. The goal is to provide the illusion of lots of fast memory. But how do you manage this, and make it work?
[Figure: processor (control + datapath) connected to a chain of memories; moving away from the processor, speed goes from fastest to slowest, size from smallest to biggest, and cost from highest to lowest.]

Designing Caches
Organization:
- Direct Mapped
- Set Associative
- Fully Associative
Design choices:
- Block size
- Replacement policy
- Write back / write through
- Write miss / fetch policy
- Others: consistency, etc.

Direct Mapped Cache
[Figure: a 4-word direct mapped cache in front of a 16-word memory (addresses 0-F). Each memory word maps to exactly one cache index (address mod 4), so addresses 0, 4, 8, and C all compete for the same cache entry.]

Set Associative Cache
[Figure: a 4-word, 2-way set associative cache in front of the same 16-word memory. The cache has 2 sets; each memory word maps to one set, but may be placed in either of that set's 2 ways.]

Fully Associative Cache
No cache index; a memory word may be placed in any entry.
- Complete freedom
- More complex replacement policy and hardware
- No memory partitioning
[Figure: a 4-word fully associative cache in front of the 16-word memory; any address can map to any of the 4 entries.]

Direct Mapped Cache (block size = 2)
[Figure: the same 4-word direct mapped cache, now organized as 2 blocks of 2 words each. Consecutive word pairs in the 16-word memory map to alternating cache blocks: block index = (word address div 2) mod 2.]
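
A minimal sketch of the mapping this figure illustrates, with parameters chosen to match the 16-word memory and 4-word cache above:

    # Direct mapped placement: which cache block does each word go to?
    CACHE_WORDS = 4   # total cache capacity in words
    BLOCK_WORDS = 2   # words per block
    NUM_BLOCKS = CACHE_WORDS // BLOCK_WORDS      # = 2 blocks

    for addr in range(16):                       # the 16-word memory, 0x0-0xF
        block_index = (addr // BLOCK_WORDS) % NUM_BLOCKS
        print(f"word {addr:#x} -> cache block {block_index}")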

Quick Example
Direct mapped cache with 16 KB of data and 4-word blocks; 32-bit addresses. How big is the entire cache?
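
The slide leaves the answer as an exercise; here is a sketch of the arithmetic, assuming 32-bit words and one valid bit (no dirty bit) per block:

    # Total size of a direct mapped cache: data + tags + valid bits.
    ADDRESS_BITS = 32
    DATA_BYTES = 16 * 1024          # 16 KB of data
    BLOCK_BYTES = 4 * 4             # 4-word blocks, 4 bytes per word

    num_blocks = DATA_BYTES // BLOCK_BYTES              # 1024 blocks
    offset_bits = (BLOCK_BYTES - 1).bit_length()        # 4 byte-offset bits
    index_bits = (num_blocks - 1).bit_length()          # 10 index bits
    tag_bits = ADDRESS_BITS - index_bits - offset_bits  # 18 tag bits

    bits_per_block = BLOCK_BYTES * 8 + tag_bits + 1     # data + tag + valid = 147
    total_bits = num_blocks * bits_per_block            # 150,528 bits
    print(f"{total_bits} bits = {total_bits / 8 / 1024:.1f} KB")  # ~18.4 KB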

Cache Tag & Index
Assume a 32-bit memory address, and a 2^n-word direct mapped cache with 1-word blocks. The address divides into three fields:
- Byte offset: bits 1:0 (selects the byte within the 4-byte word)
- Cache index: bits n+1:2 (selects one of the 2^n cache entries)
- Cache tag: bits 31:n+2 (stored with the entry and compared on each access)
[Figure: the address split into tag / index / byte offset, and a cache array of 2^n entries (Word 0 through Word 2^n - 1), each with a valid bit, a tag, and a data word. Example shown: tag = 0x50, index = 3.]

Cache Blocks
The previous example was a 4-word direct mapped cache in which each block was 1 word wide. That strategy takes advantage of temporal locality, since a referenced word will tend to be referenced again soon, but it does not take advantage of spatial locality. To take advantage of spatial locality, increase the block size.
[Figure: a cache entry with a valid bit, a cache tag, and an 8-word data block (word 0 through word 7).]

Cache Block Example
Assume a 2^n-byte direct mapped cache with 2^m-byte blocks (word size = 1 byte). Then:
- Byte select: the lower m bits of the memory address
- Cache index: the next (n - m) bits
- Cache tag: the upper (32 - n) bits
[Figure: a 1 KB cache with 32 B blocks. The address splits into tag (bits 31:10), index (bits 9:5), and byte select (bits 4:0); the cache holds 2^5 = 32 lines (bytes 0-31, 32-63, ..., 992-1023). Example shown: tag = 0x50, index = 0x01, byte select = 0x1F.]
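
A minimal sketch of the field extraction for the 1 KB / 32 B configuration above (function and variable names are my own):

    # Split a 32-bit address into tag / index / byte-select fields.
    CACHE_BYTES = 1024   # 2^10-byte cache -> n = 10
    BLOCK_BYTES = 32     # 2^5-byte blocks -> m = 5

    m = (BLOCK_BYTES - 1).bit_length()           # 5 byte-select bits
    n = (CACHE_BYTES - 1).bit_length()           # 10
    index_bits = n - m                           # 5 index bits

    def split_address(addr):
        byte_select = addr & (BLOCK_BYTES - 1)
        index = (addr >> m) & ((1 << index_bits) - 1)
        tag = addr >> n
        return tag, index, byte_select

    # Reassemble the example from the figure, then split it back apart:
    addr = (0x50 << n) | (0x01 << m) | 0x1F
    print([hex(f) for f in split_address(addr)])  # ['0x50', '0x1', '0x1f']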

Block Sizes
Larger block sizes take advantage of spatial locality, but they also incur a larger miss penalty, since it takes longer to transfer the block into the cache. Very large blocks can even increase the miss rate: with fewer blocks in the cache, temporal locality is compromised, and the average access time goes back up. Hence there is a tradeoff in selecting block size:
Average Access Time = Hit Time + Miss Penalty × Miss Rate
[Figure: three plots versus block size. Miss penalty grows with block size; miss rate first falls (exploiting spatial locality), then rises (fewer blocks compromise temporal locality); average access time is therefore U-shaped, with increased miss penalty and miss rate at large block sizes.]

Fully Associative Cache
- The opposite extreme, in that it has no cache index
- Use any available entry to store memory elements
- No conflict misses, only capacity misses
- Must compare the cache tags of all entries in parallel to find the desired one
[Figure: a fully associative cache with 32 B blocks; the address splits into a 27-bit tag and a 5-bit byte select, and a comparator (=) per entry checks every stored tag against the incoming tag simultaneously.]

Replacement Policies
- Least Recently Used (LRU): often Not Recently Used works pretty well and is easier to implement
- Random
- Round Robin
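
As a sketch of the bookkeeping LRU implies for one cache set (a toy software model with names of my own choosing, not how real hardware implements it):

    from collections import OrderedDict

    # Toy LRU replacement for a single set: an OrderedDict keeps tags
    # in recency order, oldest first.
    class LRUSet:
        def __init__(self, ways=4):
            self.ways = ways
            self.lines = OrderedDict()            # tag -> data

        def access(self, tag):
            if tag in self.lines:                 # hit: mark most recent
                self.lines.move_to_end(tag)
                return "hit"
            if len(self.lines) >= self.ways:      # miss: evict the LRU line
                self.lines.popitem(last=False)
            self.lines[tag] = None                # fill the new line
            return "miss"

    s = LRUSet(ways=2)
    print([s.access(t) for t in [0, 1, 0, 2, 1]])
    # ['miss', 'miss', 'hit', 'miss', 'miss'] -- tag 1, least recently
    # used at that point, was evicted by tag 2, so the final access misses.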

Write Policy
Write-through:
- Misses are simpler and cheaper, since a block never needs to be written back
- Consistency is easy
- Easier to implement, though most systems need an additional buffer, called a write buffer, to be practical
- Uses a lot of bandwidth to the next level of memory; potentially horrible performance
Write-back:
- Words can be written at the cache rate
- Multiple writes within a block require only one "writeback" later
- 2-cycle writes (the tag must be checked before the data can be overwritten)
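
To make the bandwidth argument concrete, here is a toy traffic count (single cached block, made-up access pattern) comparing the two policies:

    # Memory writes generated by repeated stores to one cached block.
    stores_to_same_block = 100

    # Write-through: every store also writes the next level of memory.
    write_through_traffic = stores_to_same_block      # 100 memory writes

    # Write-back: stores hit in the cache and set a dirty bit; memory is
    # written once, when the block is eventually evicted.
    write_back_traffic = 1                            # 1 writeback

    print(write_through_traffic, write_back_traffic)  # 100 vs 1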

Write Miss/Fetch Policies
On a write miss, do we load the cache line into the cache?
- Yes! - "Write Allocate"
  - Fetch-on-write (write-through or write-back caches): fetch the rest of the block
  - No-fetch-on-write (write-through caches): mark the parts of the block that are not valid
- No! - "No Write Allocate"
  - No-write-allocate (write-through caches): write the data directly to memory, without keeping a copy in the cache

Other Topics
- Split caches
- Multilevel caches: the goal, again, is to provide the illusion of lots of fast memory
[Figure: the same memory-hierarchy diagram as before, now with multiple cache levels between the processor and main memory; speed fastest to slowest, size smallest to biggest, cost highest to lowest.]

Example: What happens to L1 cache?
For each design change, consider its effect on hit time, miss rate, and miss penalty:
- Larger L1 cache
- Higher associativity
- Larger blocks
- Multilevel caches

Terminology
- Block: minimum unit of information transfer between levels of the hierarchy. Block addressing varies by technology at each level; blocks are moved one level at a time.
- Hit: data appears in a block in the upper level.
  - Hit rate: percent of accesses found in the upper level
  - Hit time: time to access the upper level = access time + time to determine hit/miss
- Miss: data was not in the upper level and had to be fetched from a lower level.
  - Miss rate: percent of misses (1 - hit rate)
  - Miss penalty: overhead in getting data from a lower level = lower-level access time + replacement time + time to deliver to the upper level
- The miss penalty is usually much larger than the hit time.

AMAT = Hit Time + Miss Rate × Miss Penalty
We need to define an average memory access time, since some accesses will be fast and some will be slow. The formula can be applied recursively:
AMAT_L1 = HitTime_L1 + MissRate_L1 × AMAT_L2
AMAT_L2 = HitTime_L2 + MissRate_L2 × AMAT_MainMem
How do you compute CPI given AMAT? AMAT is in units of time (usually ns), so convert it to cycles by multiplying by the clock rate, and weight it by how often it applies:
CPI_overall = CPI_base + (frequency of L1 accesses per instruction) × AMAT_L1 × Clock Rate
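
A worked example of these formulas; every parameter below is hypothetical, chosen only to exercise the arithmetic:

    # Recursive AMAT and CPI, using made-up illustrative numbers.
    CLOCK_RATE = 1.0          # GHz, i.e. cycles per ns (assumed)
    HIT_TIME_L1 = 1.0         # ns (assumed)
    MISS_RATE_L1 = 0.05       # (assumed)
    HIT_TIME_L2 = 5.0         # ns (assumed)
    MISS_RATE_L2 = 0.20       # (assumed)
    MAIN_MEM_TIME = 100.0     # ns (assumed)

    amat_l2 = HIT_TIME_L2 + MISS_RATE_L2 * MAIN_MEM_TIME   # 25.0 ns
    amat_l1 = HIT_TIME_L1 + MISS_RATE_L1 * amat_l2         # 2.25 ns

    CPI_BASE = 1.0                 # base CPI, per the slide's formula
    L1_ACCESSES_PER_INSTR = 1.3    # 1 fetch + ~0.3 data accesses (assumed)

    cpi = CPI_BASE + L1_ACCESSES_PER_INSTR * amat_l1 * CLOCK_RATE
    print(f"AMAT_L1 = {amat_l1} ns, CPI = {cpi:.2f}")      # CPI = 3.93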