EEL-4713 Ann Gordon-Ross 1 EEL-4713 Computer Architecture Memory hierarchies

EEL-4713 Ann Gordon-Ross 2 The Motivation for Caches
°Motivation:
 Large memories (DRAM) are slow
 Small memories (SRAM) are fast
°Make the average access time small by servicing most accesses from a small, fast memory
°Reduce the bandwidth required of the large memory
[Figure: processor connected to a memory system consisting of a cache and DRAM]

EEL-4713 Ann Gordon-Ross 3 Outline of Today's Lecture
°Introduction to memory hierarchies & caches
°Cache write and replacement policies

EEL-4713 Ann Gordon-Ross 4 *An Expanded View of the Memory System
[Figure: processor (control + datapath) connected to a chain of memories. Moving away from the processor, Speed: fastest to slowest; Size: smallest to biggest; Cost: highest to lowest]

EEL-4713 Ann Gordon-Ross 5 The Principle of Locality
[Figure: probability of reference plotted over the address space, from 0 to memsize]
What are the principles of locality?

EEL-4713 Ann Gordon-Ross 6 The Principle of Locality
°The Principle of Locality: programs access a relatively small portion of the address space at any instant of time
 Example: 90% of time in 10% of the code
°Two Different Types of Locality:
 Temporal Locality (Locality in Time): if an item is referenced, it will tend to be referenced again soon.
 Spatial Locality (Locality in Space): if an item is referenced, items whose addresses are close by tend to be referenced soon.
[Figure: probability of reference vs. address (0 to memsize)]

EEL-4713 Ann Gordon-Ross 7 Memory Hierarchy: Principles of Operation
°2-level hierarchy example
°At any given time, data is copied between only 2 adjacent levels:
 Upper Level (Cache): the one closer to the processor
 -Smaller, faster, and uses more expensive technology
 Lower Level (Memory): the one further away from the processor
 -Bigger, slower, and uses less expensive technology
°Block: the minimum unit of information that can either be present or not present in the two-level hierarchy
[Figure: blocks Blk X and Blk Y moving between the upper level (cache) and lower level (memory), to/from the processor]

EEL-4713 Ann Gordon-Ross 8 Memory Hierarchy: Terminology
°Hit: data appears in some block in the upper level (example: Block X)
 Hit Rate: the fraction of memory accesses found in the upper level
 Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss
°Miss: data needs to be retrieved from a block in the lower level (Block Y)
 Miss Rate = 1 - (Hit Rate)
 Miss Penalty = time to replace a block in the upper level + time to deliver the block to the processor
°Hit Time << Miss Penalty
[Figure: upper level (cache) holding Blk X, lower level (memory) holding Blk Y]

EEL-4713 Ann Gordon-Ross 9 Basic Terminology: Typical Values
Block (line) size: bytes
Hit time: 1 - 4 cycles
Miss penalty: cycles (and increasing)
Miss rate: 1% - 20%
Cache size: 64 KB - 8 MB

EEL-4713 Ann Gordon-Ross 10 How Do Caches Work?
°Temporal Locality (Locality in Time): if an item is referenced, it will tend to be referenced again soon.
 Keep more recently accessed data items closer to the processor
°Spatial Locality (Locality in Space): if an item is referenced, items whose addresses are close by tend to be referenced soon.
 Move blocks consisting of contiguous words to the cache
[Figure: blocks Blk X and Blk Y moving between lower level memory and upper level cache]

EEL-4713 Ann Gordon-Ross 11 The Simplest Cache: Direct-Mapped Cache
[Figure: a 16-location memory (addresses 0-F) mapping into a 4-byte direct-mapped cache (cache indexes 0-3)]
°Location 0 can be occupied by data from:
 Memory location 0, 4, 8, ... etc.
 In general: any memory location whose 2 LSBs of the address are 0s
 Address => cache index
°How can we tell which block is in the cache?
 Valid bit
 Tag

EEL-4713 Ann Gordon-Ross 12 Cache Tag and Cache Index
°Assume a 32-bit memory (byte) address:
 A 2^N byte direct mapped cache:
 -Cache Index: the lower N bits of the memory address
 -Cache Tag: the upper (32 - N) bits of the memory address, stored as part of the cache "state" along with a Valid Bit
[Figure: 2^N-byte direct-mapped cache; address bits 31..N form the tag (example: 0x50), bits N-1..0 form the index (example: 0x03)]
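To make the split concrete, here is a minimal C sketch, not from the slides: the cache size (N = 10 bits, i.e. a 1 KB cache with 1-byte blocks) and the helper names are assumptions chosen to reproduce the slide's example values.

```c
#include <stdint.h>
#include <stdio.h>

#define N 10  /* hypothetical: a 2^10 = 1 KB direct-mapped cache, 1-byte blocks */

/* The lower N address bits select the cache entry. */
static uint32_t cache_index(uint32_t addr) {
    return addr & ((1u << N) - 1);
}

/* The upper (32 - N) address bits are stored and compared as the tag. */
static uint32_t cache_tag(uint32_t addr) {
    return addr >> N;
}

int main(void) {
    uint32_t addr = (0x50u << N) | 0x03;  /* the slide's example: tag 0x50, index 0x03 */
    printf("tag=0x%x index=0x%x\n", cache_tag(addr), cache_index(addr));
    return 0;
}
```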

EEL-4713 Ann Gordon-Ross 13 Cache Access Example
[Figure: access trace on a small cache showing the tag, data, and valid (V) state after each access; the first accesses miss and load blocks, and later accesses to the same locations hit]
°Miss handling:
 Load data
 Write tag & set V
°Caches begin empty: a lot of misses at start up: Compulsory Misses
 -(Cold start misses)

EEL-4713 Ann Gordon-Ross 14 Definition of a Cache Block
°Cache Block (cache "line"): cache data that has its own cache tag
°Our previous "extreme" example: 4-byte direct mapped cache with Block Size = 1 Byte
 Takes advantage of Temporal Locality: if a byte is referenced, it will tend to be referenced soon.
 Did not take advantage of Spatial Locality: if a byte is referenced, its adjacent bytes will be referenced soon.
°In order to take advantage of Spatial Locality: increase the block size
[Figure: direct-mapped cache entry with a Valid bit, Cache Tag, and data bytes 0-3]

EEL-4713 Ann Gordon-Ross 15 Example: 1 KB Direct Mapped Cache with 32 B Blocks
°For a 2^N byte cache:
 The uppermost (32 - N) bits are always the Cache Tag
 The lowest M bits are the Byte Select (Block Size = 2^M)
[Figure: address split into Cache Tag (example: 0x50), Cache Index (example: 0x01), and Byte Select (example: 0x00); the tag and valid bit are stored as part of the cache "state"; the data array holds bytes 0-1023 as 32 blocks of 32 bytes]
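A C sketch of the address decomposition for this configuration (assumed parameters from the slide: 1 KB cache, 32-byte blocks, 32-bit addresses, so 5 byte-select bits, 5 index bits, and a 22-bit tag):

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_BITS 5  /* 32-byte blocks => M = 5 byte-select bits */
#define INDEX_BITS 5  /* 1 KB / 32 B = 32 blocks => 5 index bits */

static uint32_t byte_select(uint32_t a) { return a & ((1u << BLOCK_BITS) - 1); }
static uint32_t cache_index(uint32_t a) { return (a >> BLOCK_BITS) & ((1u << INDEX_BITS) - 1); }
static uint32_t cache_tag(uint32_t a)   { return a >> (BLOCK_BITS + INDEX_BITS); }

int main(void) {
    /* The slide's example fields: tag 0x50, index 0x01, byte select 0x00 */
    uint32_t a = (0x50u << 10) | (0x01u << 5) | 0x00;
    printf("tag=0x%x index=0x%x byte=0x%x\n",
           cache_tag(a), cache_index(a), byte_select(a));
    return 0;
}
```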

EEL-4713 Ann Gordon-Ross 16 Block size tradeoffs
°A larger block helps with spatial locality
°However, transferring a large block from memory to cache takes longer than transferring a small block (a larger "miss penalty")
°Important to understand the implication of block size on the average time to service a memory request
°Average memory access time (AMAT):
 AMAT = Hit_time * (1 - miss_rate) + miss_rate * miss_penalty
°Cache design goals:
 Short hit time
 Small miss rate
 Short miss penalty
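A quick sketch of the slide's AMAT formula in C; the numbers in the example are hypothetical, chosen only to show the arithmetic:

```c
#include <stdio.h>

/* AMAT as defined on the slide, in cycles. */
static double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time * (1.0 - miss_rate) + miss_rate * miss_penalty;
}

int main(void) {
    /* Hypothetical values: 1-cycle hit, 5% miss rate, 100-cycle penalty */
    printf("AMAT = %.2f cycles\n", amat(1.0, 0.05, 100.0));  /* 0.95 + 5.0 = 5.95 */
    return 0;
}
```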

EEL-4713 Ann Gordon-Ross 17 Components of miss penalty
[Figure: processor (control + datapath) connected to memory]
°Memory latency: depends on memory size, technology
°Transfer time: (block size / bus width) * bus cycle
°Penalty ≈ latency + transfer time
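A worked instance of this decomposition; all parameter values here are hypothetical, picked only to illustrate the formula:

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical parameters: 32-byte block, 8-byte (64-bit) bus,
     * 5 ns bus cycle, 60 ns memory latency. */
    double block_bytes = 32, bus_bytes = 8, bus_cycle_ns = 5.0, latency_ns = 60.0;
    double transfer_ns = (block_bytes / bus_bytes) * bus_cycle_ns;  /* 4 bus cycles = 20 ns */
    printf("penalty ~= %.0f ns\n", latency_ns + transfer_ns);       /* 60 + 20 = 80 ns */
    return 0;
}
```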

EEL-4713 Ann Gordon-Ross 18 *Block Size Tradeoff
°In general, larger block sizes take advantage of spatial locality, BUT:
 Larger block size means larger miss penalty:
 -Takes a longer time to fill up the block
 If block size is too big relative to cache size, miss rate will go up
°Average Access Time = Hit Time x (1 - Miss Rate) + Miss Penalty x Miss Rate
[Figure: three plots versus block size. Miss penalty rises with block size; miss rate first falls (exploits spatial locality) then rises (fewer blocks compromises temporal locality); average access time therefore has a minimum, increasing at large block sizes due to the increased miss penalty and miss rate]

EEL-4713 Ann Gordon-Ross 19 Another Extreme Example
°Cache Size = 4 bytes, Block Size = 4 bytes
 Only ONE entry in the cache
°If an item is accessed, it is likely that it will be accessed again soon
 But it is unlikely that it will be accessed again immediately!!!
 The next access will likely be a miss again
 -Continually loading data into the cache but discarding (forcing out) it before it is used again
°Conflict Misses are misses caused by:
 Different memory locations mapped to the same cache index
 -Solution 1: make the cache size bigger
 -Solution 2: multiple entries for the same Cache Index
[Figure: single cache entry with a Valid Bit, Cache Tag, and data bytes 0-3]

EEL-4713 Ann Gordon-Ross 20 Outline
°Set-associative caches
°Simulation experiments
°Replacement and write policies

EEL-4713 Ann Gordon-Ross 21 *Our original direct-mapped cache
°Implementation diagram
[Figure: the address of a lw/sw is split into Cache Tag and Cache Index; the index selects a cache block and its stored tag and valid bit; a comparator checks the stored tag against the address tag to produce Hit, while the cache block is read out]

EEL-4713 Ann Gordon-Ross 22 A Two-way Set Associative Cache
°N-way set associative: N entries for each Cache Index
 N direct mapped caches operate in parallel
°Example: Two-way set associative cache
 Cache Index selects a "set" from the cache
 The two tags in the set are compared in parallel
 Data is selected based on the tag result
[Figure: two valid/tag/data arrays indexed by the Cache Index; two comparators check the address tag against each way's stored tag, ORed to produce Hit; a mux (Sel1/Sel0) selects the hitting way's cache block]
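A minimal C model of the lookup, under assumed parameters (32-bit addresses, 32-byte blocks, 32 sets; the structure and names are illustrative, not the slides' code). Hardware probes both ways in parallel; the loop models the same comparison sequentially:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WAYS 2
#define SET_BITS 5    /* 32 sets (hypothetical) */
#define BLOCK_BITS 5  /* 32-byte blocks (hypothetical) */

struct line { bool valid; uint32_t tag; uint8_t data[1u << BLOCK_BITS]; };
static struct line cache[1u << SET_BITS][WAYS];

/* Returns the hitting block's data, or NULL on a miss. */
static uint8_t *lookup(uint32_t addr) {
    uint32_t set = (addr >> BLOCK_BITS) & ((1u << SET_BITS) - 1);  /* index selects a set */
    uint32_t tag = addr >> (BLOCK_BITS + SET_BITS);
    for (int w = 0; w < WAYS; w++)          /* hardware compares both tags in parallel */
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return cache[set][w].data;      /* the mux selects this way's data */
    return NULL;
}
```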

EEL-4713 Ann Gordon-Ross 23 Disadvantage of Set Associative Cache
°N-way Set Associative Cache versus Direct Mapped Cache:
 N comparators vs. 1
 Extra MUX delay for the data - critical path!
 Data comes AFTER Hit/Miss
°In a direct mapped cache, the Cache Block is available BEFORE Hit/Miss:
 Possible to assume a hit and continue; recover later if miss.
[Figure: same two-way set-associative diagram as the previous slide]

EEL-4713 Ann Gordon-Ross 24 Critical path: direct-mapped cache
[Figure: direct-mapped lookup with the tag-compare path (tag delay) and the data read-out path (data delay) annotated; the two paths proceed independently]

EEL-4713 Ann Gordon-Ross 25 Critical path: Set Associative Cache
[Figure: two-way lookup with tag delay and data delay annotated; the data must additionally pass through the way-select mux driven by the tag comparison]

EEL-4713 Ann Gordon-Ross 26 Examples
°Cacti simulation
 Same size, block, feature size
 Cycle times: direct-mapped versus 2-way
 -DM: ns
 -2-way: ns
°Simplescalar simulation
 Hit rates: direct mapped versus 2-way
°Impact on Average Memory Access Time?

EEL-4713 Ann Gordon-Ross 27 Simplescalar simulation
°Direct-mapped
 il1.accesses # total number of accesses
 il1.hits # total number of hits
 dl1.accesses # total number of accesses
 dl1.hits # total number of hits
°2-way set associative
 il1.accesses # total number of accesses
 il1.hits # total number of hits
 dl1.accesses # total number of accesses
 dl1.hits # total number of hits

EEL-4713 Ann Gordon-Ross 28 A Summary on Sources of Cache Misses
°Compulsory (cold start, first reference): first access to a block
 "Cold" fact of life: not a whole lot you can do about it
°Conflict (collision): multiple memory locations mapped to the same cache location
 Solution 1: increase cache size
 Solution 2: increase associativity
°Capacity: the cache cannot contain all blocks accessed by the program
 Solution: increase cache size
°Invalidation: other process (e.g., I/O, other CPU) updates memory

EEL-4713 Ann Gordon-Ross 29 And yet Another Extreme Example: Fully Associative
°Fully Associative Cache -- push the set associative idea to its limit!
 Forget about the Cache Index
 Compare the Cache Tags of all cache entries in parallel
 Example: with 32 B blocks, we need N 27-bit comparators
°By definition: Conflict Miss = 0 for a fully associative cache
[Figure: cache entries each with a 27-bit Cache Tag, Valid Bit, and 32 data bytes; the address supplies only a Cache Tag and Byte Select (example: 0x01), and every entry's tag is compared at once]
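A C sketch of the fully associative lookup, with assumed parameters (32 entries, 32-byte blocks on a 32-bit address, so a 27-bit tag); hardware uses N comparators in parallel, while the loop here checks entries one by one:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES 32    /* hypothetical N */
#define BLOCK_BITS 5  /* 32-byte blocks => 27-bit tag on a 32-bit address */

struct fa_line { bool valid; uint32_t tag; uint8_t data[1u << BLOCK_BITS]; };
static struct fa_line fa_cache[ENTRIES];

/* No index bits: every entry's tag is checked against the address tag. */
static uint8_t *fa_lookup(uint32_t addr) {
    uint32_t tag = addr >> BLOCK_BITS;  /* the upper 27 bits */
    for (int i = 0; i < ENTRIES; i++)
        if (fa_cache[i].valid && fa_cache[i].tag == tag)
            return fa_cache[i].data;
    return NULL;  /* miss: by construction, never a conflict miss */
}
```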

EEL-4713 Ann Gordon-Ross 30 Making room for a new block
°Direct Mapped Cache: each memory location can only be mapped to 1 cache location
 No need to make any decision
 -The current item replaces the previous item in that cache location
°N-way Set Associative Cache: each memory location has a choice of N cache locations
°Fully Associative Cache: each memory location can be placed in ANY cache location
°Cache miss in an N-way Set Associative or Fully Associative Cache:
 Bring in the new block from memory
 Throw out a cache block to make room for the new block
 Need to make a decision on which block to throw out!

EEL-4713 Ann Gordon-Ross 31 Cache Block Replacement Policy
°Random Replacement: hardware randomly selects a cache item and throws it out
°Least Recently Used: hardware keeps track of the access history
 Replace the entry that has not been used for the longest time
°Example of a simple "pseudo" Least Recently Used implementation (see the sketch below):
 Assume 64 fully associative entries
 A hardware replacement pointer points to one cache entry
 Whenever an access is made to the entry the pointer points to:
 -Move the pointer to the next entry
 Otherwise: do not move the pointer
[Figure: replacement pointer sweeping over entries 0 through 63]
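A minimal C model of the slide's pseudo-LRU pointer scheme (names are illustrative): the pointer always identifies the victim, and it advances only when the pointed-to entry is accessed, so it tends not to sit on recently used entries.

```c
#define NUM_ENTRIES 64

static int replacement_ptr = 0;  /* always identifies the next victim */

/* Called on every cache access: if the accessed entry is the one the
 * pointer identifies, advance the pointer; otherwise leave it alone. */
static void on_access(int entry) {
    if (entry == replacement_ptr)
        replacement_ptr = (replacement_ptr + 1) % NUM_ENTRIES;
}

/* On a miss, the pointed-to entry is the one replaced. */
static int choose_victim(void) {
    return replacement_ptr;
}
```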

EEL-4713 Ann Gordon-Ross 32 *Cache Write Policy: Write Through versus Write Back
°Cache reads are much easier to handle than cache writes:
 An instruction cache is much easier to design than a data cache
°Cache write: how do we keep data in the cache and memory consistent?
°Two options (decision time again :-)
 Write Back: write to cache only. Write the cache block to memory when that cache block is being replaced on a cache miss.
 -Need a "dirty" bit for each cache block
 -Greatly reduces the memory bandwidth requirement
 -Control can be complex
 Write Through: write to cache and memory at the same time.
 -Isn't memory too slow for this?
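A C sketch contrasting the two policies on a store hit; the structure and the memory-side helpers are hypothetical stand-ins, not the slides' code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical memory-side helpers (declared, not defined here). */
void memory_write_byte(uint32_t addr, uint8_t v);
void memory_write_block(uint32_t block_addr, const uint8_t *data);

struct wline { bool valid, dirty; uint32_t tag; uint8_t data[32]; };

/* Write back: update only the cache and mark the block dirty;
 * memory is updated later, when the block is evicted. */
static void store_hit_write_back(struct wline *l, int off, uint8_t v) {
    l->data[off] = v;
    l->dirty = true;
}

static void evict(struct wline *l, uint32_t block_addr) {
    if (l->dirty)
        memory_write_block(block_addr, l->data);  /* the deferred update */
    l->valid = false;
}

/* Write through: update the cache and memory (or a write buffer) now. */
static void store_hit_write_through(struct wline *l, int off, uint8_t v,
                                    uint32_t addr) {
    l->data[off] = v;
    memory_write_byte(addr, v);
}
```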

EEL-4713 Ann Gordon-Ross 33 Write Buffer for Write Through
°A Write Buffer is needed between the Cache and Memory
 Processor: writes data into the cache and the write buffer
 Memory controller: writes contents of the buffer to memory
°The write buffer is just a FIFO:
 Works fine if: store frequency (w.r.t. time) << 1 / DRAM write cycle
°However, if: store frequency > 1 / DRAM write cycle
 Write buffer saturation
[Figure: Processor writing to Cache and Write Buffer; Write Buffer draining to DRAM]
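A small C sketch of the FIFO, assuming a hypothetical 4-entry buffer and a hypothetical `dram_write` interface: the processor enqueues stores and keeps going, the memory controller drains at DRAM speed, and a full buffer is exactly the saturation case on the next slide.

```c
#include <stdbool.h>
#include <stdint.h>

void dram_write(uint32_t addr, uint32_t data);  /* hypothetical DRAM interface */

#define WB_DEPTH 4  /* hypothetical depth */

static struct { uint32_t addr, data; } fifo[WB_DEPTH];
static int head = 0, count = 0;

/* Processor side: enqueue the store and keep executing.
 * Returns false when the buffer is full -- the processor must stall. */
static bool wb_enqueue(uint32_t addr, uint32_t data) {
    if (count == WB_DEPTH) return false;
    int tail = (head + count) % WB_DEPTH;
    fifo[tail].addr = addr;
    fifo[tail].data = data;
    count++;
    return true;
}

/* Memory-controller side: drain one entry per DRAM write cycle. */
static bool wb_drain_one(void) {
    if (count == 0) return false;
    dram_write(fifo[head].addr, fifo[head].data);
    head = (head + 1) % WB_DEPTH;
    count--;
    return true;
}
```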

EEL-4713 Ann Gordon-Ross 34 Write Buffer Saturation
°Store frequency > 1 / DRAM write cycle
 If this condition exists for a long period of time (CPU cycle time too quick and/or too many store instructions in a row):
 -The store buffer will overflow no matter how big you make it
°Solutions for write buffer saturation:
 Use a write back cache
 Install a second level (L2) cache
[Figure: Processor -> Cache -> Write Buffer -> DRAM, and the same path with an L2 Cache inserted between the write buffer and DRAM]

EEL-4713 Ann Gordon-Ross 35 Write Allocate versus Not Allocate
°Assume: a 16-bit write to memory location 0x0 causes a miss
 Do we read in the rest of the block (Bytes 2, 3, ... 31)?
 Yes: Write Allocate (usually associated with write-back)
 No: Write Not Allocate (write-through)
[Figure: the 1 KB direct-mapped cache from before; address split into Cache Tag (example: 0x00), Cache Index (example: 0x00), and Byte Select (example: 0x00)]
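The two options as a C sketch; all helper functions here are hypothetical stand-ins for the cache and memory machinery, named only to show the control decision:

```c
#include <stdint.h>

/* Hypothetical helpers (declared, not defined here). */
void fetch_block_into_cache(uint32_t addr);      /* reads the full 32-B block */
void cache_write16(uint32_t addr, uint16_t v);   /* updates the cached copy */
void memory_write16(uint32_t addr, uint16_t v);  /* writes around the cache */

/* Write allocate: fetch the rest of the block, then write into the cache. */
static void write_miss_allocate(uint32_t addr, uint16_t v) {
    fetch_block_into_cache(addr);
    cache_write16(addr, v);
}

/* Write not allocate: send the write to memory; the cache is unchanged. */
static void write_miss_no_allocate(uint32_t addr, uint16_t v) {
    memory_write16(addr, v);
}
```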

EEL-4713 Ann Gordon-Ross 36 What is a Sub-block?
°Sub-block: a unit within a block that has its own valid bit
 Example: 1 KB Direct Mapped Cache, 32-B Block, 8-B Sub-block
 -Each cache entry will have: 32/8 = 4 valid bits
°Write miss: only the bytes in that sub-block are brought in
[Figure: cache entry with one tag and four 8-byte sub-blocks (Sub-block0 through Sub-block3), each with its own valid bit (SB0's V bit through SB3's V bit)]

EEL-4713 Ann Gordon-Ross 37 "Unified" versus "Split" caches
°Unified: both instructions and data co-reside in the same cache
°Split: different instruction (I) and data (D) caches
°Typically, today's on-chip caches closest to the processor (L1) are split
 The I-cache typically has better locality and can be made smaller to be faster (remember instruction fetch is in the critical path!)
 Separate caches avoid data blocks replacing instruction blocks
°L2+ caches are typically unified
 Much larger; chance of replacement small
 Not in the critical path
 A unified design is simpler to implement

EEL-4713 Ann Gordon-Ross 38 Multi-level caches
°Close to the processor: Level-1 cache
 Goal: maximize hit rate while keeping cycle time as close as possible to the datapath's
 Instruction fetch in a cycle; data access in a cycle
°Far from (or outside) the processor: Level-2 (-3) caches
 Goal: maximize hit rate while keeping cost (i.e., area) within a target design goal

EEL-4713 Ann Gordon-Ross 39 Example: IBM Power4 Source:

EEL-4713 Ann Gordon-Ross 40 Example (cont) Source:

EEL-4713 Ann Gordon-Ross 41 Power4 cache hierarchy
°Level 1, on-chip
°Power4 has two processors on a chip
 Each processor has its own L1 cache
 Each cache is split
°L1 Instruction Cache
 Direct mapped, 128-byte block managed as 4 32-byte sub-blocks
 128 KB (64 KB per processor)
°L1 Data Cache
 2-way, 128-byte block
 64 KB (32 KB per processor)
°One 32-byte read or write per cycle

EEL-4713 Ann Gordon-Ross 42 Power4 cache hierarchy
°Level 2, on-chip
°Still SRAM
°Shared across the two processors
°L2 configuration: unified
 8-way, 128-byte block
 1.5 MB

EEL-4713 Ann Gordon-Ross 43 Power4 cache hierarchy
°Level 3, off-chip
 Directory tags for multiprocessing kept on-chip
°Embedded DRAM
°L3 configuration
 8-way, 512-byte block managed as 128-byte sub-blocks
 32 MB, shared

EEL-4713 Ann Gordon-Ross 44 Power5 cache hierarchy
°Improvements in size and associativity to reduce capacity/conflict misses
°L1 I-cache: 2-way set associative
°L1 D-cache: 4-way set associative
°L2 unified cache: 1.92 MB, 10-way associative, 13-cycle latency
°L3 unified cache: 36 MB, 12-way associative, 87-cycle latency
°Memory: 220-cycle latency

EEL-4713 Ann Gordon-Ross 45 Summary
°The Principle of Locality: programs access a relatively small portion of the address space at any instant of time.
 -Temporal Locality: locality in time
 -Spatial Locality: locality in space
°Three major categories of cache misses:
 Compulsory Misses: cold start misses
 Conflict Misses: increase cache size and/or associativity
 Capacity Misses: increase cache size
°Write Policy:
 Write Through: needs a write buffer
 Write Back: control can be complex