
WRITE (write-through cache): Write Cache and Write Main Memory.

Address (32 bits): Tag = bits 31..16 (16 bits), Index = bits 15..2 (14 bits), Byte Offset = bits 1..0 (2 bits). The cache has 16K entries, each holding a Valid bit, a 16-bit Tag, and 32 bits of Data. The stored tag is compared (=) with the address tag; a match asserts Hit and the Data is delivered.

Write-Through Performance Improvement

Every write goes to both the cache and main memory. Writes can be 10% to 15% of instructions, so stalling for main memory on every write is expensive.

Consider a Write Buffer between the processor and main memory:

Processor -> Write Buffer -> Main Memory (the Cache is written at the same time)

Each write-buffer entry holds an Address, the Data, and a Valid bit. The memory controller writes data from the buffer to main memory and releases the buffer entry. The processor writes the cache and the buffer and continues, until:

1. The Write Buffer is full (a further write miss must HOLD the processor), or
2. A Read Miss occurs: wait until the Write Buffer is empty, so that main memory is up to date before the missing block is fetched.
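The write-buffer behavior above can be sketched in a few lines (an illustrative model, not from the slides; the class name, the capacity, and the dict-as-main-memory are assumptions):

```python
from collections import deque

class WriteBuffer:
    """Illustrative FIFO write buffer between processor and main memory."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = deque()          # each entry: (address, data), valid while queued

    def write(self, address, data):
        """Processor-side write. Returns False (HOLD) if the buffer is full."""
        if len(self.entries) == self.capacity:
            return False                # processor must stall until an entry drains
        self.entries.append((address, data))
        return True

    def drain_one(self, main_memory):
        """Memory-controller side: write one buffered entry to main memory, release it."""
        if self.entries:
            address, data = self.entries.popleft()
            main_memory[address] = data

    def empty(self):
        """A read miss must wait until this returns True."""
        return not self.entries

# Usage sketch
mem = {}
wb = WriteBuffer(capacity=2)
assert wb.write(0x100, 42)
assert wb.write(0x104, 43)
assert not wb.write(0x108, 44)   # buffer full: processor would HOLD
wb.drain_one(mem)                 # controller writes the oldest entry to main memory
```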

Multiword Blocks

Consider a cache with a block of several adjacent words. On a read miss, fetch a block of multiple adjacent words, which replaces a block in the cache. This predicts that if a location is accessed, then the other locations in the block will be used soon (increased use of spatial locality).

Cache entry for a 4-word block: Index -> Valid | Tag | Word 3 | Word 2 | Word 1 | Word 0

The Valid bit and Tag are shared by the four words, which is a more efficient use of cache memory.

Address breakdown for the 4-word-block cache (bits 31..0):

Tag: bits 31..16 (16 bits)
Index: bits 15..4 (12 bits)
Block Offset: bits 3..2 (2 bits)
Byte Offset: bits 1..0 (2 bits)

The cache holds 4K entries (blocks), each with Valid, Tag, and Word3..Word0 (32 bits each). The 16-bit stored tag is compared (=) with the address tag to produce Hit; the 2-bit block offset drives a mux that selects one of the four 32-bit words as the Data output.

Consider this 4K (4096) entry cache with a block of 4 words, or 16 bytes. For an address of 131408, what is the block number? Block Number = address within the cache = Index.

Address = 131408 (byte)
Block address = 131408 / 16 bytes/block = 8213 (the left 28 bits of the address)
Block Number = (Block Address) modulo (Number of cache blocks)
             = 8213 mod 4096
             = 8213 - 4096 - 4096 = 21

Mapping the example onto the diagram: byte address 131408 -> block address 8213 -> Index = 21 (the low 12 bits of the block address) and Tag = 2 (the high bits). The stored tag at entry 21 is compared (=) with the address tag; on a match, Hit is asserted, the block offset drives the mux, and the selected 32-bit word is the Data.
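The tag/index/offset split for this cache can be checked with integer arithmetic (a sketch; the function name and parameter defaults are mine, matching the 16/12/2/2 bit fields above):

```python
def decode(address, index_bits=12, block_offset_bits=2, byte_offset_bits=2):
    """Split a 32-bit byte address into (tag, index, block offset, byte offset)."""
    byte_offset = address & ((1 << byte_offset_bits) - 1)
    block_offset = (address >> byte_offset_bits) & ((1 << block_offset_bits) - 1)
    index = (address >> (byte_offset_bits + block_offset_bits)) & ((1 << index_bits) - 1)
    tag = address >> (byte_offset_bits + block_offset_bits + index_bits)
    return tag, index, block_offset, byte_offset

tag, index, block_offset, byte_offset = decode(131408)
print(tag, index, block_offset, byte_offset)   # 2 21 0 0
```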

READ: the Index selects a cache entry; the stored Tag is compared with the address Tag, and on a Hit the Block Offset drives the mux to select the requested 32-bit word.

READ MISS: load the cache entry with the 4 words from main memory, along with the Tag, and set Valid.

WRITE WORD: a single word within the 4-word block must be written.

Write Word for a Multiword Cache Block (Write-Through)

Procedure:
1. Write the Data Word to the cache and compare Tags.
2. If Hit, done with the cache; go to step 4.
3. If not Hit (Write Miss): load the block from Main Memory into the cache, then write the Data Word to the cache.
4. Write the Data Word to Main Memory.
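A hedged sketch of that write-through procedure, using the 4-word-block entry layout above (all names and the dict-based main memory are assumptions; the sketch checks the tag before writing, rather than writing speculatively as the hardware steps do):

```python
def write_word(cache, memory, address, data, index_bits=12, words_per_block=4):
    """Write-through write of one 32-bit word into a direct-mapped multiword cache."""
    word_addr = address >> 2                    # 4 bytes per word
    block_offset = word_addr % words_per_block
    block_addr = word_addr // words_per_block
    index = block_addr % (1 << index_bits)
    tag = block_addr >> index_bits
    entry = cache[index]
    if not (entry['valid'] and entry['tag'] == tag):
        # Write miss (step 3): load the whole block from main memory first
        base = block_addr * words_per_block
        entry['words'] = [memory.get(base + i, 0) for i in range(words_per_block)]
        entry['tag'], entry['valid'] = tag, True
    entry['words'][block_offset] = data         # write the word to the cache
    memory[word_addr] = data                    # step 4: write through to main memory

# Usage: 4K-entry cache; the example address 131408 maps to index 21, tag 2
cache = [{'valid': False, 'tag': 0, 'words': [0, 0, 0, 0]} for _ in range(4096)]
memory = {}
write_word(cache, memory, 131408, 99)
```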

Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty

For a constant-size cache (the slide's plots, summarized):
- Miss Rate vs. Block Size: falls at first (more spatial locality per miss), then rises for large blocks because the cache holds fewer blocks.
- Miss Penalty vs. Block Size: grows with block size; it is an access time plus a transfer time that increases with the block.
- Average Access Time vs. Block Size: the product of the two effects gives a minimum at an intermediate block size.
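The trade-off can be made concrete with a toy model (the miss-rate curve and timing constants below are invented for illustration, not measured data):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty."""
    return hit_time + miss_rate * miss_penalty

# Assumed constants: the miss rate falls with block size, then rises as the
# cache has fewer blocks; the penalty is an access time plus per-word transfer.
access_cycles, cycles_per_word = 10, 1
results = {}
for block_words in [1, 2, 4, 8, 16, 32]:
    miss_rate = 0.06 / block_words + 0.0005 * block_words
    penalty = 1 + access_cycles + cycles_per_word * block_words
    results[block_words] = amat(1, miss_rate, penalty)

for block_words, cycles in results.items():
    print(block_words, round(cycles, 3))
```

With these assumed numbers the average access time bottoms out at an intermediate block size, mirroring the curve described above.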

DECStation 3100 miss rates:

Program  Block Size  Instruction  Data       Effective
         (words)     Miss Rate    Miss Rate  Miss Rate
gcc      1           6.1%         2.1%       5.4%
gcc      4           2.0%         1.7%       1.9%
spice    1           1.2%         1.3%       1.2%
spice    4           0.3%         0.6%       0.4%

Write misses are included in the 4-word-block figures, but not in the 1-word figures. Remember that the Miss Penalty goes UP with the larger block!


Reducing the Miss Penalty

Reduce the time to read the multiple words from Main Memory into the cache block.

"Early Restart": don't wait for the complete block to be transferred. Access and transfer each word sequentially; as soon as the requested word is in the cache, restart the processor to access the cache, and finish the block transfer while the cache is available.

Variation: "Requested Word First" (fetch the missed word before the rest of the block).

Disadvantages: complex control, and the processor will likely access the cache block again before the transfer is complete.

Reducing the Miss Penalty: Memory Organization

Assume memory access times of:
- 1 clock cycle to send the address
- 10 clock cycles to access the DRAM
- 1 clock cycle to send a word of data

For a sequential transfer of 4 data words:
Miss Penalty = 1 + 4 * (10 + 1) = 45 clock cycles

What if we could read a block of words simultaneously from Main Memory (a memory as wide as the cache block)? All four 32-bit words move in parallel into the cache entry (Valid, Tag, Word3..Word0).

Miss Penalty = 1 + 10 + 1 = 12 clock cycles (vs. 45 for sequential)

What about 4 banks of memory ("Interleaved Memory")? Bank 0..Bank 3 are accessed in parallel, but the words are transferred serially to the cache.

Miss Penalty = 1 + 10 + 4 * 1 = 15 clock cycles
(vs. 12 for the wide parallel memory and 45 for sequential)
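The three miss penalties follow directly from the stated timing assumptions (1 cycle to send the address, 10 to access DRAM, 1 to send a word); the function below is a sketch with invented parameter names:

```python
def miss_penalty(words, send_addr=1, dram=10, send_word=1,
                 organization="sequential", banks=4):
    """Miss penalty in clock cycles for the three memory organizations."""
    if organization == "sequential":      # one access + transfer per word
        return send_addr + words * (dram + send_word)
    if organization == "wide":            # whole block accessed and sent at once
        return send_addr + dram + send_word
    if organization == "interleaved":     # banks accessed in parallel, serial transfer
        assert words <= banks
        return send_addr + dram + words * send_word
    raise ValueError(organization)

print(miss_penalty(4, organization="sequential"))   # 45
print(miss_penalty(4, organization="wide"))         # 12
print(miss_penalty(4, organization="interleaved"))  # 15
```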

Summary: Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty. The average access time vs. block size curve can be lowered by increasing the cache size, choosing the block size well, and improving the main memory organization (wide or interleaved).

CPU Performance with Cache Memory

For a program (assuming no penalty for a Hit):

CPU time = CPU execution time + CPU Hold time
CPU Hold time = Memory Stall Clock Cycles * Clock Cycle time
Memory Stall Clock Cycles = Read Stall Cycles + Write Stall Cycles
Read Stall Cycles = (Reads / Program) * Read Miss Rate * Read Miss Penalty

Write Stall Cycles = (Writes / Program) * Write Miss Rate * Write Miss Penalty + Write Buffer Stalls

Write Buffer Stalls should be << Write Miss Stalls, so, approximately:

Memory Stall Clock Cycles = Read Stall Cycles + Write Stall Cycles
  = (Reads / Program) * Read Miss Rate * Read Miss Penalty
  + (Writes / Program) * Write Miss Rate * Write Miss Penalty

The read and write miss penalties are approximately the same (fetch the block), so, combining the reads and writes into a weighted Miss Rate:

Memory Stall Cycles = (Memory Accesses / Program) * Miss Rate * Miss Penalty

For a program (assuming no penalty for a Hit):

CPU time = CPU execution time + (Memory Accesses / Program) * Miss Rate * Miss Penalty * Clock Cycle time

Dividing both sides by (Instructions / Program) and by the Clock Cycle time:

Effective CPI = Execution CPI + (Memory Accesses / Instruction) * Miss Rate * Miss Penalty

Worked example: the DECStation 3100 with 4-word blocks running spice.
- CPI = 1.2 without misses
- Instruction Miss Rate = 0.3%; Data Miss Rate = 0.6%
- For spice, the frequency of loads and stores = 9%
- (1) Sequential memory: Miss Penalty = 65 clock cycles
- (2) 4-bank interleaved: Miss Penalty = 20 clock cycles

Effective CPI = Execution CPI + (Memory Accesses / Instruction) * Miss Rate * Miss Penalty

Eff CPI = 1.2 + (1 * 0.003 + 0.09 * 0.006) * Miss Penalty = 1.2 + 0.00354 * Miss Penalty

(1) Eff CPI = 1.2 + 0.00354 * 65 = 1.2 + 0.2301 = 1.43
(2) Eff CPI = 1.2 + 0.00354 * 20 = 1.2 + 0.0708 = 1.271
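The effective-CPI arithmetic can be checked directly (a sketch; the function and variable names are mine):

```python
def effective_cpi(base_cpi, instr_miss_rate, data_miss_rate,
                  load_store_freq, miss_penalty):
    """Effective CPI = base CPI + (accesses/instruction) * miss rate * miss penalty,
    split into instruction fetches (1 per instruction) and data accesses."""
    stall_cpi = (1.0 * instr_miss_rate
                 + load_store_freq * data_miss_rate) * miss_penalty
    return base_cpi + stall_cpi

seq = effective_cpi(1.2, 0.003, 0.006, 0.09, 65)      # sequential memory
inter = effective_cpi(1.2, 0.003, 0.006, 0.09, 20)    # 4-bank interleaved
print(round(seq, 2), round(inter, 3))   # 1.43 1.271
```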

With the 4-bank interleaved memory (Miss Penalty = 20 clock cycles), Eff CPI = 1.271.

What if we get a new processor and cache that run at twice the clock frequency, but keep the same main memory speed? The memory latency is unchanged in seconds, so it doubles when measured in the faster clock cycles:

Miss Penalty = 40 clock cycles
Eff CPI = 1.2 + 0.00354 * 40 = 1.2 + 0.1416 = 1.342

Performance (fast clock) / Performance (slow clock)
  = (1.271 * 2 * fast clock cycle time) / (1.342 * fast clock cycle time) = 1.89

Doubling the clock yields only a 1.89x speedup, because the memory stall cycles do not scale with the clock.
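The speedup calculation can be checked the same way (a small sketch; the function name is mine):

```python
def speedup(cpi_slow, cpi_fast, clock_ratio=2.0):
    """Speedup from a clock that is clock_ratio times faster.
    Time per instruction is CPI * cycle time, and the fast cycle is
    the slow cycle divided by clock_ratio."""
    return (cpi_slow * clock_ratio) / cpi_fast

print(round(speedup(1.271, 1.342), 2))   # 1.89
```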