Chapter 5 Memory II. CSE 820, Michigan State University Computer Science and Engineering.


Chapter 5 Memory II CSE 820

Equations

CPU execution time = (CPU cycles + Memory-stall cycles) x clock cycle time

Memory-stall cycles = misses x penalty
  = IC x misses/instruction x penalty
  = IC x memory accesses/instruction x miss rate x penalty
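As a sanity check, these equations can be evaluated in Python. The sketch below uses illustrative numbers (1M instructions, 1.5 memory accesses per instruction, a 2% miss rate, a 100-cycle penalty, a 0.5 ns clock cycle) that are assumptions for the example, not values from the slides:

```python
def memory_stall_cycles(ic, accesses_per_inst, miss_rate, penalty):
    """Memory-stall cycles = IC x memAccess/Inst x missRate x penalty."""
    return ic * accesses_per_inst * miss_rate * penalty

def cpu_execution_time(cpu_cycles, stall_cycles, clock_cycle_time):
    """CPU time = (CPU cycles + Memory-stall cycles) x clock cycle time."""
    return (cpu_cycles + stall_cycles) * clock_cycle_time

# Illustrative numbers (assumed for the example):
ic = 1_000_000                                   # instruction count
stalls = memory_stall_cycles(ic, accesses_per_inst=1.5,
                             miss_rate=0.02, penalty=100)
total = cpu_execution_time(ic * 1.0, stalls,     # base CPI of 1.0
                           clock_cycle_time=0.5e-9)
# stalls == 3_000_000 cycles; total == 0.002 seconds
```

Note how the memory stalls here (3M cycles) dwarf the base execution cycles (1M): with these numbers, a 2% miss rate triples execution time.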

Hierarchy Questions

1. Block placement: where can a block be placed?
2. Block identification: how is a block found?
3. Block replacement: which block should be replaced on a miss?
4. Write strategy: what happens on a write?

Cache Q1: Where to place?

– Direct mapped: only one possible location
– Set associative: anywhere within one set
– Fully associative: anywhere in the cache

Cache Q2: How to find it?

The block address splits into a tag and an index, followed by a block offset:
– Index selects the set
– Tag is checked against all blocks in the set
– Offset selects the word within the block
A valid bit indicates whether the entry holds valid data.
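A sketch of this address split in Python; `split_address` is a hypothetical helper and assumes the block size and number of sets are powers of two:

```python
def split_address(addr, block_size, num_sets):
    """Split a byte address into (tag, index, offset) fields."""
    offset_bits = block_size.bit_length() - 1       # log2(block_size)
    index_bits = num_sets.bit_length() - 1          # log2(num_sets)
    offset = addr & (block_size - 1)                # byte within the block
    index = (addr >> offset_bits) & (num_sets - 1)  # selects the set
    tag = addr >> (offset_bits + index_bits)        # checked against the set
    return tag, index, offset

# 64-byte blocks, 128 sets: low 6 bits are the offset, next 7 the index.
tag, index, offset = split_address(0x12345, block_size=64, num_sets=128)
# tag == 9, index == 13, offset == 5
```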

Cache Q3: Replacement?

– Random: simplest
– FIFO
– LRU: usually implemented as an approximation; outperforms the others for large caches
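True LRU can be sketched in a few lines of Python for a fully associative cache (real hardware usually approximates LRU rather than tracking exact recency); `lru_misses` is a hypothetical helper:

```python
from collections import OrderedDict

def lru_misses(refs, capacity):
    """Count misses for a fully associative cache of `capacity` blocks
    under exact LRU replacement."""
    cache = OrderedDict()
    misses = 0
    for block in refs:
        if block in cache:
            cache.move_to_end(block)        # mark as most recently used
        else:
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False)   # evict the least recently used
            cache[block] = None
    return misses

# lru_misses([1, 2, 3, 1, 4, 1, 2], capacity=3) -> 5 misses
```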

Cache Q4: Write?

Reads are the most common accesses and the easiest to handle. For writes:
– Write through
– Write back: the block goes to memory on replacement; a dirty bit tracks whether it was modified
Allocate on a write miss?
– Write allocate
– No-write allocate
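The traffic difference between the two write policies can be sketched as follows. This is a simplified model (write allocate, stores only, LRU eviction), and `memory_writes` is a hypothetical helper:

```python
from collections import OrderedDict

def memory_writes(stores, capacity, write_through):
    """Count writes that reach memory for a stream of stored block ids.
    Write-through writes memory on every store; write-back writes a block
    back only when it is evicted dirty."""
    if write_through:
        return len(stores)
    cache = OrderedDict()              # block id -> dirty flag
    writebacks = 0
    for block in stores:
        if block in cache:
            cache.move_to_end(block)   # LRU bookkeeping
        elif len(cache) == capacity:
            _, dirty = cache.popitem(last=False)
            writebacks += dirty        # dirty block goes back to memory
        cache[block] = True            # the store sets the dirty bit
    return writebacks                  # blocks still cached stay dirty

# memory_writes([1, 2, 1, 3, 1], 2, write_through=True)  -> 5
# memory_writes([1, 2, 1, 3, 1], 2, write_through=False) -> 1
```

Repeated stores to the same block cost one memory write under write-back but one per store under write-through, which is why write-through designs rely on a write buffer.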

Alpha Cache: go through it on your own.

Equations

Fig. 5.9 lists 12 memory performance equations. Most are variations on:

CPU time = IC x CPI x clock cycle time
AvgMemAccess = HitTime + MissRate x MissPenalty

17 Cache Optimizations

AvgMemAccess = HitTime + MissRate x MissPenalty

– Reduce miss penalty: multilevel caches, critical word first, read priority over writes, merging write buffers, victim caches
– Reduce miss rate: larger blocks and caches, higher associativity, way prediction, compiler optimizations
– Reduce miss rate and penalty with parallelism: non-blocking caches, hardware and software prefetching
– Reduce hit time: small and simple caches, avoiding address translation, pipelined access, trace caches

Reducing Cache Miss Penalty

Questions

– Why is a small cache fast?
– Why will a designer put L1 and L2 caches on the processor chip rather than simply having a large L1 cache?
– Why is L1 direct mapped?

Reducing Miss Penalty: Multilevel Caches

How does an L2 cache reduce the miss penalty?

Multilevel Cache

AvgMemAccessTime = HitTime_L1 + MissRate_L1 x MissPenalty_L1
where MissPenalty_L1 = HitTime_L2 + MissRate_L2 x MissPenalty_L2
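A sketch of the two-level computation in Python, with illustrative numbers (1-cycle L1 hit, 4% L1 miss rate, 10-cycle L2 hit, 50% local L2 miss rate, 100-cycle memory access) that are assumptions for the example:

```python
def amat_two_level(hit_l1, miss_rate_l1, hit_l2, miss_rate_l2, mem_penalty):
    """AvgMemAccessTime = HitTime_L1 + MissRate_L1 x MissPenalty_L1,
    where MissPenalty_L1 = HitTime_L2 + MissRate_L2 x MissPenalty_L2."""
    miss_penalty_l1 = hit_l2 + miss_rate_l2 * mem_penalty
    return hit_l1 + miss_rate_l1 * miss_penalty_l1

# 1 + 0.04 x (10 + 0.5 x 100) = 3.4 cycles on average
avg = amat_two_level(hit_l1=1, miss_rate_l1=0.04,
                     hit_l2=10, miss_rate_l2=0.5, mem_penalty=100)
```

Without the L2 (every L1 miss going straight to memory), the same numbers would give 1 + 0.04 x 100 = 5 cycles, showing how the L2 cuts the effective miss penalty.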

Terms

Global miss rate = misses / CPU accesses
– e.g. for L2: MissRate_L1 x MissRate_L2
Local miss rate = misses / cache accesses
– e.g. MissRate_L2 is measured on the leftovers from L1
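The relationship between the two rates, as a worked Python example (the 4% and 50% figures are illustrative assumptions):

```python
# Local rates: misses divided by accesses that reach that cache level.
local_l1 = 0.04      # 4% of CPU accesses miss in L1
local_l2 = 0.5       # half of the L1 leftovers also miss in L2

# Global L2 rate: L2 misses divided by ALL CPU accesses.
global_l2 = local_l1 * local_l2      # 0.02, i.e. 2% of CPU accesses

# Concretely: 1000 CPU accesses -> 40 reach L2 -> 20 go to memory.
```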

L1 vs L2 (graph): with a 32K L1, a small L2 performs poorly. What is good about this graph? The hit time is good, and so is the average memory latency.

Inclusive vs Exclusive

Inclusive: L1 is a subset of L2
– pro: consistency between I/O and cache (or among multiprocessor caches) is easy: simply check L2
– e.g. Pentium 4
Exclusive: L1 and L2 are disjoint
– pro: useful when L2 is only slightly larger than L1
– e.g. AMD Athlon

Miss Penalty Reduction: Critical Word First & Early Restart

– Critical word first: in a multiword block, fetch the desired word first.
– Early restart: restart the CPU as soon as the desired word arrives.

Miss Penalty Reduction: Priority to Reads

Serve read misses before pending writes, provided the read does not conflict with a write waiting in the write buffer. Why?

Miss Penalty Reduction: Merging Write Buffer

Within the write buffer, merge individual words into blocks before writing them to memory.
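The merging idea can be sketched in Python; `merge_writes` is a hypothetical helper that coalesces word writes falling in the same block into one buffer entry:

```python
def merge_writes(word_addrs, words_per_block=4):
    """Group word writes by block, as a merging write buffer does:
    writes to the same block share one buffer entry."""
    buffer = {}                                    # block -> set of offsets
    for addr in word_addrs:
        block = addr // words_per_block
        buffer.setdefault(block, set()).add(addr % words_per_block)
    return buffer

# Four sequential word writes collapse into a single entry:
# merge_writes([100, 101, 102, 103]) -> {25: {0, 1, 2, 3}}
```

One block-sized transfer to memory then replaces four word-sized ones, which is the point of merging.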

Miss Penalty Reduction: Victim Cache

– Remember the last blocks discarded from the cache.
– Algorithm: on a miss, if the block is in the victim cache, swap it with the conflicting cache block.
– Performance: a 4-entry victim cache can resolve ¼ of misses.
– e.g. AMD Athlon (8 entries)
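A sketch of a direct-mapped cache backed by a small victim cache, written as a Python simulation; `run_with_victim_cache` and its FIFO victim policy are assumptions for illustration:

```python
def run_with_victim_cache(refs, num_lines, victim_entries=4):
    """Count misses for a direct-mapped cache plus a small fully
    associative victim cache of recently evicted blocks."""
    cache = [None] * num_lines        # line index -> block id
    victim = []                       # evicted blocks, FIFO order
    misses = 0
    for block in refs:
        line = block % num_lines
        if cache[line] == block:
            continue                              # main-cache hit
        if block in victim:                       # victim hit: swap blocks
            victim.remove(block)
            if cache[line] is not None:
                victim.append(cache[line])
            cache[line] = block
            continue
        misses += 1                               # true miss: go to memory
        if cache[line] is not None:
            victim.append(cache[line])            # remember the evictee
            if len(victim) > victim_entries:
                victim.pop(0)
        cache[line] = block
    return misses

# Two blocks that map to the same line ping-pong without a victim cache
# (8 misses for 8 refs) but settle after 2 misses with one:
# run_with_victim_cache([0, 4] * 4, num_lines=4)                   -> 2
# run_with_victim_cache([0, 4] * 4, num_lines=4, victim_entries=0) -> 8
```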

Victim Cache (figure)

Reducing Cache Miss Rate

Miss Classification

– Compulsory: the very first access to a block cannot be in the cache.
– Capacity: if the cache cannot contain all the blocks the program needs, some accesses must miss.
– Conflict: everything else, e.g. two blocks conflict by mapping to the same slot (a fully associative cache has no conflict misses).
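The 3C model can be made concrete with a small Python simulation: compulsory misses are first touches, capacity misses are the extra misses a fully associative LRU cache of the same size would still take, and conflict misses are whatever a direct-mapped cache adds on top. All helper names here are hypothetical:

```python
from collections import OrderedDict

def direct_mapped_misses(refs, num_lines):
    cache = [None] * num_lines
    misses = 0
    for block in refs:
        line = block % num_lines
        if cache[line] != block:
            misses += 1
            cache[line] = block
    return misses

def fully_assoc_misses(refs, capacity):
    cache = OrderedDict()
    misses = 0
    for block in refs:
        if block in cache:
            cache.move_to_end(block)
        else:
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False)   # LRU eviction
            cache[block] = None
    return misses

def classify(refs, num_lines):
    """Return (compulsory, capacity, conflict) miss counts."""
    compulsory = len(set(refs))             # first touch of each block
    fa = fully_assoc_misses(refs, num_lines)
    dm = direct_mapped_misses(refs, num_lines)
    return compulsory, fa - compulsory, dm - fa

# Blocks 0 and 4 conflict in a 4-line direct-mapped cache:
# classify([0, 4, 0, 4], num_lines=4) -> (2, 0, 2)
```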

Compulsory misses are independent of cache size. Capacity misses are those a fully associative cache of the same size would still take; they decrease as cache size grows. Conflict misses decrease with associativity (set size).

Miss categories as a percentage of all misses (figure).