1 Lecture 12: Cache Innovations Today: cache access basics and innovations (Sections 5.1-5.2)

1 Lecture 12: Cache Innovations Today: cache access basics and innovations (Sections 5.1-5.2)

2 Accessing the Cache
Direct-mapped cache with 8-byte words: each address maps to a unique location in the cache. With 8 words in the cache, 3 index bits select the set, and the offset bits select a byte within the word.
[Figure: byte address split into index and offset fields, selecting one of the sets in the data array.]

3 The Tag Array
Direct-mapped cache with 8-byte words: each address maps to a unique location in the cache. The tag array stores the high-order address bits of the block held in each set; on an access, the stored tag is compared against the tag bits of the incoming byte address to determine a hit.
[Figure: byte address indexing both the tag array and the data array, with a comparator on the tag.]
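To make the tag/index/offset split concrete, here is a minimal Python sketch (not from the slides; the parameters are assumptions matching the figure: 8-byte blocks and 8 sets, so 3 offset bits and 3 index bits):

    # Sketch: splitting a byte address for a direct-mapped cache with
    # 8-byte blocks and 8 sets; the remaining high bits form the tag.
    OFFSET_BITS = 3   # log2(8-byte block)
    INDEX_BITS = 3    # log2(8 sets)

    def split_address(addr):
        offset = addr & ((1 << OFFSET_BITS) - 1)
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        return tag, index, offset

    # Example: address 0x1A4 -> tag 6, index 4, offset 4
    print(split_address(0x1A4))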

4 Increasing Line Size
With a 32-byte cache line (block) size, more offset bits select a byte within the block. A large cache line size → smaller tag array, and fewer misses because of spatial locality.
[Figure: byte address with a wider offset field, indexing the tag array and data array.]

5 Associativity
Set associativity → fewer conflicts; wasted power because multiple data blocks and tags are read out in parallel.
[Figure: byte address indexing a 2-way cache; the tags of Way-1 and Way-2 are compared simultaneously.]

6 Example
A 32 KB 4-way set-associative data cache array with a 32-byte line size: How many sets? How many index bits, offset bits, tag bits? How large is the tag array?
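A worked sketch of the arithmetic (the 32-bit address width and the omission of valid/dirty overhead in the tag array are assumptions, since the slide does not specify them):

    # 32 KB, 4-way set-associative, 32-byte lines
    capacity = 32 * 1024
    ways = 4
    line = 32

    sets = capacity // (ways * line)           # 256 sets
    offset_bits = line.bit_length() - 1        # log2(32) = 5
    index_bits = sets.bit_length() - 1         # log2(256) = 8
    tag_bits = 32 - index_bits - offset_bits   # 19, assuming 32-bit addresses

    blocks = sets * ways                       # 1024 tag entries
    tag_array_bits = blocks * tag_bits         # 19,456 bits, about 2.4 KB
    print(sets, offset_bits, index_bits, tag_bits, tag_array_bits)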

7 Cache Misses
On a write miss, you may either choose to bring the block into the cache (write-allocate) or not (write-no-allocate). On a read miss, you always bring the block in (spatial and temporal locality), but which block do you replace?
 - no choice for a direct-mapped cache
 - randomly pick one of the ways to replace
 - replace the way that was least-recently used (LRU)
 - FIFO replacement (round-robin)
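A minimal sketch of LRU replacement for a single set, kept as a recency-ordered list (illustrative only; real hardware uses approximations such as pseudo-LRU):

    # LRU for one 4-way set: index 0 is least-recently used, the last
    # index is most-recently used. Hits move a tag to the MRU end;
    # misses evict the LRU entry when the set is full.
    class LRUSet:
        def __init__(self, ways=4):
            self.ways = ways
            self.tags = []

        def access(self, tag):
            if tag in self.tags:        # hit: update recency
                self.tags.remove(tag)
                self.tags.append(tag)
                return True
            if len(self.tags) == self.ways:
                self.tags.pop(0)        # miss: evict the LRU way
            self.tags.append(tag)       # fill with the new block
            return False

    s = LRUSet()
    for t in [1, 2, 3, 4, 1, 5]:
        print(t, "hit" if s.access(t) else "miss")
    # The final access to 5 evicts 2 (the least-recently used), not 1.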

8 Writes
When you write into a block, do you also update the copy in L2?
 - write-through: every write to L1 → write to L2
 - write-back: mark the block as dirty; when the block gets replaced from L1, write it to L2
Write-back coalesces multiple writes to an L1 block into one L2 write. Write-through simplifies coherence protocols in a multiprocessor system, as the L2 always has a current copy of the data.
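A sketch contrasting the two policies at the L1/L2 boundary (the structure is a hypothetical simplification, just enough to show where the dirty bit and the L2 writes happen):

    # Write-through: every store updates L2 immediately.
    # Write-back: stores only set a dirty bit; L2 sees the block once,
    # when it is evicted from L1.
    class L1Block:
        def __init__(self, tag):
            self.tag = tag
            self.dirty = False

    def store(block, l2_writes, write_through):
        if write_through:
            l2_writes.append(block.tag)   # one L2 write per store
        else:
            block.dirty = True            # coalesce; write on eviction

    def evict(block, l2_writes, write_through):
        if not write_through and block.dirty:
            l2_writes.append(block.tag)   # single coalesced writeback

    for wt in (True, False):
        writes, b = [], L1Block(0x42)
        for _ in range(3):
            store(b, writes, wt)
        evict(b, writes, wt)
        print("write-through" if wt else "write-back", len(writes))
    # Three stores cost 3 L2 writes with write-through, 1 with write-back.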

9 Reducing Cache Miss Penalty
 - Multi-level caches
 - Critical word first
 - Priority for reads
 - Victim caches

10 Multi-Level Caches
The L2 and L3 have properties that are different from the L1:
 - access time is not as critical for L2 as it is for L1 (every load/store/instruction accesses the L1)
 - the L2 is much larger and can consume more power per access
Hence, they can adopt alternative design choices:
 - serial tag and data access
 - high associativity

11 Read/Write Priority
For write-back/write-through caches, writes to lower levels are placed in write buffers. When we have a read miss, we must look up the write buffer before checking the lower level. When we have a write miss, the write can merge with an existing entry in the write buffer or it creates a new entry. Reads are more urgent than writes (the probability of an instruction waiting for the result of a read is 100%, while the probability of an instruction waiting for the result of a write is much smaller); hence, reads get priority unless the write buffer is full.
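A sketch of this read-miss path (the names are assumptions; the point is that the write buffer must be searched before the lower level is consulted, and a write miss may merge with a pending entry):

    # Write buffer: pending writes to the lower level, searched by
    # block address on every read miss.
    write_buffer = {}        # block address -> data

    def handle_write_miss(addr, data):
        # Merge with an existing entry, or create a new one.
        write_buffer[addr] = data

    def handle_read_miss(addr, read_lower_level):
        # The freshest copy may still be sitting in the write buffer.
        if addr in write_buffer:
            return write_buffer[addr]
        return read_lower_level(addr)

    handle_write_miss(0x80, "new data")
    print(handle_read_miss(0x80, lambda a: "stale L2 data"))  # "new data"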

12 Victim Caches
A direct-mapped cache suffers from conflict misses because multiple pieces of data map to the same location. The processor often tries to access data that it recently discarded, so all evicted blocks are placed in a small victim cache (4 or 8 entries), and the victim cache is checked before going to L2. This can be viewed as additional associativity for the few sets that tend to have the most conflicts.
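A sketch of a victim cache on the L1 miss path (a small fully-associative FIFO of recent evictions; the entry count and names are illustrative):

    from collections import OrderedDict

    # A 4-entry victim cache: L1 evictions go in, and L1 misses check
    # here before paying the full L2 latency.
    VICTIM_ENTRIES = 4
    victim = OrderedDict()   # block address -> data, oldest first

    def on_l1_eviction(addr, data):
        if len(victim) == VICTIM_ENTRIES:
            victim.popitem(last=False)   # drop the oldest victim
        victim[addr] = data

    def on_l1_miss(addr, fetch_from_l2):
        if addr in victim:
            return victim.pop(addr)      # fast hit: block moves back to L1
        return fetch_from_l2(addr)

    on_l1_eviction(0x100, "recently discarded block")
    print(on_l1_miss(0x100, lambda a: "slow L2 fetch"))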

13 Types of Cache Misses
 - Compulsory misses: occur the first time a memory word is accessed; these are the misses even for an infinite cache.
 - Capacity misses: occur because the program touched many other words before re-touching the same word; these are the misses for a fully-associative cache.
 - Conflict misses: occur because two words map to the same location in the cache; these are the additional misses generated while moving from a fully-associative to a direct-mapped cache.
Sidenote: can a fully-associative cache have more misses than a direct-mapped cache of the same size?

14 What Influences Cache Misses?
For each change below, how are the compulsory, capacity, and conflict miss rates affected?
 - Increasing cache capacity
 - Increasing number of sets
 - Increasing block size
 - Increasing associativity

15 Reducing Miss Rate
 - Large block size: reduces compulsory misses, and reduces miss penalty in case of spatial locality; increases traffic between levels, space wastage, and conflict misses.
 - Large caches: reduce capacity/conflict misses; access time penalty.
 - High associativity: reduces conflict misses; rule of thumb: a 2-way cache of capacity N/2 has the same miss rate as a 1-way cache of capacity N; access time penalty.
 - Way prediction: by predicting the way, the access time is effectively like that of a direct-mapped cache; can also reduce power consumption.
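A sketch of way prediction (illustrative only; the predictor here is a simple last-way-that-hit table indexed by set, which is one common approximation):

    # Way prediction: probe the predicted way first; only on a
    # mispredict do we read the remaining ways, so a correct guess
    # costs roughly direct-mapped time and energy.
    NUM_SETS, WAYS = 256, 4
    predicted_way = [0] * NUM_SETS      # last way that hit in each set

    def lookup(setidx, tag, tag_array):
        way = predicted_way[setidx]
        if tag_array[setidx][way] == tag:
            return way, True            # first-probe hit
        for w in range(WAYS):           # slower probe of the other ways
            if w != way and tag_array[setidx][w] == tag:
                predicted_way[setidx] = w   # retrain the predictor
                return w, False
        return None, False              # genuine miss

    tags = [[None] * WAYS for _ in range(NUM_SETS)]
    tags[7][2] = 0xABC
    print(lookup(7, 0xABC, tags))       # (2, False): mispredicted, retrained
    print(lookup(7, 0xABC, tags))       # (2, True): predicted correctly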

16 Tolerating Miss Penalty
 - Out-of-order execution: can do other useful work while waiting for the miss; there can be multiple cache misses in flight, so the cache controller has to keep track of multiple outstanding misses (non-blocking cache).
 - Hardware and software prefetching into prefetch buffers; aggressive prefetching can increase contention for buses.
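A sketch of the bookkeeping a non-blocking cache needs: miss status holding registers (MSHRs) that track outstanding misses and merge secondary misses to the same block (the structure is an assumption for illustration):

    # MSHRs: one entry per outstanding miss. A second miss to the same
    # block merges into the existing entry instead of issuing another
    # memory request, and the cache keeps servicing other accesses.
    NUM_MSHRS = 8
    mshrs = {}   # block address -> list of waiting requesters

    def miss(addr, requester):
        if addr in mshrs:
            mshrs[addr].append(requester)   # secondary miss: merge
            return "merged"
        if len(mshrs) == NUM_MSHRS:
            return "stall"                  # structural hazard: MSHRs full
        mshrs[addr] = [requester]           # primary miss: request memory
        return "issued"

    def fill(addr):
        return mshrs.pop(addr)              # data returns: wake all waiters

    print(miss(0x40, "load r1"))   # issued
    print(miss(0x40, "load r2"))   # merged
    print(fill(0x40))              # ['load r1', 'load r2']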
