CSE 351: The Hardware/Software Interface


Section 7: Caches, Lab 4

Caches
Caches speed up accesses to memory by exploiting temporal and spatial locality.
Temporal locality in caches: recently accessed data is likely to still be in the cache.
Spatial locality in caches: if a[i] is pulled into the cache, then a[i + j] for small j is likely to be pulled in too. How much depends on the size of the cache lines, though.

Temporal locality example
Pretend that the following code is executed more or less as written (with result, b, and c in registers):

int example(int* a, int b, int c) {
    int result = *a;
    result += b;
    result += c;
    result += *a;
    return result;
}

*a is likely to be in the cache already by the time of the second access, so the CPU does not need to access memory twice (the second access is a cache hit).

Temporal locality example

int example(int* a, int b, int c) {
    int result = *a;
    result += b;
    result += c;
    // (generate the Mandelbrot fractal
    //  to some high recursive depth, e.g.)
    result += *a;
    return result;
}

If we perform some memory-intensive operation prior to the second access to *a, then *a is less likely to still be cached when the CPU attempts to read it again (resulting in a cache miss).

Spatial locality example

int example(int* array, int len) {
    int sum = 0;
    for (int i = 0; i < len; ++i) {
        sum += array[i];
    }
    return sum;
}

Accessing memory causes neighboring memory to be cached as well. If cache lines are 64 bits in size, for example, then accessing array[0] pulls array[1] (the adjacent 4-byte int) into the cache too, so only about len / 2 of the accesses actually go to memory.
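To make the effect of stride visible, here is a minimal sketch (not from the slides; sum_strided is a made-up helper). It reads every element exactly once, but in stride order: with stride 1 it behaves like the example above, while with a large stride each inner-loop access lands on a different cache line, so spatial locality is lost and far more reads go to memory.

// Hypothetical helper for illustration only.
int sum_strided(int* array, int len, int stride) {
    int sum = 0;
    for (int start = 0; start < stride; ++start) {
        for (int i = start; i < len; i += stride) {
            sum += array[i];
        }
    }
    return sum;
}
// sum_strided(a, n, 1) walks the array sequentially (good spatial locality);
// sum_strided(a, n, 16) touches every 16th int first, so with 64-byte lines
// almost every access in the inner loop misses in the cache.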

Types of caches
There are a variety of different cache types, but the most commonly used are direct-mapped caches, set-associative caches, and fully-associative caches.
Which type to use where depends on size, speed, hardware cost, and access-pattern considerations.

Direct-mapped caches
Direct-mapped caches are essentially hash tables whose entries are cache lines (data blocks) of size B containing cached memory.
[Diagram of a direct-mapped cache, originally from Tom Bergan]

Direct-mapped caches
Addresses are broken up into [tag, index, offset]:
The tag distinguishes addresses that collide on the same index (it guards against hash collisions).
The index specifies which data block (cache line) to access.
The offset specifies the position within the block at which to read/write data.
The valid bit simply indicates whether the data block actually contains cached data.
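As a minimal sketch (not from the slides; split_address, AddrFields, s, and b are made-up names), this is how the three fields could be peeled off an address given s index bits and b offset bits:

#include <stdint.h>

typedef struct {
    uint64_t tag;
    uint64_t index;
    uint64_t offset;
} AddrFields;

// Split an address into [tag, index, offset] for s index bits and b offset bits.
AddrFields split_address(uint64_t addr, unsigned s, unsigned b) {
    AddrFields f;
    f.offset = addr & ((1ULL << b) - 1);         // low b bits
    f.index  = (addr >> b) & ((1ULL << s) - 1);  // next s bits
    f.tag    = addr >> (s + b);                  // remaining high bits
    return f;
}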

Direct-mapped cache example
Let's say we have an address 8 bits in length (say 0xF6), where the tag is 2 bits, the index is 4 bits, and the offset is 2 bits:
0xF6 = 0b11110110 = [tag, index, offset] = [0b11, 0b1101, 0b10]
How big are data blocks? At most how many cache entries can be represented? How big are cache entries in total?
To read from this address in a direct-mapped cache, look at the valid bit and tag stored at line index:
If the valid bit is set and the tag matches what is stored there, return the data at offset (cache hit).
Otherwise, perform a memory access and store the retrieved data in the cache (cache miss).
(Presenter note: this would be a good exercise to walk through on the whiteboard; it's hard to fit everything on slides. Can just use index % num_entries for the hash function.)
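As a hedged worked check (not on the slide), the split_address sketch above reproduces this breakdown, and the field widths answer the sizing questions:

#include <stdio.h>

int main(void) {
    AddrFields f = split_address(0xF6, /*s=*/4, /*b=*/2);
    printf("tag=%llu index=%llu offset=%llu\n",
           (unsigned long long)f.tag,
           (unsigned long long)f.index,
           (unsigned long long)f.offset);  // prints tag=3 index=13 offset=2
    // A 2-bit offset means 2^2 = 4-byte data blocks; a 4-bit index means at
    // most 2^4 = 16 cache entries; each entry holds 1 valid bit + a 2-bit
    // tag + a 4-byte block, i.e., 35 bits in total.
    return 0;
}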

Direct-mapped cache example
To write to this address in a direct-mapped cache, set the valid bit, tag, and data at line index.
Subsequent reads that match this tag will now result in a cache hit.
What happens if an entry with a different tag already exists at that index? Overwrite the tag and data with the new values.
...but this can cause poor performance, since attempting to access the evicted data will now result in a cache miss.
We also need to update the data stored in memory: we can either write-through (update memory on every write) or write-back (update memory only when a cache line is overwritten, whether the overwrite is triggered by a read or a write).
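A minimal sketch of the write path, assuming a toy direct-mapped cache with 16 entries, 4-byte blocks, and 8-bit addresses (all names and sizes are made up for illustration, not the slides' design). This version is write-through, so memory is updated on every store; a write-back design would instead mark the line dirty and defer the memory update until the line is evicted.

#include <stdint.h>

#define NUM_ENTRIES 16
#define BLOCK_SIZE   4

typedef struct {
    int      valid;
    uint64_t tag;
    uint8_t  data[BLOCK_SIZE];
} Line;

static Line    cache[NUM_ENTRIES];
static uint8_t memory[1 << 8];   // toy 256-byte "main memory"

void cache_write(uint64_t addr, uint8_t value) {
    uint64_t offset = addr % BLOCK_SIZE;
    uint64_t index  = (addr / BLOCK_SIZE) % NUM_ENTRIES;
    uint64_t tag    = addr / (BLOCK_SIZE * NUM_ENTRIES);

    Line* line = &cache[index];
    line->valid = 1;             // set the valid bit, tag, and data at this index
    line->tag   = tag;           // (a real cache would also fill the rest of the
    line->data[offset] = value;  //  block from memory on a write miss; omitted here)

    memory[addr] = value;        // write-through: memory is updated on every store
}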

Set-associative caches
Set-associative caches help to mitigate the situation where particular cache lines are frequently evicted (conflict misses).
Which part of the address affects whether such evictions happen?
Addresses are taken to have the same [tag, index, offset] form when indexing into set-associative caches.
Each index maps to a set of N cache entries in an N-way associative cache.

Set-associative caches
[Diagram of a set-associative cache, from Tom Bergan]

Set-associative caches
When performing a read from a set-associative cache, check every entry in the set under index:
If an entry has a matching tag and its valid bit is set, then return the data at the address' offset.
If no entry has both a matching tag and a set valid bit, then perform a fetch from memory and add a new entry for this address/data.
If all cache entries in a set fill up, pick one of them to evict using a replacement policy.
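A minimal sketch of that lookup, assuming 4 ways per set and 64-byte blocks (Set, Way, and set_assoc_lookup are made-up names for illustration, not the slides' code):

#include <stddef.h>
#include <stdint.h>

#define WAYS 4

typedef struct {
    int      valid;
    uint64_t tag;
    uint8_t  data[64];
} Way;

typedef struct {
    Way ways[WAYS];
} Set;

// Returns a pointer to the requested byte on a hit, or NULL on a miss
// (the caller would then fetch the block from memory and pick a victim
// way to evict according to the replacement policy).
uint8_t* set_assoc_lookup(Set* sets, uint64_t index,
                          uint64_t tag, uint64_t offset) {
    Set* set = &sets[index];
    for (int w = 0; w < WAYS; ++w) {
        Way* line = &set->ways[w];
        if (line->valid && line->tag == tag) {
            return &line->data[offset];  // cache hit
        }
    }
    return NULL;                         // cache miss
}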

Set-associative caches
When performing a write to a set-associative cache, check every entry in the set under index:
If there is an existing entry (matching tag, valid bit set), simply update it.
Otherwise, add a new entry and, depending on the write policy, also write the data to memory, as with a direct-mapped cache.

Set-associative caches
Given addresses of the form [tag, index, offset] with s bits for the index and b bits for the offset:
There can be at most 2^s addressable sets.
There are exactly 2^b addressable bytes in each data block.
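As a hedged worked example with made-up parameters (not from the slides): with s = 6 index bits, b = 6 offset bits, and 4 ways per set, the cache has 2^6 = 64 sets and 64-byte blocks, for a total data capacity of 64 sets × 4 ways × 64 bytes = 16 KiB.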

Fully-associative caches
Instead of having multiple sets of cache entries, keep just one set that any address can map into.
What are the implications of this in terms of hardware costs versus access times?
Fully-associative caches are not very common, but the translation lookaside buffer (TLB), which facilitates virtual-to-physical address translation, is one such example.
Expect more on the TLB in operating systems or (maybe?) hardware design and implementation.

Associativity Trade-offs
Greater associativity:
Pro: results in fewer misses, so the CPU spends less time/power reading from slow memory.
Con: searching the cache takes longer (fully associative means searching the entire cache).
Less associativity:
Pro: searching the cache takes less time (direct-mapped requires reading only one entry).
Con: results in more misses, because there are fewer spots for each address.

Associativity Trade-offs
Direct-mapped cache:
Best when the miss penalty is minimal.
Fastest hit times, so the best trade-off for “large” caches.
Fully-associative cache:
Lowest miss rate, so the best trade-off when the miss penalty is maximal.