CSE 351: The Hardware/Software Interface


Section 7: Caches, Lab 4

Caches
Caches speed up accesses to memory by exploiting temporal and spatial locality.
Temporal locality in caches: recently accessed data is likely to still be in the cache.
Spatial locality in caches: if a[i] is pulled into the cache, then a[i + j] for small j is likely to be pulled in too. How much depends on the size of the cache lines, though.

Temporal locality example
Pretend that the following code is executed more or less as written (with result, b, and c in registers):

int example(int* a, int b, int c) {
    int result = *a;
    result += b;
    result += c;
    result += *a;
    return result;
}

*a is likely to be in the cache already by the time of the second access, so the CPU does not need to access memory twice (the second access is a cache hit).

Temporal locality example

int example(int* a, int b, int c) {
    int result = *a;
    result += b;
    result += c;
    // (generate the Mandelbrot fractal
    //  to some high recursive depth, e.g.)
    result += *a;
    return result;
}

If we perform some memory-intensive operation prior to the second access to *a, then *a is less likely to still be cached when the CPU attempts to read it again (resulting in a cache miss).

Spatial locality example

int example(int* array, int len) {
    int sum = 0;
    for (int i = 0; i < len; ++i) {
        sum += array[i];
    }
    return sum;
}

Accessing memory causes neighboring memory to be cached as well. If cache lines are 64 bits in size, for example, then accessing array[0] pulls array[1] (the adjacent 4-byte int) into the cache too, so only about len / 2 of the accesses actually go to memory.
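To make the effect of stride visible, here is a minimal sketch (not from the slides; sum_strided is a made-up helper). It reads every element exactly once, but in stride order: with stride 1 it behaves like the example above, while with a large stride each inner-loop access lands on a different cache line, so spatial locality is lost and far more reads go to memory.

// Hypothetical helper for illustration only.
int sum_strided(int* array, int len, int stride) {
    int sum = 0;
    for (int start = 0; start < stride; ++start) {
        for (int i = start; i < len; i += stride) {
            sum += array[i];
        }
    }
    return sum;
}
// sum_strided(a, n, 1) walks the array sequentially (good spatial locality);
// sum_strided(a, n, 16) touches every 16th int first, so with 64-byte lines
// almost every access in the inner loop misses in the cache.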

Types of caches
There are a variety of different cache types, but the most commonly used are direct-mapped caches, set-associative caches, and fully-associative caches.
Which type to use where depends on size, speed, hardware cost, and access-pattern considerations.

Direct-mapped caches
Direct-mapped caches are essentially hash tables whose entries are cache lines (data blocks) of size B containing cached memory.
[Diagram of a direct-mapped cache, originally from Tom Bergan]

Direct-mapped caches
Addresses are broken up into [tag, index, offset]:
The tag distinguishes addresses that collide on the same index (it guards against hash collisions).
The index specifies which data block (cache line) to access.
The offset specifies the position within the block at which to read/write data.
The valid bit simply indicates whether the data block actually contains cached data.
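As a minimal sketch (not from the slides; split_address, AddrFields, s, and b are made-up names), this is how the three fields could be peeled off an address given s index bits and b offset bits:

#include <stdint.h>

typedef struct {
    uint64_t tag;
    uint64_t index;
    uint64_t offset;
} AddrFields;

// Split an address into [tag, index, offset] for s index bits and b offset bits.
AddrFields split_address(uint64_t addr, unsigned s, unsigned b) {
    AddrFields f;
    f.offset = addr & ((1ULL << b) - 1);         // low b bits
    f.index  = (addr >> b) & ((1ULL << s) - 1);  // next s bits
    f.tag    = addr >> (s + b);                  // remaining high bits
    return f;
}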

Direct-mapped cache example
Let's say we have an address 8 bits in length (say 0xF6), where the tag is 2 bits, the index is 4 bits, and the offset is 2 bits:
0xF6 = 0b11110110 = [tag, index, offset] = [0b11, 0b1101, 0b10]
How big are data blocks? At most how many cache entries can be represented? How big are cache entries in total?
To read from this address in a direct-mapped cache, look at the valid bit and tag stored at line index:
If the valid bit is set and the tag matches what is stored there, return the data at offset (cache hit).
Otherwise, perform a memory access and store the retrieved data in the cache (cache miss).
(Presenter note: this would be a good exercise to walk through on the whiteboard; it's hard to fit everything on slides. Can just use index % num_entries for the hash function.)
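As a hedged worked check (not on the slide), the split_address sketch above reproduces this breakdown, and the field widths answer the sizing questions:

#include <stdio.h>

int main(void) {
    AddrFields f = split_address(0xF6, /*s=*/4, /*b=*/2);
    printf("tag=%llu index=%llu offset=%llu\n",
           (unsigned long long)f.tag,
           (unsigned long long)f.index,
           (unsigned long long)f.offset);  // prints tag=3 index=13 offset=2
    // A 2-bit offset means 2^2 = 4-byte data blocks; a 4-bit index means at
    // most 2^4 = 16 cache entries; each entry holds 1 valid bit + a 2-bit
    // tag + a 4-byte block, i.e., 35 bits in total.
    return 0;
}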

Direct-mapped cache example
To write to this address in a direct-mapped cache, set the valid bit, tag, and data at line index.
Subsequent reads that match this tag will now result in a cache hit.
What happens if an entry with a different tag already exists at that index? Overwrite the tag and data with the new values.
...but this can cause poor performance, since attempting to access the evicted data will now result in a cache miss.
We also need to update the data stored in memory: we can either write-through (update memory on every write) or write-back (update memory only when a cache line is overwritten, whether the overwrite is triggered by a read or a write).
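A minimal sketch of the write path, assuming a toy direct-mapped cache with 16 entries, 4-byte blocks, and 8-bit addresses (all names and sizes are made up for illustration, not the slides' design). This version is write-through, so memory is updated on every store; a write-back design would instead mark the line dirty and defer the memory update until the line is evicted.

#include <stdint.h>

#define NUM_ENTRIES 16
#define BLOCK_SIZE   4

typedef struct {
    int      valid;
    uint64_t tag;
    uint8_t  data[BLOCK_SIZE];
} Line;

static Line    cache[NUM_ENTRIES];
static uint8_t memory[1 << 8];   // toy 256-byte "main memory"

void cache_write(uint64_t addr, uint8_t value) {
    uint64_t offset = addr % BLOCK_SIZE;
    uint64_t index  = (addr / BLOCK_SIZE) % NUM_ENTRIES;
    uint64_t tag    = addr / (BLOCK_SIZE * NUM_ENTRIES);

    Line* line = &cache[index];
    line->valid = 1;             // set the valid bit, tag, and data at this index
    line->tag   = tag;           // (a real cache would also fill the rest of the
    line->data[offset] = value;  //  block from memory on a write miss; omitted here)

    memory[addr] = value;        // write-through: memory is updated on every store
}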

Set-associative caches
Set-associative caches help to mitigate the situation where particular cache lines are frequently evicted (conflict misses).
Which part of the address affects whether such evictions happen?
Addresses are taken to have the same [tag, index, offset] form when indexing into set-associative caches.
Each index maps to a set of N cache entries in an N-way associative cache.

Set-associative caches
[Diagram of a set-associative cache, from Tom Bergan]

Set-associative caches
When performing a read from a set-associative cache, check every entry in the set under index:
If an entry has a matching tag and its valid bit is set, then return the data at the address' offset.
If no entry has both a matching tag and a set valid bit, then perform a fetch from memory and add a new entry for this address/data.
If all cache entries in a set fill up, pick one of them to evict using a replacement policy.
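A minimal sketch of that lookup, assuming 4 ways per set and 64-byte blocks (Set, Way, and set_assoc_lookup are made-up names for illustration, not the slides' code):

#include <stddef.h>
#include <stdint.h>

#define WAYS 4

typedef struct {
    int      valid;
    uint64_t tag;
    uint8_t  data[64];
} Way;

typedef struct {
    Way ways[WAYS];
} Set;

// Returns a pointer to the requested byte on a hit, or NULL on a miss
// (the caller would then fetch the block from memory and pick a victim
// way to evict according to the replacement policy).
uint8_t* set_assoc_lookup(Set* sets, uint64_t index,
                          uint64_t tag, uint64_t offset) {
    Set* set = &sets[index];
    for (int w = 0; w < WAYS; ++w) {
        Way* line = &set->ways[w];
        if (line->valid && line->tag == tag) {
            return &line->data[offset];  // cache hit
        }
    }
    return NULL;                         // cache miss
}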

Set-associative caches
When performing a write to a set-associative cache, check every entry in the set under index:
If there is an existing entry (matching tag, valid bit set), simply update it.
Otherwise, add a new entry and, depending on the write policy, also write the data to memory, as with a direct-mapped cache.

Set-associative caches
Given addresses of the form [tag, index, offset] with s bits for the index and b bits for the offset:
There can be at most 2^s addressable sets.
There are exactly 2^b addressable bytes in each data block.
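As a hedged worked example with made-up parameters (not from the slides): with s = 6 index bits, b = 6 offset bits, and 4 ways per set, the cache has 2^6 = 64 sets and 64-byte blocks, for a total data capacity of 64 sets × 4 ways × 64 bytes = 16 KiB.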

Fully-associative caches
Instead of having multiple sets of cache entries, keep just one set that any address can map into.
What are the implications of this in terms of hardware costs versus access times?
Fully-associative caches are not very common, but the translation lookaside buffer (TLB), which facilitates virtual-to-physical address translation, is one such example.
Expect more on the TLB in operating systems or (maybe?) hardware design and implementation.

Associativity Trade-offs
Greater associativity:
Pro: results in fewer misses, so the CPU spends less time/power reading from slow memory.
Con: searching the cache takes longer (fully associative means searching the entire cache).
Less associativity:
Pro: searching the cache takes less time (direct-mapped requires reading only one entry).
Con: results in more misses, because there are fewer spots for each address.

Associativity Trade-offs
Direct-mapped cache:
Best when the miss penalty is minimal.
Fastest hit times, so the best trade-off for “large” caches.
Fully-associative cache:
Lowest miss rate, so the best trade-off when the miss penalty is maximal.