Caching and TLBs
Andy Wang
Operating Systems COP 4610 / CGS 5765

Caching
Store copies of data at places that can be accessed more quickly than the original location.
- Speeds up access to frequently used data
- At a cost: slows down access to infrequently used data

Caching in Memory Hierarchy
Provides the illusion of GB of storage with register access time.

                        Access time         Size         Cost
Primary    Registers    1 clock cycle       ~500 bytes   on chip
           Cache        1-2 clock cycles    <10 MB
           Main memory  1-4 clock cycles    <4 GB        $0.1/MB
Secondary  Disk         5-50 msec           <200 GB      $0.001/MB

Caching in Memory Hierarchy
Exploits two hardware characteristics:
- Smaller memory provides faster access times
- Larger memory provides cheaper storage per byte
Puts frequently accessed data in small, fast, and expensive memory.
Assumption: non-random program access behaviors.

Locality in Access Patterns
- Temporal locality: recently referenced locations are more likely to be referenced again in the near future (e.g., recently accessed files)
- Spatial locality: referenced locations tend to be clustered (e.g., listing all files under a directory)
A minimal C sketch of both patterns follows.
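The sketch below is illustrative only; the array and its size are assumptions, not from the slides. The inner loop walks consecutive addresses (spatial locality), while the accumulator is touched on every iteration (temporal locality).

/* Sketch: spatial vs. temporal locality (array size is an assumption). */
#define ROWS 1024
#define COLS 1024
static int grid[ROWS][COLS];

long sum_row_major(void)
{
    long sum = 0;                    /* 'sum' reused every iteration: temporal locality */
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            sum += grid[r][c];       /* consecutive addresses: spatial locality */
    return sum;
}

Swapping the two loops would stride COLS*sizeof(int) bytes per access, defeating spatial locality and causing many more cache misses on large arrays.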

Caching
Storing a small set of data in a cache provides the following illusions:
- Large storage
- At the speed of the small cache
Does not work well for programs with little locality:
- e.g., scanning the entire disk
- Leaves behind cache content with no locality (cache pollution)

Generic Issues in Caching
Effectiveness metrics:
- Cache hit: a lookup resolved by the content stored in the cache
- Cache miss: a lookup that cannot be resolved by the content stored in the cache
Effective access time: P(hit)*(hit_cost) + P(miss)*(miss_cost), where the miss cost includes the hit cost, since the cache is checked before falling back, as the following example shows.

Effective Access Time
- Cache hit rate: 99%; hit cost: 2 clock cycles
- Cache miss rate: 1%; miss penalty: 4 clock cycles
Effective access time: 99%*2 + 1%*(2 + 4) = 1.98 + 0.06 = 2.04 clock cycles
(a minimal C version of this calculation follows)
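The same arithmetic as a small C program; the function name and structure are mine, a hedged sketch rather than anything from the slides:

#include <stdio.h>

/* Effective access time = P(hit)*hit_cost + P(miss)*(hit_cost + miss_penalty);
   a miss pays the hit cost too, because the cache is checked first. */
double effective_access_time(double hit_rate, double hit_cost, double miss_penalty)
{
    return hit_rate * hit_cost + (1.0 - hit_rate) * (hit_cost + miss_penalty);
}

int main(void)
{
    /* The slide's numbers: 99% hits, 2-cycle hit cost, 4-cycle miss penalty. */
    printf("%.2f clock cycles\n", effective_access_time(0.99, 2.0, 4.0)); /* prints 2.04 */
    return 0;
}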

Implications
With 10 MB of cache, we get the illusion of 4 GB of memory running at the speed of the hardware cache.

Reasons for Cache Misses
- Compulsory misses: data brought into the cache for the first time (e.g., booting)
- Capacity misses: caused by the limited size of a cache
  - A program may require a hash table that exceeds the cache capacity
  - With a random access pattern, no caching policy can be effective

Reasons for Cache Misses
- Misses due to competing cache entries: one cache entry assigned to two pieces of data; when both are active, each preempts the other
- Policy misses: caused by the cache replacement policy, which chooses which cache entry to replace when the cache is full

Design Issues of Caching
- How is a cache entry lookup performed?
- Which cache entry should be replaced when the cache is full?
- How is consistency maintained between the cached copy and the real data?

Caching Applied to Address Translation
- A process references the same pages repeatedly; translating each virtual address to a physical address is wasteful
- Translation lookaside buffer (TLB): tracks frequently used translations and avoids translation in the common case

Caching Applied to Address Translation
[Figure: virtual addresses are first looked up in the TLB; on a hit ("in TLB"), the TLB supplies the physical address directly; on a miss, the translation table is consulted. Data reads and writes then proceed untranslated.]

Example of the TLB Content

Virtual page number (VPN)   Physical page number (PPN)   Control bits
2                           1                            Valid, rw
-                           -                            Invalid
4                           …                            …

TLB Lookups
- Sequential search of the TLB table
- Direct mapping: assigns each virtual page to a specific slot in the TLB (e.g., use the upper bits of the VPN to index the TLB)

Direct Mapping

if (TLB[UpperBits(vpn)].vpn == vpn) {
  /* Hit: the cached translation matches this VPN */
  return TLB[UpperBits(vpn)].ppn;
} else {
  /* Miss: consult the page table, then refill this TLB slot */
  ppn = PageTable[vpn];
  TLB[UpperBits(vpn)].control = INVALID;   /* mark the slot invalid during the update */
  TLB[UpperBits(vpn)].vpn = vpn;
  TLB[UpperBits(vpn)].ppn = ppn;
  TLB[UpperBits(vpn)].control = VALID | RW;
  return ppn;
}

Direct Mapping
- Using only the high-order bits: two pages may compete for the same TLB entry, tossing out TLB entries that are still needed
- Using only the low-order bits: TLB references will be clustered, failing to use the full range of TLB entries
- Common approach: combine both (see the sketch below)
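One plausible way to combine both, sketched in C; the XOR fold and the TLB size are assumptions for illustration, not the slides' prescription:

/* Sketch: fold high-order VPN bits into the low-order index bits,
   so pages sharing only low bits (or only high bits) do not all collide. */
#define TLB_BITS 6                       /* 64-entry direct-mapped TLB (assumed size) */
#define TLB_SIZE (1u << TLB_BITS)

static unsigned tlb_index(unsigned vpn)
{
    return (vpn ^ (vpn >> TLB_BITS)) & (TLB_SIZE - 1);
}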

TLB Lookups
- Sequential search of the TLB table
- Direct mapping: assigns each virtual page to a specific slot in the TLB (e.g., use the upper bits of the VPN to index the TLB)
- Set associativity: use N TLB banks to perform lookups in parallel

Two-Way Associative Cache
[Figure, built up over three slides: the VPN is hashed to select one set; the two entries in that set are compared against the VPN in parallel, and the matching entry's PPN is returned. If both comparisons miss, translate and replace one of the two entries. A C rendering follows.]
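A rough C rendering of the lookup in the figure; real hardware compares both ways in parallel, and the set count, field names, and modulo hash here are assumptions:

#define VALID 0x1                        /* assumed control-bit encoding */
#define NSETS 32                         /* assumed number of sets */
#define NWAYS 2

struct tlb_entry { unsigned vpn, ppn, control; };
static struct tlb_entry tlb[NSETS][NWAYS];

/* Returns 1 and fills *ppn on a hit, 0 on a miss. */
int tlb_lookup_2way(unsigned vpn, unsigned *ppn)
{
    unsigned set = vpn % NSETS;          /* stands in for the hash in the figure */
    for (int way = 0; way < NWAYS; way++) {
        struct tlb_entry *e = &tlb[set][way];
        if ((e->control & VALID) && e->vpn == vpn) {
            *ppn = e->ppn;               /* hit in one of the two ways */
            return 1;
        }
    }
    return 0;                            /* miss: translate and replace one of the two ways */
}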

TLB Lookups
- Direct mapping: assigns each virtual page to a specific slot in the TLB (e.g., use the upper bits of the VPN to index the TLB)
- Set associativity: use N TLB banks to perform lookups in parallel
- Fully associative cache: allows looking up all TLB entries in parallel

Fully Associative Cache
[Figure, built up over three slides: the VPN is compared against every TLB entry in parallel, and the matching entry supplies the PPN. If all comparisons miss, translate and replace one of the entries. A C rendering follows.]
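The fully associative case in the same style; the loop stands in for hardware comparators that check every entry at once, and the TLB size is again an assumption:

#define VALID 0x1                        /* assumed control-bit encoding */
#define TLB_ENTRIES 64                   /* assumed TLB size */

static struct { unsigned vpn, ppn, control; } tlb[TLB_ENTRIES];

int tlb_lookup_fa(unsigned vpn, unsigned *ppn)
{
    /* Hardware checks all entries simultaneously; this loop is the software analogue. */
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if ((tlb[i].control & VALID) && tlb[i].vpn == vpn) {
            *ppn = tlb[i].ppn;
            return 1;
        }
    }
    return 0;                            /* miss: any entry may be chosen for replacement */
}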

TLB Lookups
- Typically, TLBs are small and fully associative
- Hardware caches use direct-mapped or set-associative designs

Replacement of TLB Entries
- Direct mapping: an entry is replaced whenever its VPN mismatches
- Associative caches: random replacement, LRU (least recently used), or MRU (most recently used), depending on reference patterns (an LRU sketch follows)
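As one concrete example, a hedged sketch of LRU victim selection within a set, using per-entry timestamps; the counter scheme and names are assumptions, and real TLBs often approximate LRU more cheaply:

/* Pick the victim way in a set by least-recently-used. */
struct way { unsigned vpn, ppn, control; unsigned long last_used; };

int lru_victim(const struct way *set, int nways)
{
    int victim = 0;
    for (int w = 1; w < nways; w++)
        if (set[w].last_used < set[victim].last_used)
            victim = w;                  /* smallest timestamp = least recently used */
    return victim;
}
/* On every hit, refresh the entry: set[way].last_used = ++global_clock; */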

Replacement of TLB Entries
- Hardware-level TLB replacement is mostly random: simple and fast
- Software-level memory-page replacement can be more sophisticated: the tradeoff is CPU cycles spent on the policy vs. the cache hit rate it achieves

Consistency Between TLB and Page Tables
- Different processes have different page tables, so TLB entries must be invalidated on context switches
- Alternative: tag TLB entries with process IDs, at the additional cost of hardware and one more comparison per lookup (see the sketch below)
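A sketch of the tagging idea; the ASID field and helper below are assumptions that show where the extra per-lookup comparison comes from:

#define VALID 0x1                        /* assumed control-bit encoding */

struct tlb_entry {
    unsigned vpn, ppn, control;
    unsigned asid;                       /* which address space owns this translation */
};

/* A hit now also requires the ASID to match, so entries from other
   processes can stay resident across context switches. */
int tlb_match(const struct tlb_entry *e, unsigned vpn, unsigned cur_asid)
{
    return (e->control & VALID) && e->vpn == vpn && e->asid == cur_asid;
}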

Relationship Between TLB and HW Memory Caches
We can extend the principle of the TLB:
- Virtually addressed cache: sits between the CPU and the translation tables
- Physically addressed cache: sits between the translation tables and main memory

Relationship Between TLB and HW Memory Caches
[Figure: the CPU issues a virtual address (VA), which is checked first against the virtually addressed cache; on a miss, the TLB and translation tables produce the physical address (PA), which is then checked against the physically addressed cache before reaching main memory. Data reads and writes proceed untranslated.]

Two Ways to Commit Data Changes
- Write-through: immediately propagates each update through the various levels of caching; used for critical data
- Write-back: delays propagation until the cached item is replaced
  - Goal: spread the cost of update propagation over multiple updates
  - Less costly overall
(a sketch of both policies follows)
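A schematic contrast of the two policies in C; the cache-line structure and the write_to_memory helper are hypothetical, introduced only for this sketch:

struct line { unsigned tag; int data; int dirty; };

void write_to_memory(unsigned tag, int value);   /* hypothetical lower-level write */

/* Write-through: update the cache and the lower level on every store. */
void store_write_through(struct line *l, int value)
{
    l->data = value;
    write_to_memory(l->tag, value);      /* propagate immediately */
}

/* Write-back: mark the line dirty; pay the memory write only at eviction. */
void store_write_back(struct line *l, int value)
{
    l->data = value;
    l->dirty = 1;
}

void evict(struct line *l)
{
    if (l->dirty)
        write_to_memory(l->tag, l->data);  /* one write covers many earlier stores */
    l->dirty = 0;
}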