Cache Memory Replacement Policy and Virtual Memory
Prof. Sin-Min Lee, Department of Computer Science

[Slide: circuit diagram and clocked truth table for a 2x4 decoder driving a JK flip-flop.]

There are three methods of block placement:
Direct mapped: each block has only one place it can appear in the cache. The mapping is usually (Block address) MOD (Number of blocks in cache).
Fully associative: a block can be placed anywhere in the cache.
Set associative: a block can be placed in a restricted set of places in the cache. A set is a group of blocks in the cache. A block is first mapped onto a set, and then it can be placed anywhere within that set. The set is usually chosen by bit selection; that is, (Block address) MOD (Number of sets in cache).
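A minimal sketch (not from the slides) of how these mapping rules look in code; the cache geometry and block addresses are illustrative assumptions:

```python
def direct_mapped_index(block_addr, num_blocks):
    """Direct mapped: exactly one possible cache block."""
    return block_addr % num_blocks

def set_index(block_addr, num_sets):
    """Set associative: choose the set, then any way within it."""
    return block_addr % num_sets

# Example: an 8-block cache
print(direct_mapped_index(12, 8))   # block 12 -> cache block 4
print(set_index(12, 4))             # 2-way: block 12 -> set 0 (either way)
# Fully associative is the degenerate case of one set holding all blocks:
print(set_index(12, 1))             # always set 0; any entry may hold it
```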

A pictorial example of a cache with only 4 blocks and a memory with only 16 blocks.

Direct mapped cache: A block from main memory can go in exactly one place in the cache. This is called direct mapped because there is a direct mapping from any block address in memory to a single location in the cache.

Fully associative cache: A block from main memory can be placed in any location in the cache. This is called fully associative because a block in main memory may be associated with any entry in the cache.

Memory/Cache Related Terms. Set associative cache: The middle range of designs between direct mapped cache and fully associative cache is called set-associative cache. In an n-way set-associative cache, a block from main memory can go into n (n at least 2) locations in the cache.

Replacing Data. Initially all valid bits are set to 0. As instructions and data are fetched from memory, the cache fills and eventually some data must be replaced. Which ones? With direct mapping the choice is obvious: each block can go in only one place.

Replacement Policies for Associative Cache
1. FIFO: fills from top to bottom and wraps back to the top. (Modified data may have to be written back to physical memory before being replaced.)
2. LRU: replaces the least recently used data. Requires a counter (or other usage-tracking state) per entry.
3. Random: replaces an arbitrary entry.

Replacement in Set-Associative Cache. Which of the n ways within the set should be replaced? FIFO, Random, or LRU. (In the slide's example, the accessed locations are D, E, A.)

Writing Data. If the location is in the cache, the cached value and possibly the value in physical memory must be updated. If the location is not in the cache, it may be loaded into the cache or not (write-allocate vs. write-no-allocate). Two methodologies:
1. Write-through: physical memory always contains the correct value.
2. Write-back: the value is written to physical memory only when it is removed from the cache.
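A rough sketch of the two write policies using a toy cache model; the class names and dictionary-based "memory" are illustrative assumptions, not from the slides:

```python
class WriteThroughCache:
    """Every store updates both the cache and physical memory."""
    def __init__(self):
        self.cache, self.memory = {}, {}

    def write(self, addr, value):
        self.cache[addr] = value
        self.memory[addr] = value          # memory is always up to date

class WriteBackCache:
    """Stores update only the cache; memory is updated on eviction."""
    def __init__(self):
        self.cache, self.memory, self.dirty = {}, {}, set()

    def write(self, addr, value):
        self.cache[addr] = value
        self.dirty.add(addr)               # line now diverges from memory

    def evict(self, addr):
        if addr in self.dirty:             # write back only dirty lines
            self.memory[addr] = self.cache[addr]
            self.dirty.discard(addr)
        del self.cache[addr]

wb = WriteBackCache()
wb.write(0x10, 42)    # memory not touched yet
wb.evict(0x10)        # now wb.memory[0x10] == 42
```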

Cache Performance. Cache hits and cache misses: the hit ratio h is the fraction of memory accesses that are served from the cache. Average memory access time: T_M = h T_C + (1 - h) T_P. In the examples that follow, T_C = 10 ns and T_P = 60 ns.
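For instance, with these slide values and an assumed hit ratio of h = 0.9 (an illustrative number, not one from the slides):

T_M = 0.9 x 10 ns + 0.1 x 60 ns = 9 ns + 6 ns = 15 ns

Even a 10% miss rate makes the average access time 50% worse than the cache alone.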

Associative Cache. Access order: A0 B0 C2 A0 D1 B0 E4 F5 A0 C2 D1 B0 G3 C2 H7 I6 A0 B0. T_C = 10 ns, T_P = 60 ns, FIFO replacement. From the trace below, h = 7/18, so T_M = (7/18)(10) + (11/18)(60) ≈ 40.6 ns.

Direct-Mapped Cache. Access order: A0 B0 C2 A0 D1 B0 E4 F5 A0 C2 D1 B0 G3 C2 H7 I6 A0 B0. T_C = 10 ns, T_P = 60 ns. Simulating the same trace with one block per slot (slot = block address) gives h = 3/18, so T_M = (3/18)(10) + (15/18)(60) ≈ 51.7 ns.

2-Way Set Associative Cache. Access order: A0 B0 C2 A0 D1 B0 E4 F5 A0 C2 D1 B0 G3 C2 H7 I6 A0 B0. T_C = 10 ns, T_P = 60 ns, LRU replacement. From the trace below, h = 7/18, so T_M ≈ 40.6 ns.

Associative Cache (FIFO Replacement Policy)
Access order: A0 B0 C2 A0 D1 B0 E4 F5 A0 C2 D1 B0 G3 C2 H7 I6 A0 B0
[Table: contents of the 8-entry cache after each access, with hits marked *. Once the cache fills, A, B, and C are evicted in arrival order.]
Hit ratio = 7/18
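This trace can be reproduced with a short simulation; a minimal Python sketch (not from the slides), assuming an 8-entry fully associative cache with FIFO replacement:

```python
from collections import deque

# (tag, block address) pairs from the slide's access order
ACCESSES = [("A",0),("B",0),("C",2),("A",0),("D",1),("B",0),("E",4),("F",5),
            ("A",0),("C",2),("D",1),("B",0),("G",3),("C",2),("H",7),("I",6),
            ("A",0),("B",0)]

def fifo_fully_associative(accesses, capacity=8):
    cache = deque()                      # oldest entry at the left
    hits = 0
    for item in accesses:
        if item in cache:
            hits += 1                    # fully associative: search everywhere
        else:
            if len(cache) == capacity:
                cache.popleft()          # FIFO: evict the oldest entry
            cache.append(item)
    return hits

hits = fifo_fully_associative(ACCESSES)
print(f"hit ratio = {hits}/{len(ACCESSES)}")   # -> hit ratio = 7/18
```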

Two-way set associative cache (LRU Replacement Policy)
Access order: A0 B0 C2 A0 D1 B0 E4 F5 A0 C2 D1 B0 G3 C2 H7 I6 A0 B0
[Table: contents of sets 0-3 (two ways each, with LRU bits) after each access, hits marked *.]
Hit ratio = 7/18

Associative Cache with 2-byte line size (FIFO Replacement Policy)
Access order: A0 B0 C2 A0 D1 B0 E4 F5 A0 C2 D1 B0 G3 C2 H7 I6 A0 B0
Line pairs: A and J; B and D; C and G; E and F; I and H
[Table: cache contents after each access, hits marked *.]
Hit ratio = 11/18

Direct-mapped Cache with line size of 2 bytes
Access order: A0 B0 C2 A0 D1 B0 E4 F5 A0 C2 D1 B0 G3 C2 H7 I6 A0 B0
Line pairs: A and J; B and D; C and G; E and F; I and H
[Table: slots 0-7 after each access, hits marked *.]
Hit ratio = 7/18

Two-way set Associative Cache with line size of 2 bytes
Access order: A0 B0 C2 A0 D1 B0 E4 F5 A0 C2 D1 B0 G3 C2 H7 I6 A0 B0
Line pairs: A and J; B and D; C and G; E and F; I and H
[Table: sets 0-3 (two ways each, with LRU bits) after each access, hits marked *.]
Hit ratio = 12/18

Page Replacement - FIFO
FIFO is simple to implement:
– When a page comes in, place its id at the end of the list
– Evict the page at the head of the list
Might be good? The page to be evicted has been in memory the longest time. But maybe it is still being used; we just don't know. FIFO also suffers from Belady's Anomaly: the fault rate may increase when there is more physical memory! (See the sketch below.)
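A small demonstration (not from the slides) using the classic reference string for Belady's Anomaly; with FIFO, 4 frames produce more page faults than 3:

```python
def fifo_faults(refs, num_frames):
    frames, queue, faults = set(), [], 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == num_frames:
                victim = queue.pop(0)    # evict the page resident longest
                frames.remove(victim)
            frames.add(page)
            queue.append(page)
    return faults

# Classic reference string exhibiting Belady's Anomaly
refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))   # 9 faults
print(fifo_faults(refs, 4))   # 10 faults: more memory, more faults!
```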

Parkinson's law: "Programs expand to fill the memory available to hold them." Idea: manage the available storage efficiently among the running programs.

Before VM… programmers tried to shrink programs to fit tiny memories. The result: small but inefficient algorithms.

Solution to Memory Constraints. Use a secondary memory such as disk, and divide the disk-resident address space into pieces that fit into memory (RAM). This is called Virtual Memory.

Implementations of VM
– Paging: disk broken up into fixed-size pages
– Segmentation: disk broken up into variable-size segments

Memory Issues. Idea: separate the concepts of
– address space (disk)
– memory locations (RAM)
Example:
– Address field: 2^16 = 65,536 memory cells
– Memory size: 4096 memory cells
How can we fit the address space into main memory?

Paging: break memories into pages (1 page = 4096 bytes). NOTE: normally main memory has thousands of pages. New issue: how to manage addressing?

Address Mapping: mapping secondary-memory (program/virtual) addresses to main-memory (physical) addresses (1 page = 4096 bytes).
– virtual address: used by the program
– physical address: used by the hardware

Paging gives the illusion that main memory is large, contiguous, and linear, with Size(MM) = Size(2ndry M), and it is transparent to the programmer.

Paging Implementation. The virtual address space (program) and the physical address space (MM) are broken up into equal pages (just like cache and MM!!). Page size is always a power of 2; common sizes range from 512 bytes to 64 KB.

Paging Implementation Page Frames Page Tables Programs use Virtual Addresses

Memory Mapping. Note: secondary memory = 64K; main memory = 32K. Page frame: home of VM pages in MM. Page table: home of the mappings for VM pages. [Table: page # to page frame # mappings.]

Memory Mapping. Memory Management Unit (MMU): the device that performs virtual-to-physical mapping, e.g. translating a 32-bit VM address into a 15-bit physical address.

Memory Management Unit. A 32-bit virtual address is broken into two portions: a 20-bit virtual page # and a 12-bit offset in the page (since our pages are 4 KB). How to determine if the page is in MM? A present/absent bit in the page table entry.
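A sketch of this translation in code; the page table contents below are made-up values for illustration, not from the slides:

```python
PAGE_SIZE = 4096                        # 4 KB pages -> 12-bit offset
OFFSET_BITS = 12

# Hypothetical page table: virtual page # -> (present bit, page frame #)
page_table = {0x0: (True, 0x3), 0x1: (True, 0x0), 0x2: (False, None)}

def translate(vaddr):
    vpn = vaddr >> OFFSET_BITS          # top 20 bits: virtual page number
    offset = vaddr & (PAGE_SIZE - 1)    # bottom 12 bits: offset within page
    present, frame = page_table.get(vpn, (False, None))
    if not present:
        raise LookupError(f"page fault on page {vpn:#x}")  # OS must load it
    return (frame << OFFSET_BITS) | offset

print(hex(translate(0x00001ABC)))   # VPN 0x1 -> frame 0x0 -> physical 0xabc
```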

Demand Paging. [Figure: a possible mapping of pages.] Page fault: the requested page is not in MM. Demand paging: a page is loaded into MM when the program demands it. But… what should be brought in for a program on start-up?

Working Set: the set of pages used by a process. Each process has a unique memory map, which matters in a multitasking OS. At time t, there is a set consisting of the k most recently used pages; references tend to cluster on a small number of pages. Put this set to work: store and load it during process switching. (A sketch follows.)
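A minimal sketch of computing the working set at time t, assuming the reference string is a simple list and the window size k is a free parameter (both assumptions for illustration):

```python
def working_set(refs, t, k):
    """Pages touched by the k most recent references up to time t."""
    window = refs[max(0, t - k + 1) : t + 1]
    return set(window)

refs = [1, 2, 1, 3, 1, 2, 4, 4, 4, 2]
print(working_set(refs, t=9, k=4))   # last 4 references -> {2, 4}
```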

Page Replacement Policy. Working set: the set of pages used actively and heavily, kept in memory to reduce page faults; the set is found and maintained dynamically by the OS. Replacement: the OS tries to predict which page would have the least impact on the running program. Common replacement schemes: Least Recently Used (LRU) and First-In-First-Out (FIFO).

Replacement Policy
– Which page is replaced?
– The page removed should be the page least likely to be referenced in the near future
– Most policies predict future behavior on the basis of past behavior

Basic Replacement Algorithms Least Recently Used (LRU) –Replaces the page that has not been referenced for the longest time –By the principle of locality, this should be the page least likely to be referenced in the near future –Each page could be tagged with the time of last reference. This would require a great deal of overhead.
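In software, LRU is often implemented with an ordered structure instead of per-page timestamps; a minimal sketch (not from the slides) using Python's OrderedDict:

```python
from collections import OrderedDict

class LRUPageSet:
    """Keeps the num_frames most recently referenced pages."""
    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.pages = OrderedDict()           # least recently used first

    def reference(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)     # mark as most recently used
            return "hit"
        if len(self.pages) == self.num_frames:
            self.pages.popitem(last=False)   # evict the true LRU page
        self.pages[page] = True
        return "fault"

mem = LRUPageSet(3)
print([mem.reference(p) for p in [1, 2, 3, 1, 4]])
# -> ['fault', 'fault', 'fault', 'hit', 'fault']  (page 2 is evicted for 4)
```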

SRAM vs. DRAM. DRAMs use only one transistor plus a capacitor per bit, so they are smaller and less expensive; SRAMs are made from four to six transistors (a flip-flop) per bit. SRAMs don't require external refresh circuitry or other work to keep their data intact. SRAM is faster than DRAM.

It has been discovered that for about 90% of the time that our programs execute, only 10% of our code is used! This is known as the Locality Principle.
– Temporal locality: when a program asks for a location in memory, it will likely ask for that same location again very soon thereafter.
– Spatial locality: when a program asks for a memory location at a memory address (let's say 1000), it will likely soon need a nearby location: 1001, 1002, 1003, 1004, etc.

Memory hierarchy levels (main memory and disk estimates from a Fry's ad, 10/16/2008):
– Registers: fastest possible access (usually 1 CPU cycle); <1 ns
– Level 1 (SRAM) cache: often accessed in just a few cycles; usually tens to hundreds of kilobytes; 2-8 ns; ~$80/MB
– Level 2 (SRAM) cache: latency higher than L1 by 2x to 10x; now multi-MB; 5-12 ns; ~$80/MB
– Main memory (DRAM): may take hundreds of cycles (tens of ns), but can be multiple gigabytes; e.g. 2 GB for $11 ($0.0055/MB)
– Disk storage: millions of cycles of latency (3,000,000-10,000,000 ns), but very large; e.g. 1 TB for $139 (about $0.000139/MB)
– Tertiary storage: several seconds of latency, can be huge (really slow)
For a 1 GHz CPU, a 50 ns wait means 50 wasted clock cycles.

We established that the Locality Principle states that only a small amount of memory is needed for most of the program's lifetime… We now have a memory hierarchy that places very fast yet expensive RAM near the CPU and larger, slower, cheaper RAM further away… The trick is to keep the data that the CPU wants in the small, expensive, fast memory close to the CPU… and how do we do that???

Hardware and the Operating System are responsible for moving data throughout the Memory Hierarchy when the CPU needs it. Modern programming languages mainly assume two levels of memory, main memory and disk storage. Programmers are responsible for moving data between disk and memory through file I/O. Optimizing compilers are responsible for generating code that, when executed, will cause the hardware to use caches and registers efficiently.

A cache replacement algorithm is a computer program or a hardware-maintained structure designed to manage a cache of information. When the cache is full, the algorithm must choose which items to discard to make room for new data. The "hit rate" of a cache describes how often a searched-for item is actually found in the cache. The "latency" of a cache describes how long after requesting a desired item the cache can return that item.

Each replacement strategy is a compromise between hit rate and latency.
Direct Mapped Cache
– The direct mapped cache is the simplest form of cache and the easiest to check for a hit.
– Unfortunately, the direct mapped cache also has the worst hit rate, because again there is only one place that any address can be stored.
Fully Associative Cache
– The fully associative cache has the best hit ratio because any line in the cache can hold any address that needs to be cached.
– However, this cache suffers from problems involving searching the cache.
– A replacement algorithm is needed, usually some form of an LRU ("least recently used") algorithm.
N-Way Set Associative Cache
– The set associative cache is a good compromise between the direct mapped and fully associative caches.

Virtual Memory is basically the extension of physical main memory (RAM) into a lower-cost portion of our memory hierarchy (let's say… the hard disk). A form of the overlay approach, managed by the OS and called paging, is used to swap "pages" of memory back and forth between the disk and physical RAM. Hard disks are huge, but do you remember how slow they are??? Millions of times slower than the other memories in our pyramid.