1 Lecture 13: Cache and Virtual Memory Review. Cache optimization approaches, cache miss classification. Adapted from UCB CS252 S01

2 What Is Memory Hierarchy A typical memory hierarchy today: Proc/Regs, L1-Cache, L2-Cache, L3-Cache (optional), Memory, then Disk, Tape, etc. Each level farther from the processor is bigger; each level closer is faster. Here we focus on L1/L2/L3 caches and main memory.

3 Why Memory Hierarchy? 1980: no cache in µproc; 1995: 2-level cache on chip (1989: first Intel µproc with a cache on chip). Processor performance ("Moore's Law") improves about 60%/yr while DRAM performance improves about 7%/yr, so the processor-memory performance gap grows about 50% per year. [Figure: CPU vs. DRAM performance over time]

4 Generations of Microprocessors Time of a full cache miss in instructions executed: 1st Alpha: 340 ns / 5.0 ns = 68 clks x 2 issue, or 136 instructions; 2nd Alpha: 266 ns / 3.3 ns = 80 clks x 4 issue, or 320 instructions; 3rd Alpha: 180 ns / 1.7 ns = 108 clks x 6 issue, or 648 instructions. 1/2X latency x 3X clock rate x 3X instr/clock => 4.5X.

5 Area Costs of Caches Processor: % area (~cost) / % transistors (~power). Intel 80386: 0% / 0%; Alpha 21164: 37% / 77%; StrongArm SA110: 61% / 94%; Pentium Pro: 64% / 88%, 2 dies per package: Proc/I$/D$ + L2$; Itanium: 92%. Caches store redundant data only to close the performance gap.

6 What Is Cache, Exactly? Small, fast storage used to improve average access time to slow memory; usually built from SRAM. Exploits locality, spatial and temporal. In computer architecture, almost everything is a cache! The register file is the fastest place to cache variables; a first-level cache is a cache on the second-level cache; the second-level cache is a cache on memory; memory is a cache on disk (virtual memory); the TLB is a cache on the page table; branch prediction is a cache on prediction information? The branch-target buffer can be implemented as a cache. Beyond architecture: file cache, browser cache, proxy cache. Here we focus on L1 and L2 caches (L3 optional) as buffers to main memory.

7 Example: 1 KB Direct Mapped Cache Assume a cache of 2^N bytes with 2^K blocks and a block size of 2^M bytes; N = M + K (cache size = number of blocks times block size). A 32-bit address then splits into a (32 - N)-bit cache tag, a K-bit cache index, and an M-bit block offset. The cache stores tag, data, and a valid bit for each block. The cache index is used to select a block in SRAM (recall BHT, BTB); the block tag is compared with the input tag; a word in the data block may be selected as the output. [Figure: for the 1 KB cache with 32-byte blocks, bits 31-10 are the cache tag (example: 0x50), bits 9-5 the cache index (example: 0x01), and bits 4-0 the block offset (example: 0x00); the valid bit is stored as part of the cache "state".]
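A minimal C sketch of this address split, using the slide's parameters (1 KB cache, 32-byte blocks, so M = 5 offset bits, K = 5 index bits, and a 22-bit tag); the function name and example address are illustrative only:

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_BITS 5   /* M = log2(block size in bytes)    */
#define INDEX_BITS 5   /* K = log2(number of cache blocks) */

/* Split a 32-bit address into cache tag, cache index, and block offset. */
static void split_address(uint32_t addr) {
    uint32_t offset = addr & ((1u << BLOCK_BITS) - 1);
    uint32_t index  = (addr >> BLOCK_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t tag    = addr >> (BLOCK_BITS + INDEX_BITS);
    printf("addr=0x%08x  tag=0x%x  index=0x%x  offset=0x%x\n",
           addr, tag, index, offset);
}

int main(void) {
    split_address(0x00014020u);  /* yields tag=0x50, index=0x01, offset=0x00 */
    return 0;
}
```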

8 Four Questions About Cache Design Block placement: where can a block be placed? Block identification: how to find a block in the cache? Block replacement: if a new block is to be fetched, which of the existing blocks to replace (if there are multiple choices)? Write policy: what happens on a write?

9 Where Can A Block Be Placed What is a block: the memory space is divided into blocks just as the cache is; a memory block is the basic unit to be cached. Direct mapped cache: there is only one place in the cache to buffer a given memory block. N-way set associative cache: N places for a given memory block; like N direct mapped caches operating in parallel; reduces miss rates at the cost of increased complexity, cache access time, and power consumption. Fully associative cache: a memory block can be put anywhere in the cache. (A sketch of the three placements follows.)
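A sketch, with a hypothetical 64-block cache, of where one memory block may land under each organization:

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_BLOCKS 64u   /* hypothetical total number of cache blocks */
#define WAYS 4u          /* associativity for the set associative case */

int main(void) {
    uint32_t block_addr = 12345u;   /* example memory block address */
    printf("direct mapped  : only block %u\n", block_addr % NUM_BLOCKS);
    printf("4-way set assoc: any of the 4 ways of set %u\n",
           block_addr % (NUM_BLOCKS / WAYS));
    printf("fully assoc    : any of the %u blocks\n", NUM_BLOCKS);
    return 0;
}
```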

10 Set Associative Cache Example: two-way set associative cache. The cache index selects a set of two blocks; the two tags in the set are compared with the input tag in parallel; data is selected based on the tag comparison. Set associative or direct mapped? Discuss later. [Figure: two ways, each an array of (valid, tag, data) entries; the index selects one set, both tags are compared against the address tag, the comparator outputs are ORed into Hit, and a mux (Sel1/Sel0) picks the hitting way's cache block.]
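A software sketch of the lookup path in the figure; the loop below models sequentially what hardware does with parallel comparators and a mux. Structure and names are illustrative, not from the lecture:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SETS 16
#define WAYS 2

struct way {
    bool     valid;
    uint32_t tag;
    uint8_t  data[32];
};

static struct way cache[NUM_SETS][WAYS];

/* Returns a pointer to the hitting block's data, or NULL on a miss. */
static uint8_t *lookup(uint32_t tag, uint32_t index) {
    for (int w = 0; w < WAYS; w++) {          /* parallel compare in hardware */
        struct way *b = &cache[index % NUM_SETS][w];
        if (b->valid && b->tag == tag)
            return b->data;                   /* mux selects the hitting way */
    }
    return NULL;                              /* miss: OR of compares is 0 */
}

int main(void) {
    cache[3][1] = (struct way){ .valid = true, .tag = 0x50 };
    return lookup(0x50, 3) ? 0 : 1;           /* hit => exit code 0 */
}
```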

11 How to Find a Cached Block Direct mapped cache: the stored tag for the selected cache block matches the input tag. Fully associative cache: any of the N stored tags matches the input tag. Set associative cache: any of the K stored tags in the selected cache set matches the input tag. Cache hit time is determined by both tag comparison and data access; it can be estimated with the CACTI model.

12 Which Block to Replace? Direct mapped cache: not an issue. For set associative or fully associative* caches: Random: select candidate blocks randomly from the cache set. LRU (Least Recently Used): replace the block that has been unused for the longest time. FIFO (First In, First Out): replace the oldest block. Usually LRU performs the best, but it is hard (and expensive) to implement. *Think of a fully associative cache as a set associative one with a single set.
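For a two-way set, true LRU needs only one bit per set; a minimal sketch (sizes and names hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS 16

/* One LRU bit per set: true means way 1 was used more recently. */
static bool way1_is_mru[NUM_SETS];

static void record_access(uint32_t set, int way) {
    way1_is_mru[set % NUM_SETS] = (way == 1);
}

static int choose_victim(uint32_t set) {
    /* Evict the way that was NOT most recently used. */
    return way1_is_mru[set % NUM_SETS] ? 0 : 1;
}

int main(void) {
    record_access(5, 1);
    return choose_victim(5);   /* returns 0: way 0 is the LRU victim */
}
```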

13 What Happens on Writes Where to write the data if the block is found in cache? Write through: new data is written to both the cache block and the lower-level memory; helps maintain cache consistency. Write back: new data is written only to the cache block; lower-level memory is updated when the block is replaced; a dirty bit is used to indicate whether a write-back is necessary; helps reduce memory traffic. What happens if the block is not found in cache? Write allocate: fetch the block into cache, then write the data (usually combined with write back). No-write allocate: do not fetch the block into cache (usually combined with write through).
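A toy sketch of the write-back + write-allocate combination, simplified to a single cache line (the one-line cache, sizes, and names are all illustrative):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 32

static uint8_t memory[1 << 16];   /* toy backing store */

static struct {
    bool     valid, dirty;
    uint32_t tag;                 /* block number: addr / BLOCK_SIZE */
    uint8_t  data[BLOCK_SIZE];
} line;                           /* a single cache line, for brevity */

static void write_byte(uint32_t addr, uint8_t value) {
    uint32_t tag = addr / BLOCK_SIZE, offset = addr % BLOCK_SIZE;
    if (!line.valid || line.tag != tag) {                /* write miss */
        if (line.valid && line.dirty)                    /* write back the dirty victim */
            memcpy(&memory[line.tag * BLOCK_SIZE], line.data, BLOCK_SIZE);
        memcpy(line.data, &memory[tag * BLOCK_SIZE], BLOCK_SIZE);  /* write allocate */
        line.valid = true;
        line.tag   = tag;
        line.dirty = false;
    }
    line.data[offset] = value;    /* update only the cache copy... */
    line.dirty = true;            /* ...memory is updated on eviction */
}

int main(void) {
    write_byte(0x1234, 0xAB);     /* miss: allocate the block, then write */
    write_byte(0x1235, 0xCD);     /* hit: cache copy only */
    return 0;
}
```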

14 Real Example: Alpha Caches 64KB 2-way set associative instruction cache (I-cache) and 64KB 2-way set associative data cache (D-cache). [Figure: I-cache and D-cache organization]

15 Alpha Data Cache D-cache: 64KB, 2-way set associative. Uses the 48-bit virtual address to index the cache and a tag from the physical address; 48-bit virtual address => 44-bit physical address. 512 sets (9-bit block index); cache block size 64 bytes (6-bit offset); tag has 44 - (9 + 6) = 29 bits. Write-back and write-allocate. (We will study virtual-physical address translation.)

16 Cache Performance Calculate average memory access time (AMAT): AMAT = hit time + miss rate x miss penalty. Example: hit time = 1 cycle, miss penalty = 100 cycles, miss rate = 4%; then AMAT = 1 + 100 x 4% = 5 cycles. Calculate cache impact on processor performance; note that cycles spent on cache hits are usually counted into execution cycles.
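The slide's example as a one-line computation:

```c
#include <stdio.h>

int main(void) {
    double hit_time     = 1.0;    /* cycles */
    double miss_rate    = 0.04;   /* 4% */
    double miss_penalty = 100.0;  /* cycles */
    /* AMAT = hit time + miss rate * miss penalty */
    printf("AMAT = %.1f cycles\n", hit_time + miss_rate * miss_penalty);
    return 0;                     /* prints: AMAT = 5.0 cycles */
}
```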

17 Disadvantage of Set Associative Cache Compare an n-way set associative cache with a direct mapped cache: it has n comparators vs. 1 comparator, and extra MUX delay for the data; data comes only after the hit/miss decision and set selection. In a direct mapped cache, the cache block is available before the hit/miss decision: use the data assuming the access is a hit, and recover if it turns out otherwise. [Figure: same two-way organization as on slide 10.]

18 Virtual Memory Virtual memory (VM) allows programs to have the illusion of a very large memory that is not limited by physical memory size. It makes main memory (DRAM) act as a cache for secondary storage (magnetic disk); otherwise, application programmers would have to move data in and out of main memory themselves, which is how virtual memory was first proposed. Virtual memory also provides the following functions: allowing multiple processes to share physical memory in a multiprogramming environment; providing protection for processes (compare Intel 8086: without VM, applications can overwrite the OS kernel); facilitating program relocation in physical memory space.

19 VM Example

20 Virtual Memory and Cache VM address translation provides a mapping from the virtual address of the processor to the physical address in main memory and secondary storage. Cache terms vs. VM terms: cache block => page; cache miss => page fault. Tasks of hardware and OS: the TLB does fast address translations; the OS handles less frequent events: page faults, and TLB misses (when a software approach is used).

21 Virtual Memory and Cache

22 4 Qs for Virtual Memory Q1: Where can a block be placed in the upper level? The miss penalty for virtual memory is very high => full associativity is desirable (so allow blocks to be placed anywhere in memory); have software determine the location while accessing disk (10M cycles is enough time to do sophisticated replacement). Q2: How is a block found if it is in the upper level? The address is divided into page number and page offset; a page table and translation buffer are used for address translation. Q: why does full associativity not affect hit time?

23 4 Qs for Virtual Memory Q3: Which block should be replaced on a miss? Want to reduce miss rate, and can handle replacement in software. Least Recently Used (LRU) is typically used. A typical approximation of LRU: hardware sets reference bits; the OS records the reference bits and clears them periodically; the OS selects a page among the least-recently referenced ones for replacement (see the sketch below). Q4: What happens on a write? Writing to disk is very expensive, so use a write-back strategy.
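A sketch of the reference-bit approximation above; the array sizes and function names are hypothetical. Hardware sets the bit on each access; the OS periodically snapshots and clears the bits, then evicts among pages whose bit stayed 0:

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_PAGES 1024

static bool ref_bit[NUM_PAGES];        /* set by hardware on each access */
static bool was_referenced[NUM_PAGES]; /* OS-recorded history */

/* Run periodically by the OS: record and clear the reference bits. */
static void sample_and_clear(void) {
    for (size_t i = 0; i < NUM_PAGES; i++) {
        was_referenced[i] = ref_bit[i];
        ref_bit[i] = false;
    }
}

/* Pick a victim among the least-recently referenced pages. */
static size_t pick_victim(void) {
    for (size_t i = 0; i < NUM_PAGES; i++)
        if (!was_referenced[i])
            return i;                  /* not referenced in the last interval */
    return 0;                          /* everything referenced: fall back */
}

int main(void) {
    ref_bit[0] = ref_bit[1] = true;    /* pretend pages 0 and 1 were touched */
    sample_and_clear();
    return (int)pick_victim();         /* returns 2 */
}
```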

24 Virtual-Physical Translation A virtual address consists of a virtual page number and a page offset. The virtual page number gets translated to a physical page number; the page offset is not changed. [Figure: virtual address = 36-bit virtual page number + 12-bit page offset; translation maps it to a physical address = 33-bit physical page number + the unchanged 12-bit page offset.]
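A sketch with the slide's field widths (36-bit virtual page number, 12-bit offset, 33-bit physical page number); the example address and PPN are made up, since the real PPN comes from the page table:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 12   /* 4 KB pages */

int main(void) {
    uint64_t va     = 0x123456789ABCULL;             /* 48-bit virtual address */
    uint64_t vpn    = va >> OFFSET_BITS;             /* 36-bit virtual page number */
    uint64_t offset = va & ((1ULL << OFFSET_BITS) - 1);
    uint64_t ppn    = 0x1ABCDEFULL;                  /* stand-in for a page table lookup */
    uint64_t pa     = (ppn << OFFSET_BITS) | offset; /* offset passes through unchanged */
    printf("vpn=0x%" PRIx64 " offset=0x%" PRIx64 " pa=0x%" PRIx64 "\n",
           vpn, offset, pa);
    return 0;
}
```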

25 Address Translation Via Page Table Assume the access hits in main memory

26 TLB: Improving Page Table Access Cannot afford to access the page table for every access, including cache hits (then the cache itself makes no sense). Again, use a cache to speed up accesses to the page table! (A cache for the cache?) The TLB (translation lookaside buffer) stores frequently accessed page table entries. A TLB entry is like a cache entry: the tag holds portions of the virtual address; the data portion holds the physical page number, protection field, valid bit, use bit, and dirty bit (as in a page table entry). Usually fully associative or highly set associative; usually 64 or 128 entries. Access the page table only on TLB misses.
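A sketch of a small fully associative TLB (64 entries, matching the slide); the loop stands in for the parallel tag match done in hardware, and all names are illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 64

struct tlb_entry {
    bool     valid;
    uint64_t vpn;   /* tag: virtual page number */
    uint64_t ppn;   /* data: physical page number (plus protection bits, etc.) */
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* Returns true on a TLB hit; on a miss the page table must be walked. */
static bool tlb_lookup(uint64_t vpn, uint64_t *ppn_out) {
    for (int i = 0; i < TLB_ENTRIES; i++) {   /* parallel compare in hardware */
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *ppn_out = tlb[i].ppn;
            return true;
        }
    }
    return false;   /* TLB miss: access the page table */
}

int main(void) {
    tlb[7] = (struct tlb_entry){ .valid = true, .vpn = 0x42, .ppn = 0x99 };
    uint64_t ppn;
    return tlb_lookup(0x42, &ppn) ? 0 : 1;    /* hit => exit code 0 */
}
```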

27 TLB Characteristics The following are typical characteristics of TLBs. TLB size: 32 to 4,096 entries. Block size: 1 or 2 page table entries (4 or 8 bytes each). Hit time: 0.5 to 1 clock cycle. Miss penalty: 10 to 30 clock cycles (go to the page table). Miss rate: 0.01% to 0.1%. Associativity: fully associative or set associative. Write policy: write back (entries are replaced infrequently).