Princess Sumaya Univ. Computer Engineering Dept. Chapter 5:

Presentation transcript:

Princess Sumaya University – Computer Arch. & Org (2) Computer Engineering Dept.

Slide 1 / 46: Memory Hierarchy
▪ Principle of Locality
  ● Temporal Locality (Locality in Time)
  ● Spatial Locality (Locality in Space)
▪ Speed & Size

  Technology      Access Time     Relative Cost/GB
  SRAM            0.5 ns          10,000
  DRAM            50 ns           100
  Magnetic Disk   5,000,000 ns    1

Slide 2 / 46: Memory Hierarchy
[Figure: CPU, Cache, Main Memory, Magnetic Disks]

Slide 3 / 46: Cache Memory
▪ High Speed (Towards CPU)
  ● Conceals Slow Memory
▪ Small Size (Low Cost)
[Figure: CPU, Cache (Fast, Δ Cache), Main Memory (Slow, Δ Mem); Hit and Miss paths; 95% hit ratio; Δ Access = Δ Cache + Δ Mem]

Slide 4 / 46: Cache Memory
▪ CPU – Main Memory Address
  ● Cache Size < Main Memory Size
[Figure: CPU, Cache (1 MB), Main Memory (4 GB); the CPU issues a 32-bit address, but the cache needs only 20 bits]

Slide 5 / 46: Cache Memory
[Figure: Cache (FFFFF), Main Memory (FFFFFFF); Address Mapping !!!]

Slide 6 / 46: Associative Memory
[Figure: Cache (FFFFF), Main Memory (FFFFFFF); each Cache Location holds an Address (Key) and its Data]

Slide 7 / 46: Associative Memory
[Figure: Cache with Key (Address) bits and 8 Bits (Data) per location]
▪ Can have any number of locations

Slide 8 / 46: Associative Memory
[Figure: Cache with Key (Address) bits and 8 Bits (Data) per location; the Address is compared (= ?) with each stored key]
▪ How many comparators?

Slide 9 / 46: Associative Memory
[Figure: Cache with Key (Address) bits and 8 Bits (Data) per location; the Address is compared (= ?) with each stored key; each location has a Valid Bit]

Slide 10 / 46: Associative Memory
[Figure: Cache with Key (Address) bits and 32 Bits (Data) per location; 32 Bits of Data are returned for an Address]

Slide 11 / 46: Direct Mapping Cache
[Figure: Address split into 12 Bits (Tag) and 20 Bits (Index); Cache locations up to FFFFF, each holding a Tag and 8 Bits (Data); the stored Tag is compared with the address Tag, giving Match or No match]
▪ What happens when Address =

Slide 12 / 46: Direct Mapping Cache
[Figure: Address split into 12 Bits (Tag), 18 Bits (Index), and Select bits; each Cache location holds a Tag and 32 Bits (Data); Compare gives Match or No match; Select chooses the Data]

Slide 13 / 46: Set Associative Cache
▪ 2-Way Set Associative
[Figure: Address split into 12 Bits (Tag) and 20 Bits (Index); each indexed set holds two Tag and Data (8 Bits) pairs; two comparators give Match or No match]

Slide 14 / 46: Cache Size
Example:
  Number of Blocks = 4 K
  Block Size = 4 Words
  Word Size = 32 bits
  Address Size = 32 bits
  Tag Bits (Direct Mapping Cache) =
  Tag Bits (2-Way Set Associative) =
  Tag Bits (4-Way Set Associative) =
  Tag Bits (Associative Cache) =
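The tag widths left blank in the cache size example can be worked out by subtracting the index and block-offset bits from the address width. The sketch below is one way to do that; it assumes byte addressing (the slide does not say so), and the function name `tag_bits` is made up for illustration.

```python
def tag_bits(num_blocks, words_per_block, bytes_per_word, address_bits, ways):
    """Tag width for a cache with `ways` blocks per set.

    Assumes byte addressing and power-of-two sizes (an assumption,
    not stated on the slide).
    """
    block_bytes = words_per_block * bytes_per_word
    offset_bits = block_bytes.bit_length() - 1   # byte offset inside a block
    num_sets = num_blocks // ways
    index_bits = num_sets.bit_length() - 1       # selects the set
    return address_bits - index_bits - offset_bits


NUM_BLOCKS = 4 * 1024  # 4 K blocks, from the example

for name, ways in [
    ("Direct mapping", 1),
    ("2-way set associative", 2),
    ("4-way set associative", 4),
    ("Fully associative", NUM_BLOCKS),
]:
    print(name, tag_bits(NUM_BLOCKS, 4, 4, 32, ways))
```

Under these assumptions the tag grows by one bit each time the associativity doubles, because the index loses one bit: 16, 17, and 18 bits, and 28 bits for the fully associative case, where there is no index at all.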

Slide 15 / 46: Block Size
▪ Increasing Block Size
  ● Utilizes Spatial Locality
  ● Reduces the Number of Blocks

Slide 16 / 46: Cache Performance
Example:
  CPU CPI = 2 clocks/instruction
  Load & Store Instructions = 36%
  Instruction Cache = 2% miss rate
  Data Cache = 4% miss rate
  Memory Miss Penalty = 100 clocks
  Instruction Penalties =
  Data Penalties =
  CPI (with penalties) =
  Perfect Cache Speedup =
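The blanks in the cache performance example follow from charging the miss penalty to the fraction of instructions that miss. A minimal sketch, assuming every instruction is fetched through the instruction cache and only loads and stores touch the data cache (variable names are illustrative):

```python
base_cpi = 2.0              # clocks per instruction with a perfect cache
load_store_fraction = 0.36  # share of instructions that access data
icache_miss_rate = 0.02
dcache_miss_rate = 0.04
miss_penalty = 100          # clocks

# Stall clocks per instruction caused by instruction fetch misses.
instruction_penalty = icache_miss_rate * miss_penalty

# Stall clocks per instruction caused by data access misses.
data_penalty = load_store_fraction * dcache_miss_rate * miss_penalty

cpi_with_penalties = base_cpi + instruction_penalty + data_penalty

# How much faster the same CPU would be with a cache that never misses.
perfect_cache_speedup = cpi_with_penalties / base_cpi

print(instruction_penalty, data_penalty, cpi_with_penalties, perfect_cache_speedup)
```

With these inputs the penalties come to about 2 and 1.44 clocks per instruction, giving a CPI of about 5.44 and a perfect-cache speedup of about 2.72.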

Slide 17 / 46: Cache Performance
▪ Average Memory Access Time (AMAT)
  ● AMAT = Time for a Hit + Miss Rate × Miss Penalty
Example:
  Clock Cycle = 1 ns
  Cache Access Time (Hit) = 1 ns
  Cache Miss Penalty = 20 Clocks
  Miss Rate = 5%
  AMAT =
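The AMAT formula can be evaluated directly for the example values. This is a small sketch; the helper name `amat_ns` and its parameter names are illustrative only.

```python
def amat_ns(hit_time_ns, miss_rate, miss_penalty_clocks, clock_ns):
    """Average memory access time in ns: hit time + miss rate x miss penalty."""
    miss_penalty_ns = miss_penalty_clocks * clock_ns
    return hit_time_ns + miss_rate * miss_penalty_ns


# Values from the example: 1 ns hit, 5% miss rate, 20-clock penalty, 1 ns clock.
example_amat = amat_ns(hit_time_ns=1.0, miss_rate=0.05,
                       miss_penalty_clocks=20, clock_ns=1.0)
print(example_amat)
```

For the example this works out to about 2 ns, that is, two clock cycles on average per access.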

Slide 18 / 46: Cache Misses
Example:
  Block Address Sequence = 0, 8, 0, 6, 8
  Cache Size = 4 blocks (Direct Mapping)
[Figure: table of CPU Reference, Hit/Miss outcome, and Tag stored in cache blocks 0 to 3]

Slide 19 / 46: Cache Misses
Example:
  Block Address Sequence = 0, 8, 0, 6, 8
  Cache Size = 2 blocks (2-Way Set Associative)
[Figure: table of CPU Reference, Hit/Miss outcome, and Tag stored in the cache]

Slide 20 / 46: Cache Misses
Example:
  Block Address Sequence = 0, 8, 0, 6, 8
  Cache Size = 4 blocks (Associative)
[Figure: table of CPU Reference, Hit/Miss outcome, and cache contents]
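The three cache-miss examples can be replayed with a small simulator. The sketch below assumes LRU replacement within a set and that a block maps to set `block mod number_of_sets`; neither is stated on these slides, and the function name `simulate` is hypothetical.

```python
from collections import OrderedDict


def simulate(block_addresses, num_blocks, ways):
    """Return a list of 'hit'/'miss' outcomes for a sequence of block addresses.

    The cache has `num_blocks` blocks grouped into sets of `ways` blocks.
    ways == 1 is direct mapping; ways == num_blocks is fully associative.
    Replacement inside a set is LRU (an assumption).
    """
    num_sets = num_blocks // ways
    # Each set keeps its blocks ordered from least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    outcomes = []
    for block in block_addresses:
        cache_set = sets[block % num_sets]
        if block in cache_set:
            cache_set.move_to_end(block)       # mark as most recently used
            outcomes.append("hit")
        else:
            if len(cache_set) == ways:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[block] = True
            outcomes.append("miss")
    return outcomes


sequence = [0, 8, 0, 6, 8]
print(simulate(sequence, num_blocks=4, ways=1))  # direct mapping
print(simulate(sequence, num_blocks=2, ways=2))  # 2-way set associative
print(simulate(sequence, num_blocks=4, ways=4))  # fully associative
```

Under these assumptions the direct-mapped cache misses on all five references (blocks 0 and 8 keep evicting each other), the 2-way cache hits once, and the fully associative cache hits twice.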

Slide 21 / 46: Instruction Cache
▪ Cache Miss
  ● Send original PC value to memory
  ● Perform a read operation
  ● Wait for the cache to receive the instruction
  ● Restart instruction execution

Slide 22 / 46: Data Cache Writes
▪ Write-Through
  ● Consistent Copies
  ● Slow
[Figure: CPU, Cache, Mem]

Slide 23 / 46: Data Cache Writes
▪ Write-Through
  ● Consistent Copies
  ● Slow
Example:
  CPI without miss = 1
  Memory delays = 100 clocks
  10% of memory references are writes
  Overall CPI =
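The write-through example shows why writing every store straight to memory is slow. The sketch below is only one reading of the slide: it assumes the 10% figure can be treated as the fraction of instructions that perform a write, and that each write stalls for the full 100-clock memory delay with no write buffer.

```python
base_cpi = 1.0         # CPI without misses
memory_delay = 100     # clocks per write to main memory
write_fraction = 0.10  # assumed: writes per instruction (see lead-in)

# Every write pays the full memory delay under plain write-through.
overall_cpi = base_cpi + write_fraction * memory_delay
print(overall_cpi)
```

Under these assumptions the CPI rises from 1 to about 11, which is the motivation for adding a write buffer.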

Slide 24 / 46: Data Cache Writes
▪ Write-Through with Write Buffer
  ● Buffer size
  ● Fill-Rate and Mem-Rate (Possible Stall)
[Figure: CPU, Cache, Mem]

Slide 25 / 46: Data Cache Writes
▪ Write-Back
  ● Fast
  ● Complex & inconsistent copies
[Figure: CPU, Cache, Mem; Block Replacement]

Slide 26 / 46: Data Cache Writes
▪ Write-Back
  ● Fast
  ● Complex & inconsistent copies
[Figure: CPU, Cache, Mem; Miss; Block Replacement]

Slide 27 / 46: Data Cache Writes
▪ Write-Back with Buffer
  ● Reduces the “Miss” penalty by 50%
[Figure: CPU, Cache, Mem; Miss; Block Replacement]

Slide 28 / 46: Cache Replacement Policies
▪ First In First Out (FIFO)
  ● Simple
  ● May replace a block which is used more, leading to a miss
▪ Least Recently Used (LRU)
  ● More complex
  ● Better Hit Rate
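The difference between the two replacement policies shows up on a reference pattern that keeps returning to one block. The following sketch models a single fully associative set; the function name `count_hits` and the reference string are made up for illustration.

```python
from collections import OrderedDict


def count_hits(references, capacity, policy):
    """Count hits for a fully associative cache of `capacity` blocks.

    policy is "FIFO" (evict the block that entered first) or
    "LRU" (evict the block that was used least recently).
    """
    cache = OrderedDict()  # front of the dict is the next eviction victim
    hits = 0
    for block in references:
        if block in cache:
            hits += 1
            if policy == "LRU":
                cache.move_to_end(block)  # a hit refreshes the block under LRU only
        else:
            if len(cache) == capacity:
                cache.popitem(last=False)
            cache[block] = True
    return hits


references = ["A", "B", "A", "C", "A"]  # hypothetical pattern: A is used often
print(count_hits(references, capacity=2, policy="FIFO"))
print(count_hits(references, capacity=2, policy="LRU"))
```

On this pattern FIFO evicts A when C arrives, because A entered first even though it was just used, so the final reference to A misses. LRU evicts B instead and gets one more hit.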

Slide 29 / 46: Multilevel Cache
Example:
  CPU CPI =
  Clock Rate = 4 GHz → 0.25 ns Clock
  Primary Cache Miss Rate = 2%
  Memory Access Time = 100 ns → 400 Clocks
  CPI (Single Level Cache) =
  Total Miss Rate = 0.5%
  Secondary Cache Access Time = 5 ns → 20 Clocks
  CPI (2-Level Cache) =
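The multilevel cache example can be worked through once a base CPI is fixed. The transcript does not preserve that value, so the sketch below assumes a base CPI of 1.0; treat that number, and the variable names, as assumptions. It also assumes that misses in the primary cache pay the secondary cache access time, and that the 0.5% of instructions that miss in both levels additionally pay the main memory access time.

```python
base_cpi = 1.0          # ASSUMED: not recoverable from the slide text
clock_ns = 0.25         # 4 GHz clock
primary_miss_rate = 0.02
total_miss_rate = 0.005                 # misses that go all the way to memory
memory_penalty = 100 / clock_ns         # 100 ns -> 400 clocks
secondary_penalty = 5 / clock_ns        # 5 ns -> 20 clocks

# Single level: every primary miss goes to main memory.
cpi_single_level = base_cpi + primary_miss_rate * memory_penalty

# Two levels: primary misses go to the secondary cache,
# and only the remaining misses go on to main memory.
cpi_two_level = (base_cpi
                 + primary_miss_rate * secondary_penalty
                 + total_miss_rate * memory_penalty)

print(cpi_single_level, cpi_two_level)
```

With the assumed base CPI, the single-level design comes to a CPI of about 9 and the two-level design to about 3.4, so the secondary cache makes the processor roughly 2.6 times faster.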

Slide 30 / 46: Main Memory
▪ Latency & Bandwidth
  ● Address (Selection of row & column)
  ● Data Transfer (Number of bits)
Example:
  Send Address = 1 clock
  Memory Access = 15 clocks
  Transfer a 32-bit Word = 1 clock
  Cache Block = 4 words
  Cache Miss =
  Memory Bandwidth =

Slide 31 / 46: Main Memory
Example:
▪ Simple Design
▪ Wide Bus
▪ Interleaved
[Figure: three organizations. Simple Design: CPU, Cache, Mem connected by a 1-word bus. Wide Bus: CPU, Cache, Mem with a 1-word path to the CPU and a 2-word path to memory. Interleaved: CPU, Cache, and several Mem banks sharing a 1-word bus]
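The main memory example and the three organizations can be compared by counting clocks per cache miss. The sketch below makes several assumptions that the slides do not spell out: memory accesses are not overlapped in the simple and wide designs, the wide design fetches 2 words per access, and the interleaved design uses 4 banks that are accessed in parallel and then send one word per clock. Function names are illustrative.

```python
SEND_ADDRESS = 1    # clocks
MEMORY_ACCESS = 15  # clocks per access
TRANSFER_WORD = 1   # clocks per bus transfer
BLOCK_WORDS = 4
BYTES_PER_WORD = 4


def miss_penalty_wide(bus_words):
    """Clocks to fetch one block over a bus that is `bus_words` words wide.

    bus_words == 1 is the simple one-word design.
    """
    accesses = BLOCK_WORDS // bus_words
    return SEND_ADDRESS + accesses * MEMORY_ACCESS + accesses * TRANSFER_WORD


def miss_penalty_interleaved(banks):
    """Clocks to fetch one block when `banks` banks are read in parallel."""
    rounds = BLOCK_WORDS // banks  # parallel access rounds needed
    return SEND_ADDRESS + rounds * MEMORY_ACCESS + BLOCK_WORDS * TRANSFER_WORD


def bandwidth_bytes_per_clock(miss_penalty):
    return BLOCK_WORDS * BYTES_PER_WORD / miss_penalty


for name, penalty in [
    ("Simple design", miss_penalty_wide(1)),
    ("Wide bus (2 words)", miss_penalty_wide(2)),
    ("Interleaved (4 banks)", miss_penalty_interleaved(4)),
]:
    print(name, penalty, round(bandwidth_bytes_per_clock(penalty), 2))
```

Under these assumptions the simple design needs 65 clocks per miss (about 0.25 bytes per clock), the 2-word bus needs 33 clocks, and 4-way interleaving needs 20 clocks while keeping the narrow bus.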

Slide 32 / 46: DRAM Technology

  Year   Chip Size   $ per GB    Total Access Time   Column Access Time
  …      … Kbit      $1,500,…    … ns                150 ns
  …      … Kbit      $500,…      … ns                100 ns
  …      … Mbit      $200,…      … ns                40 ns
  …      … Mbit      $50,…       … ns                40 ns
  …      … Mbit      $15,000     90 ns               30 ns
  …      … Mbit      $10,000     60 ns               12 ns
  …      … Mbit      $4,000      60 ns               10 ns
  …      … Mbit      $1,000      55 ns               7 ns
  …      … Mbit      $…          … ns                5 ns
  …      … Gbit      $50         40 ns               1.25 ns

Slide 33 / 46: Virtual Memory
▪ Allow Efficient & Safe Sharing of Memory
  ● Memory Protection
  ● Program Relocatability
▪ Remove Programming Burdens of Small Memory
  ● Much Larger Memory Space
  ● Reuse Physical Memory

Slide 34 / 46: Virtual Memory Segmentation
▪ Segments
  ● Variable Size
  ● Two-Part Address
[Figure: Segments mapped onto Frames; a two-part address (Segment Number, Offset) passes through Translation]

Slide 35 / 46: Virtual Memory Paging
▪ Virtual Memory
  ● Pages
  ● Stored on Disk
  ● Virtual Address
▪ Physical Memory
  ● Frames
  ● Stored in RAM
  ● Physical Address
▪ Page Faults
[Figure: Pages mapped onto Frames]

Slide 36 / 46: Virtual Memory Paging
▪ Address Translation
[Figure: Virtual Address (Virtual Page Number, Page Offset) passes through Translation to give the Physical Address (Physical Page #, Page Offset)]

Slide 37 / 46: Paging Table
▪ Page Table
  ● Virtual to Physical Page Number Translation
  ● Stored in RAM
  ● Page Table Register
[Figure: Page Table Register pointing to a table with Valid, Virtual Page, and Physical Page columns]
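Address translation through a page table can be sketched in a few lines. The example assumes 4 KB pages (a 12-bit page offset) and models the page table as a dictionary from virtual page number to a (valid, physical page) pair; the names `translate`, `PageFault`, and the table contents are hypothetical.

```python
PAGE_OFFSET_BITS = 12               # assumed 4 KB pages
PAGE_SIZE = 1 << PAGE_OFFSET_BITS


class PageFault(Exception):
    """Raised when the referenced page is not in physical memory."""


def translate(virtual_address, page_table):
    """Translate a virtual address using {virtual page: (valid, physical page)}."""
    virtual_page = virtual_address >> PAGE_OFFSET_BITS
    page_offset = virtual_address & (PAGE_SIZE - 1)  # copied unchanged
    valid, physical_page = page_table.get(virtual_page, (False, None))
    if not valid:
        raise PageFault(virtual_page)
    return (physical_page << PAGE_OFFSET_BITS) | page_offset


# Hypothetical table: pages 0 and 2 are resident, page 1 is on disk.
page_table = {0: (True, 5), 1: (False, None), 2: (True, 1)}

print(hex(translate(0x2ABC, page_table)))  # virtual page 2 -> physical page 1
```

Only the page number is translated. The page offset passes through unchanged, which is why the page size must be a power of two.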

Slide 38 / 46: Paging Table
▪ Page Faults
  ● Swap Space
    ♦ Reserved space for full virtual memory space for a process
    ♦ Stored on Disk
  ● Page Table
  ● LRU Replacement Scheme
[Figure: Page Table indexed by Virtual Page Number, with Valid and Pointer columns; valid entries (1) point into Physical Memory, invalid entries (0) point into Disk Storage]

Slide 39 / 46: Paging Table
▪ Page Table Size
Example:
  Virtual Address: 32 bits
  Page Size: 4 KB
  Page Table: 4 Bytes/Entry
  Number of Pages =
  Page Table Size =
[Figure: Page Table indexed by Virtual Page Number, with Valid and Pointer columns; valid entries (1) point into Physical Memory, invalid entries (0) point into Disk Storage]
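The page table size example reduces to two divisions and a multiplication. This sketch assumes a single-level page table with one entry for every virtual page, which the slide implies but does not state.

```python
virtual_address_bits = 32
page_size_bytes = 4 * 1024  # 4 KB
bytes_per_entry = 4

# One entry per virtual page (assumes a flat, single-level table).
number_of_pages = 2 ** virtual_address_bits // page_size_bytes
page_table_bytes = number_of_pages * bytes_per_entry

print(number_of_pages)                  # 2**20 pages
print(page_table_bytes // (1024 * 1024), "MB")
```

Under that assumption there are 2^20 (about one million) pages and the table occupies 4 MB per process, which is the usual argument for hierarchical or otherwise compacted page tables.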

Slide 40 / 46: Translation-Lookaside Buffer (TLB)
▪ Address Translation Cache
[Figure: TLB with Valid, Dirty, Ref, Tag, and Physical Page columns; Page Table indexed by Virtual Page Number with Valid, Dirty, Ref, and Physical Page columns; entries point into Physical Memory or Disk Storage]

Slide 41 / 46: Virtual Memory Misses
▪ TLB Miss
▪ Page Fault
▪ Cache Miss
[Figure: flowchart with Virtual Address, TLB (Hit / Miss), Page Table, Page Fault, Update TLB, and Cache (Hit / Miss)]
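The order in which the TLB and the page table are consulted can be sketched as follows. This is a simplified model: the TLB is an unbounded dictionary with no replacement, a missing page table entry stands for an invalid one, and the cache access that follows a successful translation is not modeled. The function name `lookup` and the table contents are hypothetical.

```python
def lookup(virtual_page, tlb, page_table):
    """Return (physical_page, event) for one translation attempt.

    event is "TLB hit", "TLB miss" (resolved from the page table,
    and the TLB is updated), or "page fault" (page not in memory).
    """
    if virtual_page in tlb:
        return tlb[virtual_page], "TLB hit"

    physical_page = page_table.get(virtual_page)  # None models an invalid entry
    if physical_page is None:
        return None, "page fault"

    tlb[virtual_page] = physical_page             # update the TLB for next time
    return physical_page, "TLB miss"


tlb = {}
page_table = {7: 3}  # hypothetical: virtual page 7 lives in physical page 3

print(lookup(7, tlb, page_table))  # first access goes to the page table
print(lookup(7, tlb, page_table))  # second access is served by the TLB
print(lookup(9, tlb, page_table))  # page 9 is not resident
```

A TLB miss is therefore much cheaper than a page fault: the former costs an extra memory access to the page table, the latter a disk access handled by the operating system.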

Slide 42 / 46: Memory Hierarchy Misses
▪ Compulsory Miss
▪ Capacity Miss
▪ Conflict Miss

  Design                   Miss Rate                     Performance
  Increase Cache Size      Decrease Capacity Misses      May Increase Access Time
  Increase Associativity   Decrease Conflict Misses      May Increase Access Time
  Increase Block Size      Decrease Compulsory Misses    Increases Miss Penalty

Slide 43 / 46: Parallelism & Cache Coherence
▪ Coherence
  ● What values can be returned by a read
▪ Consistency
  ● When a written value will be returned by a read
[Figure: two Processors, each with its own Cache, sharing one Main Memory]

Slide 44 / 46: Cache Coherence Enforcement
▪ Migration (of Data to Local Caches)
  ● Reduces access latency and the bandwidth demand on shared memory
▪ Replication (of Read-shared Data)
  ● Reduces access latency and contention for read-shared data

Slide 45 / 46: Cache Coherence Protocol
▪ Snooping
  ● Each cache monitors bus reads/writes.
  ● Processors exchange full blocks.
  ● Large block sizes may lead to false sharing.
[Figure: two Processors, each with its own Cache, sharing one Main Memory; a write by one processor sends Invalidate to the other cache]

Slide 46 / 46: Cache Coherence Protocol
▪ Directory-based protocols
  ● Caches and memory record sharing status of blocks in a directory.
