The Memory Hierarchy: Cache, Main Memory, and Virtual Memory
Lecture for CPSC 5155
Edward Bosworth, Ph.D.
Computer Science Department, Columbus State University

The Simple View of Memory
The simplest view of memory is the one presented at the ISA (Instruction Set Architecture) level. At this level, memory is a monolithic addressable unit. This view suffices for all programming uses.

The Multi-Level View of Memory
Real memory has at least three levels, each of which can be elaborated further. The fact that most cache memories are now multi-level does not change the basic design issues.

The More Realistic View

Generic Primary / Secondary Memory
In each case, we have a fast primary memory backed by a bigger secondary memory. The "actors" in the two cases are as follows:

Technology     | Primary            | Secondary            | Block
Cache Memory   | SRAM Cache         | DRAM Main Memory     | Cache Line
Virtual Memory | DRAM Main Memory   | Disk Memory          | Page
Access Time    | T_P (Primary Time) | T_S (Secondary Time) |

Effective Access Time
Effective Access Time: T_E = h·T_P + (1 – h)·T_S, where h (the primary hit rate) is the fraction of memory accesses satisfied by the primary memory; 0.0 ≤ h ≤ 1.0. This can be extended to multi-level caches and to mixed memory with both cache and virtual memory.

Examples: Cache Memory
Suppose a single cache fronting a main memory, which has an 80 nanosecond access time. Suppose the cache memory has an access time of 10 nanoseconds.
If the hit rate is 90%, then T_E = 0.9·10.0 + (1 – 0.9)·80.0 = 9.0 + 0.1·80.0 = 9.0 + 8.0 = 17.0 ns.
If the hit rate is 99%, then T_E = 0.99·10.0 + (1 – 0.99)·80.0 = 9.9 + 0.01·80.0 = 9.9 + 0.8 = 10.7 ns.
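
These two cases are easy to check in code. Below is a minimal C sketch (the function and variable names are my own, not from the lecture) that evaluates the effective access time formula for both hit rates:

#include <stdio.h>

/* T_E = h * T_P + (1 - h) * T_S; a sketch of the slide's formula. */
static double effective_access_time(double h, double t_primary, double t_secondary)
{
    return h * t_primary + (1.0 - h) * t_secondary;
}

int main(void)
{
    /* The two cases above: 10 ns cache, 80 ns main memory. */
    printf("h = 0.90: T_E = %.1f ns\n", effective_access_time(0.90, 10.0, 80.0));
    printf("h = 0.99: T_E = %.1f ns\n", effective_access_time(0.99, 10.0, 80.0));
    return 0;   /* prints 17.0 ns and 10.7 ns */
}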

Memory Technology (§5.1 Introduction)
Static RAM (SRAM): 0.5 ns – 2.5 ns, $2000 – $5000 per GB
Dynamic RAM (DRAM): 50 ns – 70 ns, $20 – $75 per GB
Magnetic disk: 5 ms – 20 ms, $0.20 – $2 per GB
Ideal memory: the access time of SRAM with the capacity and cost/GB of disk

Principle of Locality
Programs access a small proportion of their address space at any time.
Temporal locality: items accessed recently are likely to be accessed again soon, e.g., instructions in a loop, induction variables.
Spatial locality: items near those accessed recently are likely to be accessed soon, e.g., sequential instruction access, array data.
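
As a concrete illustration of spatial locality, consider the C sketch below (my own example, not from the slides). Both loops do the same work, but the row-major traversal walks consecutive addresses and uses every fetched cache line fully, while the column-major traversal strides across cache lines and typically runs noticeably slower on real hardware:

#include <stdio.h>

#define N 1024
static int a[N][N];   /* C stores this row-major: a[i][0..N-1] are adjacent */

int main(void)
{
    long sum = 0;
    /* Good spatial locality: the inner loop touches consecutive bytes. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    /* Poor spatial locality: the inner loop strides N * sizeof(int) bytes,
       touching a different cache line on nearly every access. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    printf("%ld\n", sum);   /* the array is zero-initialized; value is 0 */
    return 0;
}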

Taking Advantage of Locality
Memory hierarchy: store everything on disk. Copy recently accessed (and nearby) items from disk to a smaller DRAM memory, the main memory. Copy more recently accessed (and nearby) items from DRAM to a smaller SRAM memory, the cache memory attached to the CPU.

Memory Hierarchy Levels
Block (aka line): the unit of copying; may be multiple words.
If accessed data is present in the upper level, that is a hit: the access is satisfied by the upper level. Hit ratio = hits/accesses.
If accessed data is absent, that is a miss: the block is copied from the lower level, and the time taken is the miss penalty. Miss ratio = misses/accesses = 1 – hit ratio. The accessed data is then supplied from the upper level.

Cache Memory (§5.2 The Basics of Caches)
Cache memory is the level of the memory hierarchy closest to the CPU. Given accesses X_1, …, X_{n–1}, X_n: how do we know if the data is present? Where do we look?

Direct Mapped Cache
Location is determined by address. Direct mapped: there is only one choice, namely
(Block address) modulo (#Blocks in cache).
Since #Blocks is a power of 2, this simply uses the low-order address bits.
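
In code, this mapping is a one-line computation; because #Blocks is a power of 2, the modulo reduces to masking off the low-order bits. A brief sketch (names are mine):

#include <stdio.h>

/* num_blocks must be a power of 2 for the mask to equal the modulo. */
static unsigned cache_index(unsigned block_addr, unsigned num_blocks)
{
    return block_addr & (num_blocks - 1);   /* same as block_addr % num_blocks */
}

int main(void)
{
    printf("%u\n", cache_index(22, 8));   /* prints 6, i.e., binary 110 */
    return 0;
}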

Tags and Valid Bits
How do we know which particular block is stored in a cache location? Store the block address as well as the data. Actually, only the high-order bits are needed; they are called the tag.
What if there is no data in a location? A valid bit records this: 1 = present, 0 = not present. Initially 0.

The Dirty Bit In some contexts, it is important to mark the primary memory if the data have been changed since being copied from the secondary memory. For historical reasons, this bit is called the “dirty bit”, denoted D. If D = 0, the block does not need to be written back to secondary memory prior to being replaced. This is an efficiency consideration.
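
Pulling the last few slides together, each cache line carries a valid bit, a dirty bit, and a tag alongside its data. One possible C layout (the field names are illustrative, not from the lecture):

#include <stdint.h>

#define LINE_BYTES 16

struct cache_line {
    uint8_t  valid;             /* V: 1 = line holds a real block; initially 0 */
    uint8_t  dirty;             /* D: 1 = block written since it was copied in,
                                   so it must be written back before replacement */
    uint32_t tag;               /* high-order address bits identifying the block */
    uint8_t  data[LINE_BYTES];  /* the cached block itself */
};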

The Cache Line Tag
The primary and secondary memories are divided into equally sized blocks. Suppose a cache line size of M = 2^m bytes; this is also the size of a primary memory block. In an n-bit address, the lower m bits give the offset within the block, and the upper (n – m) bits identify the block. The upper (n – m) bits are the block address.

The Direct Mapped Cache
Suppose that the direct mapped cache has K = 2^k cache lines. The full memory address can be divided as follows:

Bits         | n – k – m                  | k    | m
Cache View   | Tag                        | Line | Offset
Address View | Block Address (Tag + Line)        | Offset

The lower k bits of the block address always determine the cache line. For this reason, these bits are not part of the tag.

Simple Example from the Text
In this next example, each cache line holds only one entry. There are 8 = 2^3 cache lines. Consider 5-bit addresses: n = 5, k = 3, and m = 0 (each line holds 1 = 2^0 entry). The tag thus has 2 bits.
NOTE: The number of entries in a cache line and the number of lines in the cache must both be powers of 2. Otherwise, the addressing is impossible.

Cache Example
8 blocks, 1 word/block, direct mapped. Initial state:

Index | V | Tag | Data
000   | N |     |
001   | N |     |
010   | N |     |
011   | N |     |
100   | N |     |
101   | N |     |
110   | N |     |
111   | N |     |

Cache Example (continued)

Word addr | Binary addr | Hit/miss | Cache block
22        | 10110       | Miss     | 110

Index | V | Tag | Data
000   | N |     |
001   | N |     |
010   | N |     |
011   | N |     |
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem[10110]
111   | N |     |

Cache Example (continued)

Word addr | Binary addr | Hit/miss | Cache block
26        | 11010       | Miss     | 010

Index | V | Tag | Data
000   | N |     |
001   | N |     |
010   | Y | 11  | Mem[11010]
011   | N |     |
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem[10110]
111   | N |     |

Cache Example (continued)

Word addr | Binary addr | Hit/miss | Cache block
22        | 10110       | Hit      | 110
26        | 11010       | Hit      | 010

Index | V | Tag | Data
000   | N |     |
001   | N |     |
010   | Y | 11  | Mem[11010]
011   | N |     |
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem[10110]
111   | N |     |

Cache Example (continued)

Word addr | Binary addr | Hit/miss | Cache block
16        | 10000       | Miss     | 000
3         | 00011       | Miss     | 011
16        | 10000       | Hit      | 000

Index | V | Tag | Data
000   | Y | 10  | Mem[10000]
001   | N |     |
010   | Y | 11  | Mem[11010]
011   | Y | 00  | Mem[00011]
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem[10110]
111   | N |     |

Cache Example (continued)

Word addr | Binary addr | Hit/miss | Cache block
18        | 10010       | Miss     | 010

Index | V | Tag | Data
000   | Y | 10  | Mem[10000]
001   | N |     |
010   | Y | 10  | Mem[10010]   (Mem[11010] has been replaced)
011   | Y | 00  | Mem[00011]
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem[10110]
111   | N |     |
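
The behavior traced in these slides can be reproduced with a short simulation. Below is a minimal C sketch (my own code, not the text's) of this 8-line, one-word-per-line direct-mapped cache, replaying the access sequence 22, 26, 22, 26, 16, 3, 16, 18 used above:

#include <stdio.h>

#define LINES 8   /* 8 one-word lines: index = addr % 8, tag = addr / 8 */

int main(void)
{
    int valid[LINES] = {0}, tag[LINES] = {0};
    int trace[] = {22, 26, 22, 26, 16, 3, 16, 18};
    int n = sizeof trace / sizeof trace[0];

    for (int i = 0; i < n; i++) {
        int addr  = trace[i];
        int index = addr % LINES;   /* low 3 bits of the 5-bit address */
        int t     = addr / LINES;   /* high 2 bits form the tag        */
        if (valid[index] && tag[index] == t) {
            printf("addr %2d: hit  (line %d)\n", addr, index);
        } else {
            printf("addr %2d: miss (line %d)\n", addr, index);
            valid[index] = 1;       /* fetch the block and record its tag */
            tag[index]   = t;
        }
    }
    return 0;
}

Running it shows the same pattern as the tables: misses on 22, 26, 16, 3, and 18; hits on the repeated 22, 26, and 16; and address 18 evicting Mem[11010] from line 010.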

Another Direct-Mapped Cache
Assume a byte-addressable memory with 32-bit addresses, and assume 256 cache lines. As 256 = 2^8, 8 bits of the address are used to select the cache line. Assume each cache line holds 16 bytes. As 16 = 2^4, 4 bits of the address specify the offset within the cache line. The main memory is divided into blocks of size 16 bytes, each the size of a cache line.

Fields in the Memory Address
Divide the 32-bit address into three fields: a 20-bit explicit tag, an 8-bit line number, and a 4-bit offset within the cache line.

Bits         | 31 – 12 | 11 – 4 | 3 – 0
Cache View   | Tag     | Line   | Offset
Address View | Block Address    | Offset

Consider the address 0x00AB7129. It would have:
Tag = 0x00AB7
Line = 0x12
Offset = 0x9
Block Address = 0x00AB712
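
These fields can be extracted with shifts and masks. A small C sketch (my own, assuming the 20/8/4 bit split above) that decodes the sample address:

#include <stdio.h>

int main(void)
{
    unsigned addr   = 0x00AB7129u;
    unsigned offset = addr & 0xF;          /* bits 3 - 0   */
    unsigned line   = (addr >> 4) & 0xFF;  /* bits 11 - 4  */
    unsigned tag    = addr >> 12;          /* bits 31 - 12 */
    unsigned block  = addr >> 4;           /* bits 31 - 4: the block address */
    printf("tag = 0x%05X, line = 0x%02X, offset = 0x%X, block = 0x%07X\n",
           tag, line, offset, block);      /* 0x00AB7, 0x12, 0x9, 0x00AB712 */
    return 0;
}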

Associative Caches
In a direct-mapped cache, each memory block from the main memory can be mapped into exactly one location in the cache. Other cache organizations allow some flexibility in memory block placement. One option for flexible placement is called an associative cache, based on content addressable memory.

Associative Memory
In associative memory, the contents of the memory are searched in one memory cycle. Consider an array of 256 entries, indexed from 0 to 255 (0x0 to 0xFF). Standard search strategies require either about 128 probes on average (unordered linear search) or 8 probes (binary search). In content addressable memory, only one search is required.

Associative Search
Associative memory would find the item in one search. Think of the control circuitry as "broadcasting" the data value to all memory cells at the same time. If one of the memory cells has the value, it raises a Boolean flag and the item is found. Some associative memories allow duplicate values and resolve multiple matches; cache designs do not allow duplicate values.

The Associative Match
This figure shows a single word in a 4-bit content addressable memory and the circuit that generates the match signal (asserted low).

The Associative Cache
Again, assume a 32-bit address with 16-byte cache lines (4 bits for the offset within the cache line). The number of cache lines is not important for address handling in associative caches. The address divides as follows: 28 bits for the cache tag and 4 bits for the offset in the cache line. The cache tags are stored in associative memory connected to the cache.

The Associative Cache Line
A cache line in this arrangement would have the following format for our sample address. Here we assume that the CPU has not written to the cache line, so the dirty bit is D = 0.

D bit | V bit | Tag       | 16 indexed entries
0     | 1     | 0x00AB712 | M[0xAB7120] … M[0xAB712F]

Set Associative Cache
An N-way set-associative cache uses direct mapping, but allows a set of N memory blocks to be stored in each line. This allows some of the flexibility of a fully associative cache without the complexity of a large associative memory for searching the cache. Suppose a 4-way set-associative cache with 16 bytes per memory block. Each cache line then has 4 sets of 16 bytes each, or 64 bytes in all.

Sample 2-Way Set-Associative Cache
Consider addresses 0xCD4128 and 0xAB7129. Each would be stored in cache line 0x12. Set 0 of this cache line would hold one block, and set 1 would hold the other:

Set 0: D = 1, V = 1, Tag = 0xCD4, Contents = M[0xCD4120] to M[0xCD412F]
Set 1: D = 0, V = 1, Tag = 0xAB7, Contents = M[0xAB7120] to M[0xAB712F]

Associative Caches
Fully associative: allow a given block to go in any cache entry. This requires all entries to be searched at once, with a comparator per entry (expensive).
n-way set associative: each set contains n entries, and the block number determines the set, via (Block number) modulo (#Sets in cache). Search all entries in a given set at once, with n comparators (less expensive).

Spectrum of Associativity
For a cache with 8 entries.

Associativity Example
Compare 4-block caches: direct mapped, 2-way set associative, and fully associative, on the block access sequence 0, 8, 0, 6, 8.

Direct mapped (cache index = block address modulo 4):

Block addr | Cache index | Hit/miss | Cache content after access
0          | 0           | miss     | Mem[0]
8          | 0           | miss     | Mem[8]
0          | 0           | miss     | Mem[0]
6          | 2           | miss     | Mem[0], Mem[6]
8          | 0           | miss     | Mem[8], Mem[6]

Five misses in all.

Associativity Example (continued)
2-way set associative (all five blocks map to set 0; set 1 stays empty):

Block addr | Cache index | Hit/miss | Cache content after access
0          | 0           | miss     | Mem[0]
8          | 0           | miss     | Mem[0], Mem[8]
0          | 0           | hit      | Mem[0], Mem[8]
6          | 0           | miss     | Mem[0], Mem[6]
8          | 0           | miss     | Mem[8], Mem[6]

Four misses and one hit.

Fully associative:

Block addr | Hit/miss | Cache content after access
0          | miss     | Mem[0]
8          | miss     | Mem[0], Mem[8]
0          | hit      | Mem[0], Mem[8]
6          | miss     | Mem[0], Mem[8], Mem[6]
8          | hit      | Mem[0], Mem[8], Mem[6]

Three misses and two hits.
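
All three organizations can be checked with one parameterized simulator. The C sketch below (my own code, using LRU replacement within a set, as the slides assume) reproduces the miss counts above: 5 misses direct mapped, 4 for 2-way set associative, and 3 fully associative.

#include <stdio.h>

#define BLOCKS 4   /* total cache size: 4 blocks in every organization */

/* A cache of BLOCKS blocks arranged as (BLOCKS / ways) sets of `ways`
   entries each. For simplicity the full block address is stored as the
   tag; the least recently used entry in a set is the victim on a miss. */
static int count_misses(int ways, const int *trace, int n)
{
    int sets = BLOCKS / ways;
    int tag[BLOCKS] = {0}, age[BLOCKS] = {0}, valid[BLOCKS] = {0};
    int misses = 0;

    for (int t = 0; t < n; t++) {
        int base = (trace[t] % sets) * ways;   /* first way of the chosen set */
        int hit = 0;
        for (int w = 0; w < ways; w++)
            if (valid[base + w] && tag[base + w] == trace[t]) {
                hit = 1;
                age[base + w] = t;             /* refresh LRU timestamp */
                break;
            }
        if (!hit) {
            int victim = base;                 /* prefer an empty way, else LRU */
            for (int w = 0; w < ways; w++) {
                if (!valid[base + w]) { victim = base + w; break; }
                if (age[base + w] < age[victim]) victim = base + w;
            }
            valid[victim] = 1;
            tag[victim]   = trace[t];
            age[victim]   = t;
            misses++;
        }
    }
    return misses;
}

int main(void)
{
    int trace[] = {0, 8, 0, 6, 8};
    int n = sizeof trace / sizeof trace[0];
    printf("direct mapped:    %d misses\n", count_misses(1, trace, n)); /* 5 */
    printf("2-way set assoc.: %d misses\n", count_misses(2, trace, n)); /* 4 */
    printf("fully assoc.:     %d misses\n", count_misses(4, trace, n)); /* 3 */
    return 0;
}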

How Much Associativity?
Increased associativity decreases the miss rate, but with diminishing returns. Simulation of a system with a 64 KB D-cache, 16-word blocks, SPEC2000:
1-way: 10.3%
2-way: 8.6%
4-way: 8.3%
8-way: 8.1%

Set Associative Cache Organization