Chapter IX Memory Organization CS 147 Presented by: Duong Pham

Introduction In Chapter IV we looked at two simple computers, each consisting of a CPU, an I/O subsystem, and a memory subsystem. The memory of these computers was built using only ROM and RAM. This memory subsystem is fine for computers that perform a specific task, for example controlling a microwave oven or controlling a dishwasher. However, a complex computer cannot run on a memory subsystem consisting only of such physical memory, because it would be relatively slow and somewhat limited.

Overview
–Hierarchy of Memory System
–Cache Memory
  –Associative Memory
  –Cache Memory with Associative Mapping
  –Cache Memory with Direct Mapping
  –Cache Memory with Set-Associative Mapping
  –Replacing Data in the Cache
  –Writing Data to the Cache
  –Cache Performance

Hierarchy of Memory System A computer system is not constructed using a single type of memory; in fact, several types of memory are used, for example level 1 cache (L1 cache), level 2 cache (L2 cache), physical memory, and virtual memory. The most well-known element of the memory subsystem is the physical memory, which is constructed using DRAM chips. There is also a cache controller, which copies data from physical memory to cache memory before or when the CPU needs it. In general, the closer a component is to the processor, the faster and the more expensive it is. Therefore, the levels of the memory system tend to increase in size as they move away from the CPU.

Figure: the hierarchy of the memory system: CPU with L1 cache → L2 cache → physical memory → virtual memory storage.

Cache Memory In general, the goal of cache memory is to minimize the processor's memory access time at a reasonable cost. The main idea is to move instructions and data into the cache before the microprocessor tries to access them; if this goal is achieved, system performance improves greatly. A system may have separate caches for instructions and data, which is the principle behind the Harvard architecture, or it may have one unified cache for both.

Associative Memory Cache memory can be constructed using either SRAM or associative memory (content-addressable memory). Associative memory is accessed differently from other RAM: to find data, it searches all of its locations in parallel and marks the locations whose contents match the specified data input. The matching data are then read out sequentially.

Associative Memory cont. To illustrate this, consider a simple associative memory consisting of eight words, each with 16 bits. Note that each word has one additional bit labeled v, called the valid bit. A 1 indicates that the word contains valid data; a 0 indicates that the data is not valid.
Figure: an associative memory with data, mask, output, and match registers, read and write control signals, and a valid bit (v) on each word.

Associative Memory cont. Example: to access data in the associative memory that has 1010 as its four high-order bits:
–The CPU loads the value 1111 0000 0000 0000 into the mask register. Each bit that is to be checked, regardless of its value, is set to 1; all the other bits are set to zero.
–The CPU also loads the value 1010 xxxx xxxx xxxx into the data register. The four leading bits are to be matched and the rest can be anything.
–A match occurs at a location if, for every bit position that has a value of 1 in the mask register, the location's contents agree with the data register, and the location's valid bit is set to 1. The corresponding bit of the match register is then set to 1; otherwise it is set to zero.
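A minimal sketch of this search in Python (not from the slides; the function name and data values are illustrative). In hardware all locations are compared in parallel; the loop below merely simulates that comparison.

    # Simulate a search in a small 16-bit-word associative memory.
    def cam_search(memory, valid, data_reg, mask_reg):
        """Return indices of valid words matching data_reg wherever mask_reg has a 1."""
        return [i for i, word in enumerate(memory)
                if valid[i] and (word & mask_reg) == (data_reg & mask_reg)]

    memory = [0b1010000011110001, 0b0101110000000010, 0b1010111100000000]
    valid  = [1, 1, 0]                  # the third word holds invalid data
    mask   = 0b1111000000000000         # check only the four high-order bits
    data   = 0b1010000000000000         # look for words starting with 1010

    print(cam_search(memory, valid, data, mask))   # [0]: word 2 matches but is not valid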

Associative Memory cont. Writing data to associative memory is straightforward. The CPU supplies the data to the data register and asserts the write signal. The associative memory checks for a location whose valid bit is zero; if it finds one, it stores the information in that location. If it finds none, it must clear out a location before it can store the data.

Cache Memory with Associative Mapping Associative memory can be used to construct a cache with associative mapping, or an associative cache. The figure below shows an associative cache for a 64K × 8-bit memory system. The cache is built from associative memory that is 24 bits wide: the first 16 bits hold the memory address, and the last 8 bits hold the data stored at that address in physical memory. It works just like the associative memory described before.
Figure: an associative cache with data, mask, output, and match registers; each entry holds a 16-bit memory address, 8 bits of data, and a valid bit.

Cache Memory with Direct Mapping Since associative memory is much more expensive than SRAM, a cache mapping scheme that uses standard SRAM can be much larger than an associative cache and still cost less. This scheme is called direct mapping. To illustrate it, consider a 1K cache for the Relatively Simple (R.S.) CPU, shown below. Since the cache is 1K, the 10 low-order address bits (the index) select one specific location in the cache. As in the associative cache, each location contains a valid bit to denote whether or not it holds valid data. In addition, a tag field contains the high-order bits of the original address that were not part of the index; here, the six high-order bits are stored in the tag field. Last, the cached data value is stored as the value.
Figure: a direct-mapped cache for the R.S. CPU; the 16-bit address splits into a 6-bit tag (A[15…10]) and a 10-bit index (A[9…0]), and each entry holds a tag, data, and a valid bit.
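A minimal sketch of the address split and lookup for this 1K direct-mapped cache, in Python (hypothetical helper names; a sketch under the slide's parameters, not the book's implementation):

    # 16-bit address -> 6-bit tag (A[15..10]) + 10-bit index (A[9..0]).
    CACHE_SIZE = 1024
    cache = [{"valid": False, "tag": 0, "data": 0} for _ in range(CACHE_SIZE)]

    def read(addr, physical_memory):
        index = addr & 0x3FF            # low 10 bits select the cache location
        tag = addr >> 10                # high 6 bits are stored as the tag
        line = cache[index]
        if line["valid"] and line["tag"] == tag:
            return line["data"]         # hit: tag + index reconstruct the address
        data = physical_memory[addr]    # miss: fetch from physical memory...
        cache[index] = {"valid": True, "tag": tag, "data": data}  # ...and fill the line
        return data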

Cache Memory with Direct Mapping cont. For example, consider a location of physical memory that contains some data value. This data can be stored in only one location in the cache: the location whose index equals the 10 low-order bits of the original address. However, every address that shares those same 10 low-order bits maps to this same cache location. This is the purpose of the tag field: it holds the six high-order bits of the address whose data currently occupies the cache location, so the full original address can be reconstructed from the tag and the index. Also, the entry counts only if its valid bit is 1; if the bit were 0, none of this would apply, because the data in that location is not valid.

Cache Memory with Direct Mapping cont. Although a direct-mapped cache is much less expensive than an associative cache, it is also much less flexible. In an associative cache, any word of physical memory can occupy any word of the cache; in a direct-mapped cache, each word of physical memory can be mapped to only one specific location. This is a problem for certain programs: for instance, if a loop alternates between two addresses that share the same index, each access evicts the other's data and every access misses. A good compiler will allocate the code so this does not happen, but it illustrates a problem that can occur due to the inflexibility of direct mapping. Set-associative mapping seeks to alleviate this problem while taking advantage of the strengths of the direct-mapping method. This brings us to the next topic.

Cache Memory with Set-Associative Mapping A set-associative cache makes use of relatively low-cost SRAM while trying to alleviate the problem of overwriting data inherent to direct mapping. It is organized just like a direct-mapped cache, except that each location in the cache can contain more than one data value. A cache in which each location can contain n bytes or words of data is called an n-way set-associative cache.

Set-associative mapping cont. Let's consider the 1K, 2-way set-associative cache for the R.S. CPU. Each location contains two groups of fields, one for each way of the cache. The tag field is the same as in the direct-mapped cache except that it is 1 bit longer: since the cache holds 1K data entries and each location holds 2 data values, there are 512 locations in total, so a 9-bit index selects the cache location and the remaining 7 bits specify the tag value. As before, the data field contains the data from the physical memory location. The count/valid field serves two purposes:
–(1) One bit of this field is a valid bit, just as in the other cache mapping schemes.
–(2) The count value keeps track of when data was accessed. This information determines which piece of data will be replaced when a new value is loaded into the cache.

Figure: a two-way set-associative cache for the R.S. CPU; the 16-bit address splits into a 7-bit tag (A[15…9]) and a 9-bit index (A[8…0]), and each location holds two (tag, data, count/valid) groups, one per way.
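A minimal sketch of a lookup in this 2-way cache, in Python (hypothetical names; the count field here is a simple access timestamp standing in for the count/valid bookkeeping):

    SETS = 512   # 1K entries / 2 ways
    cache = [[{"valid": False, "tag": 0, "data": 0, "count": 0} for _ in range(2)]
             for _ in range(SETS)]

    def access(addr, physical_memory, tick):
        index = addr & 0x1FF            # low 9 bits (A[8..0]) pick the set
        tag = addr >> 9                 # high 7 bits (A[15..9]) are the tag
        ways = cache[index]
        for way in ways:                # compare the tag in both ways of the set
            if way["valid"] and way["tag"] == tag:
                way["count"] = tick     # record access time for the replacement decision
                return way["data"]
        # Miss: replace an invalid way if one exists, else the least recently used way.
        victim = min(ways, key=lambda w: (w["valid"], w["count"]))
        victim.update(valid=True, tag=tag, data=physical_memory[addr], count=tick)
        return victim["data"]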

Replacing Data in the Cache When a computer is powered up, it performs several functions necessary to ensure its proper operation. Among these tasks, it must initialize its cache, so it sets all the valid bits to 0, much like asserting a register's clear input. When the computer begins to execute a program, it fetches instructions and data from memory and loads them into the cache. This works well while the cache is empty or sparsely populated. Eventually, however, the computer will need to move data into cache locations that are already occupied. The problem then is to decide which data to move out of the cache and how to preserve that data in physical memory. Direct mapping offers the easiest solution to this problem, since each address can go to only one cache location.

Replacing Data in the Cache cont. Since an associative cache allows any location in physical memory to be mapped to any location in the cache, it does not have to move data out of the cache and back into physical memory unless every location already holds valid data. There are a number of replacement methods that can be used to decide which entry to evict. Here are a few of the more popular ones:
–FIFO (First In First Out)
–LRU (Least Recently Used)
–Random

Replacing Data in the Cache cont. FIFO (First In First Out):
–This replacement process fills the associative memory from its top location to its bottom location.
–When it copies data to the last location, the cache is full. It then goes back to the top location, replacing its data with the next value to be stored.
–This algorithm always replaces the data that was loaded into the cache first among all the data in the cache at that time.
–The method requires nothing other than a register to hold a pointer to the next location to be replaced (see the sketch below), and its performance is generally good.
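A minimal sketch, assuming an N-entry associative cache (names hypothetical), of the single pointer register that FIFO needs:

    N = 8
    cache = [None] * N
    fifo_ptr = 0                        # register pointing at the next location to replace

    def insert(entry):
        global fifo_ptr
        cache[fifo_ptr] = entry         # overwrite the oldest entry
        fifo_ptr = (fifo_ptr + 1) % N   # advance; wraps back to the top when full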

Replacing Data in the Cache cont. LRU (Least Recently Used):
–The LRU method keeps track of the relative order in which each location is accessed and replaces the least recently used value with the new data.
–This requires a counter for each location in the cache, so LRU is generally not used with associative caches; however, it is used frequently with set-associative cache memory.
Random:
–The name says it all: the random method selects at random a location to use for the new data.
–In spite of the lack of logic in its selection of a location, this replacement method produces good performance, close to that of the FIFO method.
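For comparison, a minimal sketch (hypothetical fields) of how each policy picks its victim once the cache is full:

    import random

    def victim_index(lines, policy):
        if policy == "FIFO":            # evict what was loaded first
            return min(range(len(lines)), key=lambda i: lines[i]["loaded"])
        if policy == "LRU":             # evict what was used least recently
            return min(range(len(lines)), key=lambda i: lines[i]["last_used"])
        if policy == "Random":          # evict any line at all
            return random.randrange(len(lines))
        raise ValueError(policy)

    lines = [{"loaded": 3, "last_used": 9},
             {"loaded": 1, "last_used": 7},
             {"loaded": 5, "last_used": 2}]
    print(victim_index(lines, "FIFO"))  # 1: loaded earliest
    print(victim_index(lines, "LRU"))   # 2: used longest ago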

Writing Data to the Cache There are two methods for writing data to the cache: write-through and write-back.
Write-through:
–In write-through, every time a value is written from the CPU into a location in the cache, it is also written into the corresponding location in physical memory.
–This guarantees that physical memory always contains the correct value, but it requires additional time for the writes to physical memory.
Write-back:
–In write-back, a value written to the cache is not always written to physical memory. The value is written to physical memory only once, when the data is removed from the cache.
–This saves the time write-through caches spend copying data to physical memory, but it also introduces a time frame during which physical memory holds invalid (stale) data.

Writing Data to the Cache cont. Example: consider a simple program loop:
    for I = 1 to 1000 do
        x = x + I;
During the loop, the CPU writes a value to x 1000 times. With the write-back method, this loop writes the result to physical memory only one time, instead of the 1000 times required by the write-through method. Therefore, write-back offers a significant time savings, as the sketch below illustrates.
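A minimal sketch counting the physical-memory writes in this loop under each policy (assuming x stays resident in the cache for the whole loop):

    mem_writes = {"write-through": 0, "write-back": 0}
    x = 0
    for i in range(1, 1001):
        x = x + i
        mem_writes["write-through"] += 1    # every store to x also goes to memory

    mem_writes["write-back"] += 1           # one write, when x finally leaves the cache
    print(mem_writes)                       # {'write-through': 1000, 'write-back': 1}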

Writing Data to the Cache cont. However, performance is not the only consideration; sometimes the currency of data takes precedence. Another situation that must be addressed is how to write data to a location not currently loaded into the cache. This is called a write miss. One possibility is to load the location into the cache and then write the new value to the cache using either the write-back or the write-through method; this is called the write-allocate policy. The alternative, the write-no-allocate policy, updates the value in physical memory without loading it into the cache.

Cache Performance The primary reason for including cache memory in a computer is to improve system performance by reducing the time needed to access memory. The two primary components of cache performance are cache hits and cache misses.
Cache hits:
–Every time the CPU accesses memory, it checks the cache. If the requested data is in the cache, the CPU accesses the data in the cache rather than physical memory.
Cache misses:
–If the requested data is not in the cache, the CPU accesses the data from main memory (and usually writes the data into the cache as well).

Cache Performance cont. The hit ratio h is the percentage of memory accesses that are served from the cache rather than from physical memory. The higher the hit ratio, the more often the CPU accesses the relatively fast cache memory and the better the system performance. The average memory access time, Tm, is the weighted average of the cache access time, Tc, and the access time for physical memory, Tp, where the weighting factor is the hit ratio h. Therefore, Tm can be expressed as:
    Tm = h · Tc + (1 − h) · Tp
Table of hit ratios and average memory access times (the listed times correspond to Tc = 10 ns and Tp = 60 ns):
    h  : 0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
    Tm : 60ns  55ns  50ns  45ns  40ns  35ns  30ns  25ns  20ns  15ns  10ns
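A quick check of the formula against the table's values (Tc = 10 ns, Tp = 60 ns):

    def avg_access_time(h, tc=10, tp=60):
        return h * tc + (1 - h) * tp        # Tm = h*Tc + (1 - h)*Tp, in ns

    for h in (0.0, 0.5, 0.9, 1.0):
        print(f"h = {h:.1f}: Tm = {avg_access_time(h):.0f} ns")
    # h = 0.0: 60 ns; h = 0.5: 35 ns; h = 0.9: 15 ns; h = 1.0: 10 ns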

Cache Performance cont. The rest of Section 9.2 works through examples of cache activity using all of the methods discussed so far, applying the average memory access time (Tm) equation to generate the hit ratio and average memory access time for each method. If you want to see how those results were derived, take a look at the examples in those pages. This concludes my presentation. Thank you.

Any questions?