IT253: Computer Organization Lecture 11: Memory Tonga Institute of Higher Education

The Big Picture

What is Memory? (Review) A large, linear array of bytes. –Each byte has its own address in memory. Most ISAs have instructions that do byte addressing (an address for every 8 bits). –Data is aligned on word boundaries. This means things like integers and instructions are 32 bits (1 word) long.
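As a quick illustration (ours, not from the slides), this C program prints the addresses of consecutive words; on a typical 32-bit machine each address is 4 bytes past the previous one, showing byte addressing with word-aligned data:

    #include <stdio.h>

    int main(void) {
        int words[4];                       /* each int is 1 word (4 bytes) here */
        for (int i = 0; i < 4; i++)
            printf("&words[%d] = %p\n", i, (void *)&words[i]);
        /* Consecutive word addresses differ by 4: bytes are individually
           addressable, but the words sit on 4-byte boundaries. */
        return 0;
    }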

How we think of memory now When we built our processor, we pretended memory worked very simply, so that we could fetch instructions and data from it.

What do we really need for memory? We need four parts for our memory: –The cache, the fastest memory, which the processor uses directly –The memory bus and I/O bus –Main memory (RAM) –Hard disks

Part I: Inside the Processor The processor uses an internal cache (inside the processor) and an external cache that is nearby. This is called a two-level cache. If data cannot be kept in the cache, it must come from main memory.

Part II: Main Memory Main memory is the RAM in the computer. It is usually built from DRAM (dynamic random access memory).

Memory Types Explained RAM – Random Access Memory –Random – any location can be accessed at any time –DRAM – dynamic RAM: high density, cheap, slow, low power usage. Dynamic means it needs to be "refreshed". This is the main memory. –SRAM – static RAM: low density, high power, expensive, fast. Static – the memory keeps its value as long as power is on. Caches are made out of this. Non-Random Access Memory –Some memory technology is sequential (like a tape). You need to go through a lot of memory to find the spot you want.

RAM What's important to know about RAM? –Latency – the time it takes for a word to be read from memory –Bandwidth – the average number of words read per second. If a programmer can fit a whole program within the size of the cache, it will run much faster: every time the CPU goes to RAM, it must wait a long time for the data. We can make our programs faster if all the instructions stay inside the cache.

SRAM We can make an SRAM cell (one that does not need to be refreshed) with 6 transistors, then put cells together to make a bigger SRAM. This is a 16-word SRAM diagram; it can be addressed with 4 bits (2^4 = 16). Each SRAM word here holds 8 bits.

The SRAM diagram Like everything else, we can draw one simple box to describe an SRAM. WE_L – Write Enable. OE_L – Output Enable. We need Output Enable and Write Enable because we use the D bus for both input and output; this saves space inside the processor. A is the address that we are either writing to or reading from. The number of address bits depends on how many words are inside the SRAM.

DRAM What we know about DRAM: –Needs to be refreshed regularly –Holds a lot of data in a small space –Uses very little power –Has Output Enable –Has Write Enable

The 1-transistor DRAM memory To store a single bit, we need just 1 transistor. To Write: –Select the row, put the bit on the bit line. To Read: –Select the row, read what comes out on the bit line (only a very small charge). –Then rewrite the value, because the stored charge leaks away during the read. To Refresh: –Just do a read, which rewrites the value.

Simple DRAM grouping The DRAM cells are put together in an array, where it is possible to access one bit at a time

Complicated DRAM grouping The real way DRAM is put together is in layers (planes). Usually 8 layers are put together; the row and column numbers go to all the layers, which return 8 bits (1 byte) at a time. Example: –2 Mbit DRAM = 256K addresses x 8 layers –512 rows x 512 columns x 8 planes –512 x 512 = 262,144 (256K)

Diagram for RAM RAS_L – when asserted, A contains the row address (the _L means active low). CAS_L – when asserted, A contains the column address. WE_L – write enable. OE_L – output enable. D – the data that is either input or output (to save space, we use the same lines for both).

DRAMs through History Fast Page DRAM – this type of DRAM allowed selecting memory through rows and columns and could automatically get the next byte, saving time. It was introduced in 1992 for PCs. Synchronous DRAM (SDRAM) – gives a clock signal to the RAM so that it can "pipeline" data, meaning it can send more than one piece of data at a time. Introduced in 1997 and is very common. Double Data Rate RAM (DDR-RAM) – can transfer data twice per clock cycle. Introduced in 2000 and used in all new computers. Rambus DRAM (RDRAM) – uses a special signalling method that allows faster clock speeds, but is made only by the Rambus company. Introduced in 2001, it was popular for a short time, before Intel refused to support it.

Summary of DRAM and SRAM DRAM –Slow, cheap, low power –Good for giving the user a lot of memory at a low price –Uses 1 transistor to store one bit SRAM –Fast, expensive, uses more power –Good for people who need speed –Uses 6 transistors to store one bit

Caches Why do we want a cache? If DRAM is slow and SRAM is fast, then we can make the average access time to memory very small if most of the accesses are in SRAM We can use SRAM to make a memory that works very quickly (the cache)

Different Levels of Memory THE MEMORY HIERARCHY

Cache Ideas: Locality Locality – the idea that most of the things you need are close by. About 90 percent of the time, a program is using only 10 percent of its code. Two types of locality: –Temporal – locality in time – if something is used, it will probably be used again in the near future –Spatial – locality in space – if something is used, then things near it will probably be used as well
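A minimal C sketch (ours, not from the slides) showing both kinds of locality in one loop:

    #include <stddef.h>

    int sum_array(const int *a, size_t n) {
        int sum = 0;                    /* sum is reused every iteration: temporal */
        for (size_t i = 0; i < n; i++)  /* i is reused every iteration: temporal   */
            sum += a[i];                /* a[i], a[i+1], ... are adjacent: spatial */
        return sum;
    }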

How the levels work together The levels of memory are always working together to keep moving data closer to the fastest level (the cache). The levels copy data between themselves. Block – a block is the smallest piece of data that will be copied between levels.

The Memory Hierarchy Hit – the data we want is in the memory level we are searching –(the example in the picture is Block X) –Hit Rate – the fraction of accesses that find the data in that memory level –Hit Time – the time it takes to get a piece of data from the higher level into the processor Miss – the data is not in the higher level and must come from the lower level –Miss Rate = 1 – Hit Rate –Miss Penalty = the time it takes to load the data from the lower level into the higher level and send it to the processor
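These definitions combine into the standard average-access-time formula (not on the original slide): Average access time = Hit Time + Miss Rate x Miss Penalty. For example, a 1-cycle hit time, a 10% miss rate, and a 20-cycle miss penalty give an average of 1 + 0.10 x 20 = 3 cycles.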

A simple cache: Direct Mapped Each byte of main memory maps to exactly one spot (index) in the cache. The first cache index holds the first byte of a word, and the next 4 cache indexes automatically hold the next 4 bytes from main memory. Thus this simple cache uses 1-byte blocks.

Direct Mapped Cache A direct mapped cache – a cache of fixed-size blocks, where each block holds data from main memory. Parts of a direct mapped cache: –Data – the actual data –Tag – a special number identifying each block. The tag array is the list of tags that identifies what's in the cache. A tag tells us whether the data we are looking for is in the cache: each cache entry has a unique tag, and if the tag we want is not there, we know it is a miss and we need to get the data from main memory. –Index – the spot in the cache that holds the data Parts of a direct mapped cache address: –Tag – matched against the tag array –Cache Index – the location of the block in the cache –Block Offset – the byte location within the cache block

Direct Mapped Caches The processor uses addresses that map into the cache. The address has special parts, just like instruction formats; with the different pieces of the address we can figure out where to find the data in the cache. If the cache is 2^M bytes in size and the block size is 2^L bytes, then there are 2^(M-L) blocks. If we use 32-bit addresses, then: –The lowest L bits are the block offset –The next (M-L) bits are the cache index –The remaining (32-M) bits are the tag (the tag records which address the data in the cache came from)
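As a minimal C sketch (ours, not from the slides; the constants and names are assumptions), here is how those three fields can be extracted from an address with shifts and masks, using M = 10 (a 1 KB cache) and L = 5 (32-byte blocks):

    #include <stdint.h>
    #include <stdio.h>

    #define M 10   /* cache size = 2^10 = 1 KB    */
    #define L 5    /* block size = 2^5 = 32 bytes */

    int main(void) {
        uint32_t addr   = 0x00000824;
        uint32_t offset = addr & ((1u << L) - 1);              /* lowest L bits    */
        uint32_t index  = (addr >> L) & ((1u << (M - L)) - 1); /* next M-L bits    */
        uint32_t tag    = addr >> M;                           /* remaining 32-M bits */
        printf("tag=%u index=%u offset=%u\n", tag, index, offset);
        return 0;
    }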

Direct Mapped Cache Example Example: 1 KB cache with 32-byte blocks –Cache Index = (Address % 1024) / 32 –Block Offset = Address % 32 –Tag = Address / 1024 (the tag records which address the cached data came from) –Valid Bit – says whether the data in the cache entry is good or not. 32 cache blocks x 32-byte blocks = 1024 bytes = 1 KB cache
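Plugging in a concrete address (our own worked example): for address 0x824 = 2084, Block Offset = 2084 % 32 = 4, Cache Index = (2084 % 1024) / 32 = 36 / 32 = 1, and Tag = 2084 / 1024 = 2 — the same result as the bit-shift sketch above.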

Direct Mapped Cache Example The cache tag is checked to see whether the requested data is actually in the cache or not. If it is not, we get it from RAM.

Direct Mapped Cache Example Example of a Cache Miss

Direct Mapped Cache Example A Cache Hit

The Block Size Decision The goal is to find the right block size so that you get mostly cache hits, but also so that when you do miss, the penalty is not too bad. Larger block size – better spatial locality –But it takes longer to bring a new block into the cache –If the block size is too big, there are too few blocks in the cache and you will get many misses again

A Better Cache: Associative Cache An N-Way Set Associative cache works differently from the direct mapped cache. In an N-way set, there are N entries for each cache index, so it is like N direct mapped caches operating at the same time. All the entries in one set are selected, and then only the one with the correct cache tag is chosen.
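A minimal C sketch (ours, not from the slides; the structure and sizes are assumptions) of the lookup an N-way set associative cache performs — in hardware all N tag comparisons happen in parallel, but a loop shows the idea:

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_SETS 64
    #define N_WAYS   4

    typedef struct {
        bool     valid;
        uint32_t tag;
        /* the block's data bytes would live here */
    } CacheLine;

    CacheLine cache[NUM_SETS][N_WAYS];

    /* Returns true on a hit: every way in the set is checked for the tag. */
    bool lookup(uint32_t set_index, uint32_t tag) {
        for (int way = 0; way < N_WAYS; way++)
            if (cache[set_index][way].valid && cache[set_index][way].tag == tag)
                return true;   /* hardware does these N compares in parallel */
        return false;          /* miss: fetch the block from the level below */
    }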

Pros and Cons: Set Associative Cache The set associative cache gives us many benefits: –Higher hit rate for the same size cache –Fewer conflict misses –We can have a larger cache without changing the number of bits used for the cache index But there are also costs: –You need to compare N tags to choose the right piece of data (so we get a time delay for a MUX) –The data is only available after we decide whether it's a hit or a miss (with direct mapped, we can assume it's a hit and fix the mistake later if it's not)

Cache Questions Draw a 32 KB cache with 4-byte blocks that is 2-way set associative. If you have a 256-byte direct mapped cache with 16-byte blocks, and you have the following tags in your tag array, choose which address will result in a hit in the cache: Tag array: Index 0 = 0xEF4021, Index 1 = 0xEF4022, Index 2 = 0x430322, Index 3 = 0x320933, Index 4 = 0xA34E44. Choices: 1. 0x… 2. 0x… 3. 0xEF4… 4. 0xA34E… 5. 0x…

Sources for Cache Misses What can cause a cache miss? –Compulsory: when you start a computer, all the data in the cache is no good (also called a "cold start"). Nothing we can do about it. –Conflict: multiple memory locations map to the same cache spot. You can increase the cache size or increase the associativity. –Capacity: the cache cannot contain all the blocks needed by the program. Increase the cache size. –Invalidation: something else changes the data (like some sort of input).

A Simple Chart for Cache misses

Replacing Blocks in Cache We need a way to decide how to replace blocks in the cache. –For a direct mapped cache, there is no policy decision, because we just throw away the block that is in its place. –For an N-Way Set Associative cache, we have N blocks to choose from to throw away when we need to make room for the new block. This choice is called the Cache Block Replacement Policy.

Cache Block Replacement Policy Random Replacement – the hardware randomly selects a block to throw out. First In, First Out (FIFO) – the hardware keeps a list of what came into the cache in what order, and throws out what came in first. Least Recently Used (LRU) – the hardware keeps track of when each block was used; the one that has not been used for the longest time is replaced (a sketch follows below).
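Here is a minimal sketch (ours, not from the slides) of one way LRU can be tracked for a single set: a global counter acts as a timestamp, and the victim is the way with the oldest timestamp.

    #include <stdint.h>

    #define N_WAYS 4

    static uint64_t now;                 /* global access counter        */
    static uint64_t last_used[N_WAYS];   /* timestamp per way in one set */

    /* Record that this way was just accessed. */
    void touch(int way) {
        last_used[way] = ++now;
    }

    /* Pick the least recently used way to evict. */
    int choose_victim(void) {
        int victim = 0;
        for (int way = 1; way < N_WAYS; way++)
            if (last_used[way] < last_used[victim])
                victim = way;
        return victim;
    }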

Cache Write Policy There are a few ways we can write data to the cache as well. Our problem is that we need to keep the data in memory and the cache consistent. Two options: –Write Back: store the data only in the cache; when the cache block is replaced, write it back to memory. There is only one copy, so we must use special controls to make sure we don't make mistakes. –Write Through: write to memory and to the cache at the same time. We use a small buffer that holds copies of things before they get written to main memory, because writing to main memory takes longer than writing to the cache.
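A minimal sketch (ours, not from the slides; Line, ram, and memory_write are made-up stand-ins) contrasting the two policies: write-back marks the line dirty and defers the memory update until eviction, while write-through updates memory right away.

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct { uint32_t data; bool dirty; } Line;

    static uint32_t ram[1024];                    /* stand-in for main memory */
    static void memory_write(uint32_t addr, uint32_t value) {
        ram[addr % 1024] = value;
    }

    /* Write back: update only the cache and mark the line dirty. */
    void write_back(Line *line, uint32_t value) {
        line->data  = value;
        line->dirty = true;         /* memory is updated later, on eviction */
    }

    /* Write through: update the cache and memory together. */
    void write_through(Line *line, uint32_t addr, uint32_t value) {
        line->data = value;
        memory_write(addr, value);  /* in hardware this goes via a write buffer */
    }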

Questions for the memory hierarchy Designers of memory systems need to know the answers to these questions before they start building: 1. Where is a block placed in the upper level of memory? (Block Placement) 2. How is a block found if it is in the upper level? (Block Identification) 3. Which block should be replaced on a miss? (Block Replacement) 4. What happens on a write? (Write Strategy)

Cache Performance CPU time = (CPU execution clock cycles + Memory stall clock cycles) x Clock cycle time. Memory stall clock cycles = Memory accesses x Miss Rate x Miss Penalty. We can figure out how well our cache will work with formulas like these. –Example: if 1 instruction takes one clock cycle, the miss penalty is 20 cycles, the miss rate is 10%, and there are 1000 instructions and 300 memory accesses, then: Memory stall clock cycles = 300 x 0.10 x 20 = 600 cycles. CPU time = (1000 + 600) x 1 = 1,600 cycles to do 1,000 instructions. This means we are spending 600 / 1,600 = 37.5% of our time on memory access!
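The same arithmetic as a small C helper (our own sketch; the function and names are ours):

    #include <stdio.h>

    /* CPU time in cycles = execution cycles + memory stall cycles. */
    double cpu_cycles(double exec_cycles, double accesses,
                      double miss_rate, double miss_penalty) {
        double stall = accesses * miss_rate * miss_penalty;
        return exec_cycles + stall;
    }

    int main(void) {
        double total = cpu_cycles(1000, 300, 0.10, 20);   /* = 1600 */
        printf("total cycles = %.0f, stall fraction = %.1f%%\n",
               total, 100.0 * (total - 1000) / total);    /* 37.5%  */
        return 0;
    }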

How to improve cache performance Reduce the miss rate –Remember the 4 sources of misses: Compulsory (at first there is nothing in the cache, so everything misses), Capacity (we can't fit everything inside the cache), Conflict (multiple addresses compete for the same cache spot), Invalidation (nothing we can do about this). Reduce the miss penalty. Reduce the time for a hit in the cache. So can we improve cache performance with our programming? Yes!

Ways to improve cache performance with programming With instructions: –Loop interchange – change the nesting of loops to access data in ways that use the cache wisely –Combining loops – combine two loops that use much of the same data and some of the same variables. With data in memory: –Merging arrays – put arrays together: use 1 array of a structure that holds two types of data instead of two arrays each holding a different type –Pointers – use pointers to access memory; they are not big blocks that need to be copied in and out of the cache. (Sketches of the first three techniques follow on the example slides below.)

Loop Interchange Example
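The picture from this slide did not survive in the transcript; the classic loop interchange example looks roughly like this (our own reconstruction in C). C stores 2-D arrays row by row, so making the column index the inner loop touches consecutive addresses:

    #define N 512
    int x[N][N];

    /* Before: the inner loop jumps N ints at a time - poor spatial locality. */
    void before(void) {
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                x[i][j] = 2 * x[i][j];
    }

    /* After (loops interchanged): memory is walked sequentially, row by row. */
    void after(void) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                x[i][j] = 2 * x[i][j];
    }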

Loop Combining Example
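Again the slide's picture is missing; a typical loop combining (fusion) example, as our own reconstruction, reuses a[i] and c[i] while they are still in the cache:

    #define M 1024
    double a[M], b[M], c[M], d[M];

    /* Before: two loops each pull a[] and c[] through the cache separately. */
    void separate(void) {
        for (int i = 0; i < M; i++) a[i] = 1.0 / b[i] * c[i];
        for (int i = 0; i < M; i++) d[i] = a[i] + c[i];
    }

    /* After (loops combined): a[i] and c[i] are reused while still cached. */
    void combined(void) {
        for (int i = 0; i < M; i++) {
            a[i] = 1.0 / b[i] * c[i];
            d[i] = a[i] + c[i];
        }
    }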

Merging Arrays Example
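The slide's picture is missing; a typical merging-arrays example (our own reconstruction) replaces two parallel arrays with one array of structures, so a single cache block holds both fields:

    #define SIZE 1024

    /* Before: two separate arrays - val[i] and key[i] may be far apart. */
    int val[SIZE];
    int key[SIZE];

    /* After (arrays merged): each val sits next to its key, so one cache
       block fetch brings in both. */
    struct Merge {
        int val;
        int key;
    };
    struct Merge merged_array[SIZE];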

Changing code A lot of the time, the compiler will change your code into a more optimized version using techniques like these. It tries hard to make sure cache misses do not happen often: it will reorder some instructions, look at memory accesses for possible conflicts, and try to fix them.

Summary The chapter about memory covers a great deal, from the way memory is built to the way it works. There are different levels of memory that work together. The cache is the fastest and most important memory, so we have special rules about how to make it work well. We can also affect memory speed ourselves through better coding.