
Lecture 19
Today’s topics:
Types of memory
Memory hierarchy

6.1 Introduction
Memory organization: a clear understanding of these ideas is essential for analyzing system performance.

Types of Memory
There are two kinds of main memory: random access memory (RAM) and read-only memory (ROM).
RAM comes in two varieties: dynamic RAM (DRAM) and static RAM (SRAM).
ROM does not need to be refreshed. It is used to store permanent or semi-permanent data that persists even while the system is turned off.

Random Access Memory
[Figure: block diagram of a RAM chip with an n-bit Address input, an m-bit Data bus, and Output Enable (OE) and Write Enable (WE) control lines]

SRAM vs. DRAM
Static RAM (SRAM), used for caches:
Requires 6-8 transistors per bit
Requires low power to retain a bit
Dynamic RAM (DRAM), used for main memory:
One transistor plus a capacitor per bit
Must be re-written after being read
Must also be periodically refreshed; an entire row is refreshed at once
Address lines are multiplexed: the upper half of the address is latched by the Row Address Strobe (RAS), the lower half by the Column Address Strobe (CAS)
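To make the multiplexed addressing concrete, here is a minimal C sketch that splits an address into its row and column halves. The widths (14 row bits, 14 column bits) are assumed values for illustration, not taken from the slides.

    #include <stdio.h>

    /* Assumed geometry: 14 row bits and 14 column bits sharing one
     * multiplexed address bus. The upper half is latched with RAS,
     * the lower half with CAS. */
    #define ROW_BITS 14
    #define COL_BITS 14

    int main(void) {
        unsigned addr = 0xABCDEF;                     /* a 24-bit address, fits in 28 */
        unsigned row = addr >> COL_BITS;              /* upper half: sent with RAS */
        unsigned col = addr & ((1u << COL_BITS) - 1); /* lower half: sent with CAS */
        printf("address 0x%07X -> row 0x%04X, column 0x%04X\n", addr, row, col);
        return 0;
    }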

Static RAM Storage Cell
Static RAM (SRAM): fast but expensive RAM.
Six-transistor cell with no static current; typically used for caches; provides fast access time.
Cell implementation: cross-coupled inverters store the bit, and two pass transistors enable the cell to be read and written.
[Figure: typical SRAM cell, showing Vcc, the word line, and the bit lines]

Dynamic RAM Storage Cell
Dynamic RAM (DRAM): slow, cheap, and dense memory; the typical choice for main memory.
Cell implementation: a one-transistor cell (pass transistor) with a trench capacitor; the bit is stored as a charge on the capacitor.
Must be refreshed periodically, because charge leaks from the tiny capacitor. Refreshing covers all memory rows: each row is read and written back to restore the charge.
[Figure: typical DRAM cell, showing the word line, bit line, pass transistor, and capacitor]

DRAM Refresh Cycles
The refresh period is on the order of tens of milliseconds, and refreshing covers the entire memory: each row is read and written back to restore its charge. Some of the memory bandwidth is lost to refresh cycles.
[Figure: cell voltage versus time; a stored 1 decays toward the threshold voltage and is restored by each refresh cycle, while a stored 0 stays at the low voltage]
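To put a number on the bandwidth lost to refresh, here is a back-of-the-envelope C calculation. The figures (8192 rows, a 64 ms refresh period, 50 ns to refresh one row) are assumed for illustration only.

    #include <stdio.h>

    int main(void) {
        /* Assumed figures: each of 8192 rows must be refreshed once per
         * 64 ms period, and refreshing one row takes about 50 ns. */
        double rows = 8192.0;
        double period_s = 64e-3;
        double per_row_s = 50e-9;
        double fraction = rows * per_row_s / period_s;
        printf("fraction of bandwidth lost to refresh: %.2f%%\n",
               fraction * 100.0);   /* prints 0.64% */
        return 0;
    }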

Processor-Memory Performance Gap
CPU performance grew about 55% per year, slowing down after 2004; DRAM performance grew about 7% per year. The result is a widening performance gap.
1980: no cache in microprocessors. 1995: two-level caches on the microprocessor.

The Need for Cache Memory
There is a widening speed gap between the CPU and main memory: a processor operation takes less than 1 ns, while a main memory access requires more than 50 ns.
Each instruction involves at least one memory access: one access to fetch the instruction, and a second for load and store instructions. Memory bandwidth therefore limits the instruction execution rate, as the rough calculation below shows.
Cache memory can help bridge the CPU-memory gap: it is small in size but fast.
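A quick C illustration of the bandwidth limit, using the 50 ns access time above and the fact that every instruction needs at least one memory access:

    #include <stdio.h>

    int main(void) {
        double access_ns = 50.0;            /* main memory access time */
        double max_rate = 1e9 / access_ns;  /* at most one instruction per access */
        printf("without a cache, at most %.0f million instructions/s\n",
               max_rate / 1e6);  /* 20 million/s, far below what a
                                    sub-1-ns processor cycle could sustain */
        return 0;
    }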

The Memory Hierarchy
Faster memory is more expensive than slower memory. To provide the best performance at the lowest cost, memory is organized in a hierarchical fashion: small, fast storage elements are kept in the CPU; larger, slower main memory is accessed through the data bus; and larger, (almost) permanent storage in the form of disk and tape drives is still further from the CPU.

Typical Memory Hierarchy
Registers are at the top of the hierarchy: typical size < 1 KB, access time < 0.5 ns
Level 1 cache (8 – 64 KB): access time about 1 ns
L2 cache (512 KB – 8 MB): access time 3 – 10 ns
Main memory (4 – 16 GB): access time 50 – 100 ns
Disk storage (> 200 GB): access time 5 – 10 ms
[Figure: the hierarchy from the microprocessor (registers, L1 cache) down through the memory bus to the L2 cache and main memory, and over the I/O bus to magnetic or flash disk; faster toward the top, bigger toward the bottom]

6.3 The Memory Hierarchy
Data are processed in the CPU. To access a particular piece of data, the CPU first sends a request to its nearest memory, usually the cache. If the data is not in the cache, then main memory is queried. If the data is not in main memory, then the request goes to disk. Once the data is located, the data and a number of its nearby elements are fetched into the cache.
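The lookup chain just described can be sketched in C. This is a toy single-value model, not a real memory system; the point is the order of the queries and the fill into the faster level on the way back.

    #include <stdio.h>
    #include <stdbool.h>

    /* Toy model of the cache -> main memory -> disk lookup chain. */
    static int  cache_data;
    static bool cache_valid = false;    /* data starts out only on "disk" */
    static int  memory_data;
    static bool memory_valid = false;
    static const int disk_data = 42;

    static int read_data(void) {
        if (cache_valid)                /* 1. ask the nearest memory first */
            return cache_data;
        if (!memory_valid) {            /* 2. cache miss: query main memory */
            memory_data = disk_data;    /* 3. memory miss: go to disk */
            memory_valid = true;
        }
        cache_data = memory_data;       /* fetch into the cache on the way back */
        cache_valid = true;
        return cache_data;
    }

    int main(void) {
        printf("first access (goes to disk): %d\n", read_data());
        printf("second access (cache hit):   %d\n", read_data());
        return 0;
    }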

Principle of Locality of Reference
Programs access a small portion of their address space: at any time, only a small set of instructions and data is needed.
Temporal locality (locality in time): if an item is accessed, it will probably be accessed again soon. The same loop instructions are fetched each iteration, and the same procedure may be called and executed many times.
Spatial locality (locality in space): the tendency to access contiguous instructions and data in memory, as in sequential execution of instructions and traversing arrays element by element (illustrated in the sketch below).
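The array-traversal point shows up in a few lines of C. Because C stores two-dimensional arrays in row-major order, the first loop nest below touches consecutive addresses (good spatial locality), while the second strides a whole row between accesses. Both loop nests also exhibit temporal locality: the same few loop instructions are fetched on every iteration.

    #include <stdio.h>
    #define N 1024

    static int a[N][N];

    int main(void) {
        long sum = 0;
        for (int i = 0; i < N; i++)       /* row-major order: consecutive */
            for (int j = 0; j < N; j++)   /* addresses, good spatial locality */
                sum += a[i][j];
        for (int j = 0; j < N; j++)       /* column order: N*sizeof(int) bytes */
            for (int i = 0; i < N; i++)   /* between accesses, poor locality */
                sum += a[i][j];
        printf("%ld\n", sum);  /* same sum either way; only the access order differs */
        return 0;
    }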

What is a Cache Memory?
A small and fast (SRAM) memory that stores the subset of instructions and data currently being accessed. It is used to reduce the average access time to memory.
Caches exploit temporal locality by keeping recently accessed data closer to the processor, and spatial locality by moving blocks consisting of multiple contiguous words.
The goal is to achieve the fast access time of cache memory while balancing the cost of the memory system.

Cache Memories in the Datapath
[Figure: pipelined datapath in which the PC addresses an I-Cache that delivers instructions, and the memory stage uses a D-Cache for load/store data; on an I-Cache or D-Cache miss, the block address is sent to the interface to the L2 cache or main memory, which returns the instruction or data block]
An I-Cache miss or a D-Cache miss causes the pipeline to stall.

Almost Everything is a Cache!
In computer architecture, almost everything is a cache:
Registers are a software-managed cache on variables.
The first-level cache is a cache on the second-level cache, which is a cache on memory.
Memory is a cache on the hard disk: it stores recent programs and their data, and the hard disk can be viewed as an extension to main memory.
The branch target and prediction buffer is a cache on branch target and prediction information.

Definitions
This leads us to some definitions. A hit occurs when data is found at a given memory level; a miss occurs when it is not. The hit rate is the percentage of accesses for which data is found at a given level, and the miss rate is the percentage for which it is not: miss rate = 1 - hit rate. The hit time is the time required to access data at a given memory level, and the miss penalty is the time required to process a miss, including the time to replace a block of memory plus the time to deliver the data to the processor.
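These definitions combine into the standard average memory access time formula: AMAT = hit time + miss rate * miss penalty. A small C illustration with assumed values (1 ns hit time, 5% miss rate, 60 ns miss penalty), roughly in the range of the hierarchy slide above:

    #include <stdio.h>

    int main(void) {
        double hit_time = 1.0;       /* ns, assumed L1 access time */
        double miss_rate = 0.05;     /* i.e., a 95% hit rate */
        double miss_penalty = 60.0;  /* ns to replace the block and deliver data */
        double amat = hit_time + miss_rate * miss_penalty;
        printf("AMAT = %.1f + %.2f * %.1f = %.1f ns\n",
               hit_time, miss_rate, miss_penalty, amat);  /* 4.0 ns */
        return 0;
    }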

Four Basic Questions on Caches
Q1: Where can a block be placed in a cache? (Block placement) Direct mapped, set associative, or fully associative.
Q2: How is a block found in a cache? (Block identification) By block address: tag and index.
Q3: Which block should be replaced on a miss? (Block replacement) FIFO, random, or LRU.
Q4: What happens on a write? (Write strategy) Write back or write through (with a write buffer).
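Q1 and Q2 can be made concrete for a direct-mapped cache. Assuming (for illustration) 16-byte blocks, 64 sets, and 32-bit addresses, an address splits into offset, index, and tag: the index selects the one set the block may occupy, and the stored tag identifies which block is actually present.

    #include <stdio.h>

    #define BLOCK_BITS 4   /* assumed: 16-byte blocks */
    #define INDEX_BITS 6   /* assumed: 64 sets, direct mapped */

    int main(void) {
        unsigned addr = 0x12345678;
        unsigned offset = addr & ((1u << BLOCK_BITS) - 1);
        unsigned index  = (addr >> BLOCK_BITS) & ((1u << INDEX_BITS) - 1);
        unsigned tag    = addr >> (BLOCK_BITS + INDEX_BITS);
        printf("0x%08X -> tag 0x%X, index %u, offset %u\n",
               addr, tag, index, offset);
        return 0;
    }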