

Adapted from Computer Organization and Design, Patterson & Hennessy, UCB
ECE232: Hardware Organization and Design
Part 14: Memory Hierarchy
Chapter 5 (4th edition), Chapter 7 (3rd edition)

ECE232: Memory Hierarchy. Adapted from Computer Organization and Design, Patterson & Hennessy, UCB, Kundu, UMass Koren

Slide 2: Recap: Machine Organization
- Computer (personal computer): Processor, Memory, Devices
- Processor (CPU) (active): Control ("brain") and Datapath
- Memory (passive): where programs and data live when running
- Devices: Input and Output

Slide 3: Memory Basics
- Users want large and fast memories!
- Fact: large memories are slow; fast memories are small
- Large memories use DRAM technology: Dynamic Random Access Memory
  - High density, low power, cheap, slow
  - Dynamic: needs to be "refreshed" regularly
- DRAM access times are 50-70 ns at a cost of $10 to $20 per GB
  - FPM (Fast Page Mode)
  - ECC (Error Correcting Code)
  - SDRAM (Synchronous Dynamic RAM)
  - DDR (data transfer at both clock edges), source synchronous
- Fast memories use SRAM: Static Random Access Memory
  - Low density, high power, expensive, fast
  - Static: content lasts "forever" (until power is lost)
- SRAM access times are 0.5-5 ns at a cost of $400 to $1,000 per GB

Slide 4: Memory Technology
- SRAM and DRAM are random access storage
  - Access time is the same for all locations (a hardware decoder is used)
- For even larger and cheaper storage than DRAM, use a hard drive (disk): sequential access
  - Very slow; data is accessed sequentially, access time is location dependent, considered as I/O
  - Disk access times are 5 to 20 million ns (i.e., 5 to 20 msec) at a cost of $0.20 to $2.00 per GB

Slide 5: Memory Latency Problem (Motivation for Memory Hierarchy)
[Figure: Processor-DRAM memory performance gap; performance plotted against time]
- µProc: 60%/yr. (2X/1.5 yr)
- DRAM: 5%/yr. (2X/15 yrs)
- Processor-memory performance gap grows 50%/year

Slide 6: Need for speed
- Assume the CPU runs at 3 GHz
- Every instruction requires 4 B of instruction and at least one memory access (4 B of data): 3 GHz x 8 B = 24 GB/sec
- The table lists the peak performance of a sequential burst of transfers (performance for random access is much slower due to latency)
- Memory bandwidth and access time are a performance bottleneck

Interface                                  Width        Frequency    Bytes/Sec
4-way interleaved PC1600 (DDR200) SDRAM    4 x 64 bits  100 MHz DDR  6.4 GB/s
Opteron HyperTransport memory bus          128 bits     200 MHz DDR  6.4 GB/s
Pentium 4 "800 MHz" FSB                    64 bits      200 MHz QDR  6.4 GB/s
PC (DDR-II 800) SDRAM                      64 bits      400 MHz DDR  6.4 GB/s
PC (DDR-II 667) SDRAM                      64 bits      333 MHz DDR  5.3 GB/s
Pentium 4 "533 MHz" FSB                    64 bits      133 MHz QDR  4.3 GB/s

FSB: Front-Side Bus
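The arithmetic on this slide can be checked with a short sketch. The function names `peak_bandwidth` and `cpu_demand` are hypothetical, and the demand figure assumes the CPU completes one instruction per clock, which the slide implies but does not state.

```c
#include <stdint.h>

/* Peak transfer rate of a synchronous bus, in bytes per second.
 * transfers_per_clock is 1 for single data rate, 2 for DDR, 4 for QDR. */
uint64_t peak_bandwidth(uint64_t width_bits, uint64_t clock_hz,
                        uint64_t transfers_per_clock)
{
    return (width_bits / 8) * clock_hz * transfers_per_clock;
}

/* Bytes per second requested by a CPU, assuming one instruction per clock
 * (an assumption) and bytes_per_instr bytes moved per instruction. */
uint64_t cpu_demand(uint64_t clock_hz, uint64_t bytes_per_instr)
{
    return clock_hz * bytes_per_instr;
}
```

For example, `cpu_demand(3000000000ULL, 8)` gives 24,000,000,000 bytes/sec (the 24 GB/sec above), while `peak_bandwidth(64, 200000000ULL, 4)` gives 6,400,000,000 bytes/sec for the 64-bit, 200 MHz QDR front-side bus. Even the fastest interface in the table supplies roughly a quarter of the demand.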

Slide 7: Need for Large Memory
- Small memories are fast
- So just write small programs?
  - "640 K of memory should be enough for anybody" (attributed to Bill Gates, 1981)
- Today's programs require large memories
  - PowerPoint 2003: 25 megabytes
  - Database applications may require gigabytes of memory

Slide 8: The Goal: Illusion of large, fast, cheap memory
- How do we create a memory that is large, cheap and fast (most of the time)?
- Strategy: provide a small, fast memory, called a cache, which holds a subset of the main memory
  - Keep frequently accessed locations in the fast cache
  - The cache retrieves more than one word at a time
  - Sequential accesses are faster after the first access

Slide 9: Memory Hierarchy
- Hierarchy of levels
  - Uses smaller and faster memory technologies close to the processor
  - Fast access time in the highest level of the hierarchy
  - Cheap, slow memory furthest from the processor
- The aim of memory hierarchy design is to have an access time close to that of the highest level and a size equal to that of the lowest level

Slide 10: Memory Hierarchy Pyramid
[Figure: pyramid with the processor (CPU) at the top and Level 1, Level 2, Level 3, ..., Level n below it, connected by the transfer datapath (bus). The width of each level shows the size of memory at that level. With increasing distance from the CPU, cost/MB decreases; with decreasing distance from the CPU, access time (memory latency) decreases.]

Slide 11: Basic Philosophy
- Move data into "smaller, faster" memory
- Operate on it
- Move it back to "larger, cheaper" memory
  - How do we keep track of whether it has changed?
- What if we run out of space in "smaller, faster" memory?
- Important concepts: latency, bandwidth

Slide 12: Typical Hierarchy
[Figure: CPU regs <-> Cache <-> Memory <-> disk, with transfer units of 8 B, 32 B and 4 KB respectively; the cache-memory boundary is labeled "Cache/MM" and the memory-disk boundary "virtual memory"]
- Notice that the data width is changing. Why?
- Bandwidth: transfer rate between the various levels
  - CPU-Cache: 24 GBps
  - Cache-Main: [...] GBps
  - Main-Disk: 187 MBps (serial ATA/1500)

Slide 13: Bandwidth Issue
- Fetch large blocks at a time (bandwidth)
  - Supports spatial locality

    for (i = 0; i < length; i++)
        sum += array[i];

  - array has spatial locality
  - sum has temporal locality
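The benefit of fetching whole blocks can be illustrated by counting misses in a toy cache model. This is a minimal sketch, not a model of any real processor: the block size, the number of blocks, the direct-mapped organization and the name `count_misses` are all assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BYTES 32u  /* assumed block size */
#define NUM_BLOCKS  64u  /* assumed number of blocks in the cache */

/* Counts misses when the byte addresses base, base + stride, base + 2*stride,
 * ... (count accesses in total) are read through an initially empty
 * direct-mapped cache. */
size_t count_misses(uint64_t base, uint64_t stride_bytes, size_t count)
{
    uint64_t tags[NUM_BLOCKS];
    int valid[NUM_BLOCKS] = {0};
    size_t misses = 0;

    for (size_t i = 0; i < count; i++) {
        uint64_t block = (base + i * stride_bytes) / BLOCK_BYTES;
        size_t index = (size_t)(block % NUM_BLOCKS);
        if (!valid[index] || tags[index] != block) {
            /* Miss: fetch the whole block; the full block number is kept
             * as the tag for simplicity. */
            valid[index] = 1;
            tags[index] = block;
            misses++;
        }
    }
    return misses;
}
```

Reading 1024 consecutive 4-byte elements (`count_misses(0, 4, 1024)`) misses 128 times, once per 32-byte block, because the other seven elements of each block arrive with the first. Reading 1024 elements spaced 32 bytes apart (`count_misses(0, 32, 1024)`) misses on every access.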

Slide 14: Why Hierarchy Works: Natural Locality
- The Principle of Locality: programs access a relatively small portion of the address space at any instant of time
[Figure: probability of reference (0 to 1) plotted against memory address (0 to 2^n - 1)]
- Temporal locality (locality in time): recently accessed data tend to be referenced again soon
- Spatial locality (locality in space): items near recently accessed data tend to be referenced soon

Slide 15: Memory Hierarchy: Terminology
[Figure: upper level holding Block X and lower level holding Block Y; data moves to and from the processor through the upper level]
- Hit: data appears in a block in the upper level (Block X)
- Hit Rate: the fraction of memory accesses found in the upper level
- Miss: data needs to be retrieved from a block in the lower level (Block Y)
- Miss Rate = 1 - (Hit Rate)
- Hit Time: time to access the upper level, which consists of the time to determine hit/miss + the upper level access time
- Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
- Note: Hit Time << Miss Penalty
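These terms combine into the average memory access time (AMAT), a standard textbook formula that this slide does not state explicitly: AMAT = Hit Time + Miss Rate x Miss Penalty. A minimal sketch follows; the function name `amat` and the sample numbers are illustrative assumptions.

```c
/* Average memory access time, in the same unit as hit_time and
 * miss_penalty. hit_rate is a fraction between 0 and 1. */
double amat(double hit_time, double hit_rate, double miss_penalty)
{
    double miss_rate = 1.0 - hit_rate;  /* Miss Rate = 1 - (Hit Rate) */
    return hit_time + miss_rate * miss_penalty;
}
```

With an assumed 2 ns hit time, 75% hit rate and 100 ns miss penalty, `amat(2.0, 0.75, 100.0)` is 27 ns. Because Hit Time << Miss Penalty, even a small miss rate dominates the average.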

Slide 16: Current Memory Hierarchy
[Figure: Processor (Control, Datapath, regs) -> L1 Cache -> L2 Cache -> Main Memory -> Secondary Memory]

Level             Technology  Speed          Cost ($/MB)
Processor regs    Regs        1 ns           --
L1 Cache          SRAM        2 ns           $10
L2 Cache          SRAM        6 ns           $3
Main Memory       DRAM        100 ns         $0.01
Secondary Memory  Disk        10,000,000 ns  $0.002

Size (MB): [...],000

- Cache - Main memory: speed
- Main memory - Disk (virtual memory): capacity

Slide 17: How is the hierarchy managed?
[Figure: Processor (Control, Datapath, regs) -> L1 Cache -> L2 Cache -> Main Memory -> Secondary Memory]
- Registers ↔ Memory
  - By the compiler (or assembly language programmer)
- Cache ↔ Main Memory
  - By hardware
- Main Memory ↔ Disks
  - By a combination of hardware and the operating system (virtual memory)
  - Also by the programmer (in the case of files)

Slide 18: Memory Hierarchy Design: Four Questions for Memory Hierarchy
- Q1: Where to place a block in the upper level? (Block placement)
  - Anywhere, in a single specific place, or in one out of several specific places
- Q2: How to find a block in the upper level? (Block identification)
- Q3: Which block should be replaced on a miss in the upper level? (Block replacement)
  - Replacement policy
- Q4: What happens on a write in the upper level? (Write strategy)
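The three placement options in Q1 correspond to fully associative ("anywhere"), direct-mapped ("a single specific place") and set-associative ("one out of several specific places") organizations. They can be expressed with one modulo computation, sketched below. The names `placement` and `place_block` are hypothetical, and the sketch assumes the number of blocks is a multiple of the associativity.

```c
#include <stdint.h>

typedef struct {
    uint32_t set;  /* which set the block may live in (Q1) */
    uint32_t tag;  /* value compared to identify the block (Q2) */
} placement;

/* ways = 1 gives direct mapped, ways = num_blocks gives fully associative,
 * anything in between gives set associative. */
placement place_block(uint32_t block_addr, uint32_t num_blocks, uint32_t ways)
{
    uint32_t num_sets = num_blocks / ways;
    placement p = { block_addr % num_sets, block_addr / num_sets };
    return p;
}
```

For block address 12 in an 8-block upper level, direct mapping (`ways = 1`) allows only set 4, 2-way set associativity allows either block of set 0, and full associativity (`ways = 8`) has a single set, so the block can go anywhere and the whole block address serves as the tag.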

Slide 19: Q4: What happens on a write?
- Write policies
  - Write-through: the information is written to both the upper and the lower level
    - Easier to implement
    - The lower level always has the most current copy of the data (data consistency, coherence)
  - Write-back: the information is written only to the upper level; the modified block is written to the lower level only when it is replaced
    - Uses less bandwidth, since multiple writes within a block require only one write to the lower level
    - A read miss, which causes a block to be replaced, may therefore result in a write to the lower level
- A block in a write-back upper level can be either clean or dirty, depending on whether the block content is the same as that in the lower level
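The bandwidth difference between the two policies can be seen by counting write transactions to the lower level. This is a toy sketch under stated assumptions: a 4-block direct-mapped upper level with 4-word blocks, write-allocate under both policies, and a final flush of dirty blocks so the two counts are comparable. The name `lower_level_writes` is hypothetical. Note that a write-through transaction carries one word while a write-back transaction carries a whole block.

```c
#include <stddef.h>

#define NUM_BLOCKS  4u  /* assumed number of blocks in the upper level */
#define BLOCK_WORDS 4u  /* assumed words per block */

typedef enum { WRITE_THROUGH, WRITE_BACK } write_policy;

/* Returns how many write transactions reach the lower level for a sequence
 * of n writes to the given word addresses. */
size_t lower_level_writes(const unsigned *word_addrs, size_t n,
                          write_policy policy)
{
    unsigned tag[NUM_BLOCKS];
    int valid[NUM_BLOCKS] = {0};
    int dirty[NUM_BLOCKS] = {0};
    size_t writes = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned block = word_addrs[i] / BLOCK_WORDS;
        unsigned idx = block % NUM_BLOCKS;
        if (!valid[idx] || tag[idx] != block) {
            if (valid[idx] && dirty[idx])
                writes++;            /* replaced dirty block is written back */
            valid[idx] = 1;
            tag[idx] = block;
            dirty[idx] = 0;          /* newly allocated block starts clean */
        }
        if (policy == WRITE_THROUGH)
            writes++;                /* every write also goes to lower level */
        else
            dirty[idx] = 1;          /* write-back only marks the block dirty */
    }
    if (policy == WRITE_BACK)        /* final flush (a modeling choice) */
        for (unsigned b = 0; b < NUM_BLOCKS; b++)
            if (valid[b] && dirty[b])
                writes++;
    return writes;
}
```

Six writes to words 0, 1, 2, 3, 0, 1 all fall in one block: write-through sends 6 writes to the lower level, write-back sends 1. When consecutive writes keep evicting each other (words 0, 16, 0 map to the same block slot here), write-back loses its advantage and also sends 3.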