L18 – Memory Hierarchy. Comp 411 – Fall 2009, 11/30/2009. Topics: Memory Flavors, Principle of Locality, Program Traces, Memory Hierarchies, Associativity.
What Do We Want in a Memory?
(Figure: miniMIPS datapath connected to MEMORY via PC/INST and MADDR/MDATA, with ADDR, DOUT, DATA, and R/W signals.)
Capacity, latency, and cost by technology:
Register: 1000's of bits, 10 ps, $$$$
SRAM: 1-4 Mbytes, 0.2 ns, $$$
DRAM: 1-4 Gbytes, 5 ns, $
Hard disk (non-volatile): 100's of Gbytes, 10 ms, ¢
What we want: 2-10 Gbytes, 0.2 ns, cheap!

Quantity vs Quality…
Your memory system can be BIG and SLOW... or SMALL and FAST.
(Plot: cost in $/GB vs. access time.)
DVD Burner: $0.06/GB, 150 ms
DISK: $0.33/GB, 10 ms
DRAM: $100/GB, 5 ns
SRAM: $5000/GB, 0.2 ns
We've explored a range of device-design trade-offs. Is there an ARCHITECTURAL solution to this DILEMMA?

Best of Both Worlds
What we REALLY want: a BIG, FAST memory! (Keep everything within instant access.)
We'd like to have a memory system that PERFORMS like 2 GBytes of SRAM, but COSTS like 512 MBytes of slow memory.
SURPRISE: We can (nearly) get our wish!
KEY: Use a hierarchy of memory technologies: CPU ↔ SRAM ↔ MAIN MEM.

Key IDEA
Keep the most often-used data in a small, fast SRAM (often local to the CPU chip). Refer to main memory only rarely, for the remaining data.
The reason this strategy works: LOCALITY.
Locality of Reference: a reference to location X at time t implies that a reference to location X+ΔX at time t+Δt becomes more probable as ΔX and Δt approach zero.

Cache
cache (kash) n.
1. A hiding place used especially for storing provisions.
2. A place for concealment and safekeeping, as of valuables.
3. The store of goods or valuables concealed in a hiding place.
4. Computer Science. A fast storage buffer in the central processing unit of a computer. In this sense, also called cache memory.
v. tr. cached, caching, caches. To hide or store in a cache.

Cache Analogy
You are writing a term paper at a table in the library. As you work you realize you need a book. You stop writing, fetch the reference, and continue writing. You don't immediately return the book; maybe you'll need it again. Soon you have a few books at your table and no longer have to fetch more. The table is a CACHE for the rest of the library.

Typical Memory Reference Patterns
(Plot: address vs. time for a real program, showing distinct program, data, and stack regions.)
MEMORY TRACE: a temporal sequence of memory references (addresses) from a real program.
TEMPORAL LOCALITY: if an item is referenced, it will tend to be referenced again soon.
SPATIAL LOCALITY: if an item is referenced, nearby items will tend to be referenced soon.

Working Set
(Plot: the same memory trace, with a window of width Δt sliding along the time axis.)
S is the set of locations accessed during Δt. The working set is a set S which changes slowly with respect to the access time. Working set size: |S|.

Exploiting the Memory Hierarchy
Approach 1 (Cray, others): Expose the Hierarchy. Registers, main memory, and disk are each available as storage alternatives; tell programmers: "Use them cleverly."
Approach 2: Hide the Hierarchy. Programming model: a SINGLE kind of memory, a single address space. The machine AUTOMATICALLY assigns locations to fast or slow memory, depending on usage patterns.
(Figure: CPU ↔ small static RAM ↔ dynamic RAM ↔ hard disk, together presented to the program as "MAIN MEMORY".)

Why We Care
CPU performance is dominated by memory performance; it is more significant than ISA, circuit optimization, pipelining, etc.
TRICK #1: How to make slow MAIN MEMORY appear faster than it is. Technique: CACHING ("CACHE"), this and the next lecture.
TRICK #2: How to make a small MAIN MEMORY appear bigger than it is. Technique: VIRTUAL MEMORY ("SWAP SPACE"), next lecture.

The Cache Idea: Program-Transparent Memory Hierarchy
The cache contains TEMPORARY COPIES of selected main memory locations, e.g. Mem[100] = 37.
(Figure: CPU ↔ "CACHE" ↔ dynamic RAM "MAIN MEMORY".)
HIT RATIO (α): the fraction of references found in the CACHE.
MISS RATIO (1−α): the remaining references.
GOALS: 1) Improve the average access time. 2) Transparency (compatibility, programming ease).
Challenge: make the hit ratio as high as possible.

How High of a Hit Ratio?
Suppose we can easily build an on-chip static memory with a 0.8 ns access time, but the fastest dynamic memories that we can buy for main memory have an average access time of 10 ns. How high a hit rate do we need to sustain an average access time of 1 ns?
Setting α(0.8) + (1−α)(10) = 1 and solving gives α = 9/9.2 ≈ 98%.
WOW, a cache really needs to be good!
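The algebra above can be checked directly. A minimal sketch, using only the numbers from the slide (the variable names are illustrative, not from the original):

```python
# Solve t_goal = alpha * t_cache + (1 - alpha) * t_mem for alpha,
# the hit ratio needed to sustain the target average access time.
t_cache = 0.8   # ns, on-chip SRAM hit time
t_mem = 10.0    # ns, DRAM access time on a miss
t_goal = 1.0    # ns, target average access time

alpha = (t_mem - t_goal) / (t_mem - t_cache)
print(round(alpha, 4))  # 0.9783, i.e. about a 98% hit rate
```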

Cache
Sits between the CPU and main memory. A very fast table that stores a TAG and DATA. The TAG is a memory address; the DATA is a copy of memory at the address given by the TAG.
(Figure: tag/data table alongside memory.)

Cache Access
On a load we look in the TAG entries for the address we're loading.
Found ⇒ a HIT; return the DATA.
Not found ⇒ a MISS; go to memory for the data, and put it and the address (TAG) in the cache.
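The load path above can be sketched in a few lines. This is a software model of the idea, not the hardware on the slide; the `load` function and the dict-as-tag-table are illustrative assumptions:

```python
# Model the cache as a mapping from TAG (address) to DATA.
# On a hit, return the cached copy; on a miss, fetch from
# memory and install the (tag, data) pair for next time.
def load(cache, memory, addr):
    if addr in cache:            # TAG match: HIT
        return cache[addr], "hit"
    data = memory[addr]          # MISS: go to main memory
    cache[addr] = data           # install address (TAG) and DATA
    return data, "miss"

memory = {100: 37}               # the slide's Mem[100] = 37
cache = {}
print(load(cache, memory, 100))  # first access misses
print(load(cache, memory, 100))  # the repeat access hits
```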

Cache Lines
We usually get more data than requested (why?).
A LINE is the unit of memory stored in the cache, usually much bigger than 1 word; 32 bytes per line is common.
A bigger LINE means fewer misses because of spatial locality, but a bigger LINE also means a longer time on a miss.
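The spatial-locality payoff can be made concrete with a small model (the `access` helper is illustrative, not from the slides): fetching a whole 32-byte line on a miss means a sequential byte-by-byte scan misses only once per line.

```python
# Track which whole lines are cached; any access within a
# cached line is a hit, so sequential code misses once per line.
LINE_BYTES = 32
cached_lines = set()

def access(addr):
    line = addr // LINE_BYTES
    hit = line in cached_lines
    cached_lines.add(line)       # miss fetches the whole line
    return hit

hits = sum(access(a) for a in range(128))  # scan 128 sequential bytes
print(hits)  # 124: only 4 misses, one per 32-byte line
```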

Finding the TAG in the Cache
A 1-MByte cache may have 32k different lines, each of 32 bytes. We can't afford to sequentially search the 32k different tags. ASSOCIATIVE memory uses hardware to compare the address to the tags in parallel, but it is expensive, and 1 MByte of it is thus unlikely.
(Figure: the incoming address compared against every TAG in parallel; a match raises HIT and selects the Data Out.)

Finding the TAG in the Cache
A DIRECT MAPPED CACHE computes the cache entry from the address. Multiple addresses map to the same cache line; use the TAG to determine whether it is the right one. Choose some bits from the address to determine the cache line:
the low 5 bits determine which byte within the line;
we need 15 bits to determine which of the 32k different lines has the data;
which of the 32 − 5 = 27 remaining bits should we use?
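The bit split described above (5 offset bits, 15 index bits, and the remaining 32 − 5 − 15 = 12 tag bits, under the common choice of taking the index from the bits just above the offset) can be sketched as follows; `split_address` is an illustrative helper, not from the slides:

```python
# Decompose a 32-bit address for a 1-MByte direct-mapped cache
# with 32-byte lines: |  tag: 12  |  index: 15  |  offset: 5  |
OFFSET_BITS = 5    # 32 bytes per line
INDEX_BITS = 15    # 32k lines

def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print(split_address(1024))  # (0, 32, 0): byte 0 of line 32
```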

Direct-Mapping Example
With 8-byte lines, the bottom 3 bits determine the byte within the line. With 4 cache lines, the next 2 bits determine which line to use.
1024d = 100 0000 0000b ⇒ line = 00b = 0d
1000d = 011 1110 1000b ⇒ line = 01b = 1d
1040d = 100 0001 0000b ⇒ line = 10b = 2d

Direct Mapping Miss
What happens when we now ask for address 1008?
1008d = 011 1111 0000b ⇒ line = 10b = 2d
but earlier we put 1040d there...
1040d = 100 0001 0000b ⇒ line = 10b = 2d
The tags differ, so this is a miss: the cache must evict 1040's line to hold 1008.

Miss Penalty and Rate
The MISS PENALTY is the time it takes to read the memory if it isn't in the cache; 50 to 100 cycles is common.
The MISS RATE is the fraction of accesses which MISS. The HIT RATE is the fraction of accesses which HIT. MISS RATE + HIT RATE = 1.
Suppose a particular cache has a MISS PENALTY of 100 cycles and a HIT RATE of 95%. The CPI for a load on a HIT is 5, but on a MISS it is 105. What is the average CPI for a load?
Average CPI = 0.95 × 5 + 0.05 × 105 = 10.
Suppose the MISS PENALTY were 120 cycles? Then CPI = 0.95 × 5 + 0.05 × 125 = 11 (slower memory doesn't hurt much).
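The weighted average above generalizes to a one-line formula; `avg_cpi` is an illustrative helper reproducing the slide's arithmetic:

```python
# Average load CPI: the hit-rate-weighted mix of the hit cost
# and the hit cost plus the miss penalty.
def avg_cpi(hit_cpi, miss_penalty, hit_rate):
    return hit_rate * hit_cpi + (1 - hit_rate) * (hit_cpi + miss_penalty)

print(round(avg_cpi(5, 100, 0.95), 3))  # 10.0
print(round(avg_cpi(5, 120, 0.95), 3))  # 11.0, despite 20% slower memory
```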

What about store?
What happens in the cache on a store?
WRITE BACK CACHE ⇒ put it in the cache; write to memory on replacement.
WRITE THROUGH CACHE ⇒ put it in the cache and in memory.
What happens on a store that MISSES?
WRITE BACK will fetch the line into the cache.
WRITE THROUGH might just put it in memory.
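The two store policies can be contrasted in a short sketch (the function names and the dirty-set representation are illustrative assumptions, not from the slides):

```python
# WRITE THROUGH: every store updates both the cache and memory.
def store_write_through(cache, memory, addr, value):
    cache[addr] = value
    memory[addr] = value

# WRITE BACK: a store updates only the cache and marks the
# entry dirty; memory is updated later, on replacement.
def store_write_back(cache, dirty, addr, value):
    cache[addr] = value
    dirty.add(addr)   # flushed to memory only when this line is evicted

cache, memory = {}, {}
store_write_through(cache, memory, 100, 37)
print(cache[100], memory[100])    # both copies updated immediately

cache2, dirty = {}, set()
store_write_back(cache2, dirty, 100, 37)
print(cache2[100], 100 in dirty)  # only the cache holds the new value
```

Write back saves memory traffic when the same location is stored repeatedly; write through keeps memory always up to date.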

Cache Questions = Cash Questions
What lies between Fully Associative and Direct-Mapped?
When I put something new into the cache, what data gets thrown out?
How many processor words should there be per tag?
When I write to the cache, should I also write to memory?
What do I do when a write misses the cache? Should space in the cache be allocated for the written address?
What if I have INPUT/OUTPUT devices located at certain memory addresses? Do we cache them?