1  2004 Morgan Kaufmann Publishers Chapter Seven.

Slides:



Advertisements
Similar presentations
1 Lecture 13: Cache and Virtual Memroy Review Cache optimization approaches, cache miss classification, Adapted from UCB CS252 S01.
Advertisements

Computation I pg 1 Embedded Computer Architecture Memory Hierarchy: Cache Recap Course 5KK73 Henk Corporaal November 2014
CMSC 611: Advanced Computer Architecture Cache Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from.
11/8/2005Comp 120 Fall November 9 classes to go! Read Section 7.5 especially important!
Lecture 34: Chapter 5 Today’s topic –Virtual Memories 1.
Caching IV Andreas Klappenecker CPSC321 Computer Architecture.
Cs 325 virtualmemory.1 Accessing Caches in Virtual Memory Environment.
The Memory Hierarchy (Lectures #24) ECE 445 – Computer Organization The slides included herein were taken from the materials accompanying Computer Organization.
The Memory Hierarchy CPSC 321 Andreas Klappenecker.
CSCE 212 Chapter 7 Memory Hierarchy Instructor: Jason D. Bakos.
1 Lecture 20 – Caching and Virtual Memory  2004 Morgan Kaufmann Publishers Lecture 20 Caches and Virtual Memory.
S.1 Review: The Memory Hierarchy Increasing distance from the processor in access time L1$ L2$ Main Memory Secondary Memory Processor (Relative) size of.
1  1998 Morgan Kaufmann Publishers Chapter Seven Large and Fast: Exploiting Memory Hierarchy.
Review CPSC 321 Andreas Klappenecker Announcements Tuesday, November 30, midterm exam.
1 Chapter Seven Large and Fast: Exploiting Memory Hierarchy.
Chapter 7 Large and Fast: Exploiting Memory Hierarchy Bo Cheng.
The Memory Hierarchy II CPSC 321 Andreas Klappenecker.
1  2004 Morgan Kaufmann Publishers Chapter Seven Large and Fast: Exploiting Memory Hierarchy.
Caching I Andreas Klappenecker CPSC321 Computer Architecture.
1  1998 Morgan Kaufmann Publishers Chapter Seven Large and Fast: Exploiting Memory Hierarchy.
1  1998 Morgan Kaufmann Publishers Chapter Seven Large and Fast: Exploiting Memory Hierarchy.
1  1998 Morgan Kaufmann Publishers Chapter Seven.
1 SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: –value is stored as a charge.
Virtual Memory Topics Virtual Memory Access Page Table, TLB Programming for locality Memory Mountain Revisited.
1  1998 Morgan Kaufmann Publishers Chapter Seven Large and Fast: Exploiting Memory Hierarchy (Part II)
Processor Design 5Z032 Memory Hierarchy Chapter 7 Henk Corporaal Eindhoven University of Technology 2009.
11/10/2005Comp 120 Fall November 10 8 classes to go! questions to me –Topics you would like covered –Things you don’t understand –Suggestions.
1 CSE SUNY New Paltz Chapter Seven Exploiting Memory Hierarchy.
Computer ArchitectureFall 2007 © November 12th, 2007 Majd F. Sakr CS-447– Computer Architecture.
Lecture 33: Chapter 5 Today’s topic –Cache Replacement Algorithms –Multi-level Caches –Virtual Memories 1.
Computing Systems Memory Hierarchy.
Memory Hierarchy and Cache Design The following sources are used for preparing these slides: Lecture 14 from the course Computer architecture ECE 201 by.
Computer Organization and Architecture (AT70.01) Comp. Sc. and Inf. Mgmt. Asian Institute of Technology Instructor: Dr. Sumanta Guha Slide Sources: Patterson.
Lecture 19: Virtual Memory
Lecture 10 Memory Hierarchy and Cache Design Computer Architecture COE 501.
1  2004 Morgan Kaufmann Publishers Multilevel cache Used to reduce miss penalty to main memory First level designed –to reduce hit time –to be of small.
Memory and cache CPU Memory I/O. CEG 320/52010: Memory and cache2 The Memory Hierarchy Registers Primary cache Secondary cache Main memory Magnetic disk.
CS1104 – Computer Organization PART 2: Computer Architecture Lecture 10 Memory Hierarchy.
1 Virtual Memory Main memory can act as a cache for the secondary storage (disk) Advantages: –illusion of having more physical memory –program relocation.
1  1998 Morgan Kaufmann Publishers Recap: Memory Hierarchy of a Modern Computer System By taking advantage of the principle of locality: –Present the.
1  2004 Morgan Kaufmann Publishers Lecture 11, Oct. 29, 2007 Syllabus change: –Tues, Oct. 29, 2007 – Caching –Thurs, Nov. 1, 2007 – Review –Tues, Nov.
The Goal: illusion of large, fast, cheap memory Fact: Large memories are slow, fast memories are small How do we create a memory that is large, cheap and.
1 Chapter Seven. 2 Users want large and fast memories! SRAM access times are ns at cost of $100 to $250 per Mbyte. DRAM access times are ns.
Multilevel Caches Microprocessors are getting faster and including a small high speed cache on the same chip.
1 Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY. 2 SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4.
1  2004 Morgan Kaufmann Publishers Chapter Seven Memory Hierarchy-3 by Patterson.
1 Chapter Seven. 2 Users want large and fast memories! SRAM access times are ns at cost of $100 to $250 per Mbyte. DRAM access times are ns.
1  1998 Morgan Kaufmann Publishers Chapter Seven.
1  2004 Morgan Kaufmann Publishers Locality A principle that makes having a memory hierarchy a good idea If an item is referenced, temporal locality:
Improving Memory Access 2/3 The Cache and Virtual Memory
The Memory Hierarchy (Lectures #17 - #20) ECE 445 – Computer Organization The slides included herein were taken from the materials accompanying Computer.
Summary of caches: The Principle of Locality: –Program likely to access a relatively small portion of the address space at any instant of time. Temporal.
1 Chapter Seven. 2 SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: –value.
1 ECE3055 Computer Architecture and Operating Systems Lecture 8 Memory Subsystem Prof. Hsien-Hsin Sean Lee School of Electrical and Computer Engineering.
COSC3330 Computer Architecture
Memory COMPUTER ARCHITECTURE
The Goal: illusion of large, fast, cheap memory
Chapter Seven.
CS352H: Computer Systems Architecture
Improving Memory Access 1/3 The Cache and Virtual Memory
Virtual Memory Use main memory as a “cache” for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs share main.
Morgan Kaufmann Publishers Memory & Cache
Morgan Kaufmann Publishers
Virtual Memory 4 classes to go! Today: Virtual Memory.
ECE 445 – Computer Organization
Morgan Kaufmann Publishers Memory Hierarchy: Virtual Memory
Morgan Kaufmann Publishers Memory Hierarchy: Introduction
Chapter Five Large and Fast: Exploiting Memory Hierarchy
Memory & Cache.
Presentation transcript:

1  2004 Morgan Kaufmann Publishers Chapter Seven

2  2004 Morgan Kaufmann Publishers SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: –value is stored as a charge on capacitor (must be refreshed) –very small but slower than SRAM (factor of 5 to 10) Memories: Review

3  2004 Morgan Kaufmann Publishers Users want large and fast memories! SRAM access times are.5 – 5ns at cost of $4000 to $10,000 per GB. DRAM access times are 50-70ns at cost of $100 to $200 per GB. Disk access times are 5 to 20 million ns at cost of $.50 to $2 per GB. Try and give it to them anyway –build a memory hierarchy Exploiting Memory Hierarchy 2004

4  2004 Morgan Kaufmann Publishers Locality A principle that makes having a memory hierarchy a good idea If an item is referenced, temporal locality: it will tend to be referenced again soon spatial locality: nearby items will tend to be referenced soon. Why does code have locality? Our initial focus: two levels (upper, lower) –block: minimum unit of data –hit: data requested is in the upper level –miss: data requested is not in the upper level

5  2004 Morgan Kaufmann Publishers Two issues: –How do we know if a data item is in the cache? –If it is, how do we find it? Our first example: – block size is one word of data – "direct mapped" For each item of data at the lower level, there is exactly one location in the cache where it might be. e.g., lots of items at the lower level share locations in the upper level Cache

6  2004 Morgan Kaufmann Publishers Mapping: address is modulo the number of blocks in the cache Direct Mapped Cache

7  2004 Morgan Kaufmann Publishers For MIPS: What kind of locality are we taking advantage of? Direct Mapped Cache

8  2004 Morgan Kaufmann Publishers Accessing a Cache: Direct Mapped Cache Decimal address of reference Binary address of reference Hit or miss in cache Assigned cache block (where found or placed) miss10110 two mod 8 =110 two miss11010 two mod 8 =010 two hit10110 two mod 8 =110 two hit11010 two mod 8 =010 two miss10000 two mod 8 =000 two miss00110 two mod 8 =110 two hit10000 two mod 8 =000 two miss10010 two mod 8 =010 two IndexVTagData 000 N 001 N 010 N 011 N 100N 101N 110 N 111N 10 two 11 two Memory(10110 two ) Memory(11010 two ) Memory(10000 two )10 two 00 two Memory(00011 two ) 10 two Memory(10010 two )  Y Y  Y Y  Y Y  Y Y

9  2004 Morgan Kaufmann Publishers Taking advantage of spatial locality (the Intrinsity FastMATH processor): Direct Mapped Cache

10  2004 Morgan Kaufmann Publishers Read hits –this is what we want! Read misses –stall the CPU, fetch block from memory, deliver to cache, restart Write hits: –can replace data in cache and memory (write-through) –write the data only into the cache (write-back the cache later) Write misses: –read the entire block into the cache, then write the word Hits vs. Misses

11  2004 Morgan Kaufmann Publishers Make reading multiple words easier by using banks of memory It can get a lot more complicated... Hardware Issues

12  2004 Morgan Kaufmann Publishers Increasing the block size tends to decrease miss rate: Use split caches because there is more spatial locality in code: Performance

13  2004 Morgan Kaufmann Publishers Performance Simplified model: execution time = (execution cycles + stall cycles)  cycle time stall cycles = # of instructions  miss ratio  miss penalty Two ways of improving performance: –decreasing the miss ratio –decreasing the miss penalty What happens if we increase block size?

14  2004 Morgan Kaufmann Publishers Decreasing miss ratio with associativity

15  2004 Morgan Kaufmann Publishers Compared to direct mapped, give a series of references that: –results in a lower miss ratio using a 2-way set associative cache –results in a higher miss ratio using a 2-way set associative cache assuming we use the “least recently used” replacement strategy Decreasing miss ratio with associativity

16  2004 Morgan Kaufmann Publishers An implementation

17  2004 Morgan Kaufmann Publishers Performance

18  2004 Morgan Kaufmann Publishers Decreasing miss penalty with multilevel caches Add a second level cache: –often primary cache is on the same chip as the processor –use SRAMs to add another cache above primary memory (DRAM) –miss penalty goes down if data is in 2nd level cache Example: –CPI of 1.0 on a 5 Ghz machine with a 5% miss rate, 100ns DRAM access –Adding 2nd level cache with 5ns access time decreases miss rate to.5% Using multilevel caches: –try and optimize the hit time on the 1st level cache –try and optimize the miss rate on the 2nd level cache

19  2004 Morgan Kaufmann Publishers Cache Complexities Not always easy to understand implications of caches: Theoretical behavior of Radix sort vs. Quicksort Observed behavior of Radix sort vs. Quicksort

20  2004 Morgan Kaufmann Publishers Cache Complexities Here is why: Memory system performance is often critical factor –multilevel caches, pipelined processors, make it harder to predict outcomes –Compiler optimizations to increase locality sometimes hurt ILP Difficult to predict best algorithm: need experimental data

21  2004 Morgan Kaufmann Publishers Virtual Memory Main memory can act as a cache for the secondary storage (disk) Advantages: –illusion of having more physical memory –program relocation –Sharing of both the physical memory and CPU among multiple programs –Protection

22  2004 Morgan Kaufmann Publishers Pages: virtual memory blocks Page faults: the data is not in memory, retrieve it from disk –huge miss penalty, thus pages should be fairly large (e.g., 4KB) –reducing page faults is important (LRU is worth the price) –can handle the faults in software instead of hardware –using write-through is too expensive so we use writeback

23  2004 Morgan Kaufmann Publishers Page Tables

24  2004 Morgan Kaufmann Publishers Page Tables

25  2004 Morgan Kaufmann Publishers Making Address Translation Fast A cache for address translations: translation lookaside buffer Typical values: entries, miss-rate:.01% - 1% miss-penalty: 10 – 100 cycles

26  2004 Morgan Kaufmann Publishers TLBs and caches

27  2004 Morgan Kaufmann Publishers TLBs and Caches

28  2004 Morgan Kaufmann Publishers Modern Systems

29  2004 Morgan Kaufmann Publishers Modern Systems Things are getting complicated!

30  2004 Morgan Kaufmann Publishers Processor speeds continue to increase very fast — much faster than either DRAM or disk access times Design challenge: dealing with this growing disparity –Prefetching? 3 rd level caches and more? Memory design? Some Issues