Memory Hierarchy: How to Improve Memory Access

Outline
– Locality
– Structure of memory hierarchy
– Cache
– Virtual memory

Locality
Principle of locality
– Programs access a relatively small portion of their address space at any instant of time.
Temporal locality
– If an item is referenced, it tends to be referenced again soon.
Spatial locality
– If an item is referenced, items nearby tend to be referenced soon.
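
A short sketch (not from the slides) illustrating both kinds of locality with a hypothetical array traversal:

```python
# Hypothetical illustration of the two kinds of locality.
total = 0                     # 'total' is reused every iteration: temporal locality
data = list(range(1024))      # contiguous elements: spatial locality
for i in range(len(data)):
    total += data[i]          # accesses data[0], data[1], ... in order, so each
                              # cache block brought in is used several times
print(total)
```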

Memory Hierarchy
Multiple levels of memory with different speeds and sizes.
Gives users the perception that memory is as large as the largest level and as fast as the fastest level.
The unit of memory moved between levels of the hierarchy is a block.
[Figure: hierarchy levels, from fastest to slowest: CPU registers, SRAM, DRAM, magnetic disk]

Structure of memory hierarchy
[Figure: pyramid of levels (CPU registers, SRAM, DRAM, magnetic disk); going down the hierarchy, size increases while speed and cost per bit decrease]

Structure of memory hierarchy

Memory type            Access time    Cost per bit
Registers              ~0.2 ns
SRAM (static RAM)      0.5 – 5 ns     $4,000 – $10,000
DRAM (dynamic RAM)     50 – 70 ns     $100 – $200
Magnetic disk          … ms           $0.5 – $2

Cache
A level of the memory hierarchy between the CPU and main memory.
[Figure: registers, cache, memory, disk, with the ideal at each level: everything you need is in a register; everything you need is in cache; everything you need is in memory]

How to improve memory access time
[Figure: example blocks (A–D in the CPU registers, a–h in memory) placed across registers, cache, memory, and disk, showing how copies of frequently used blocks sit in the faster levels]

Address Space
Suppose 1 block = 256 bytes = 2^8 bytes, the cache has 8 blocks, the memory has 32 blocks, and the disk has 64 blocks. Then
– the cache has 8 × 2^8 = 2^11 bytes,
– the memory has 32 × 2^8 = 2^13 bytes,
– the disk has 64 × 2^8 = 2^14 bytes.
For the cache, a block number has 3 bits, and an address has 11 bits. For the memory, a block number has 5 bits, and an address has 13 bits. For the disk, a block number has 6 bits, and an address has 14 bits.
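
A quick sketch (using the block and level sizes assumed on this slide) that recomputes these bit widths:

```python
from math import log2

BLOCK_BYTES = 256                                # 1 block = 2^8 bytes
levels = {"cache": 8, "memory": 32, "disk": 64}  # blocks per level

offset_bits = int(log2(BLOCK_BYTES))             # 8 offset bits within a block
for name, num_blocks in levels.items():
    block_bits = int(log2(num_blocks))           # bits to name a block
    total_bytes = num_blocks * BLOCK_BYTES
    print(f"{name}: {block_bits}-bit block number, "
          f"{block_bits + offset_bits}-bit address, {total_bytes} bytes")
```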

Address Space
Cache: 8 blocks. Memory: 32 blocks. Disk: 64 blocks.
Address = block number || offset in block
– Address in cache: xxx || xxxxxxxx
– Address in memory: xxxxx || xxxxxxxx
– Address in disk: xxxxxx || xxxxxxxx

Hit / Miss
Hit: the requested data is found in the upper level of the hierarchy.
– Hit rate (hit ratio): the fraction of memory accesses found in the upper level.
– Hit time: the time to access data on a hit (= time to check whether the data is in the upper level + access time).
Miss: the requested data is not found in the upper level, but is in a lower level, of the hierarchy.
– Miss rate (miss ratio): 1 – hit rate.
– Miss penalty: the time to get a block of data into the upper level, and then into the CPU.
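
These quantities combine into the standard average memory access time formula, AMAT = hit time + miss rate × miss penalty. A small sketch with made-up numbers:

```python
# AMAT = hit time + miss rate * miss penalty (standard formula; numbers assumed)
hit_time_ns = 1.0        # assumed cache hit time
miss_rate = 0.05         # assumed 5% of accesses miss
miss_penalty_ns = 100.0  # assumed time to fetch a block from memory

amat = hit_time_ns + miss_rate * miss_penalty_ns
print(f"AMAT = {amat} ns")   # 1 + 0.05 * 100 = 6 ns
```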

Cache
A level of the memory hierarchy between the CPU and main memory.
To access data in the memory hierarchy:
– The CPU requests data from the cache.
– Check whether the data is in the cache.
Cache hit
– Transfer the requested data from the cache to the CPU.
Cache miss
– Transfer the block containing the requested data from memory to the cache.
– Transfer the requested data from the cache to the CPU.

How cache works
[Figure: animation of the CPU requesting blocks A–F; a request misses, the block is brought into the cache, and later requests to it hit; when the cache is full, a block is replaced]

Where to place a block in cache
Direct-mapped cache: each memory location is mapped to exactly one location in the cache. (But one cache location can hold different memory locations at different times.) Other mappings can also be used.
[Figure: cache blocks c0–c3 and memory blocks b0, b1, b2, ..., showing the cache–memory mapping]
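
In a direct-mapped cache the placement rule is just a modulo: cache index = memory block number mod number of cache blocks. A sketch, assuming the 4-block cache (c0–c3) pictured above:

```python
NUM_CACHE_BLOCKS = 4   # c0..c3, as in the figure

def cache_index(memory_block_number: int) -> int:
    """Direct-mapped placement: each memory block maps to exactly one slot."""
    return memory_block_number % NUM_CACHE_BLOCKS

for b in range(10):    # memory blocks b0..b9
    print(f"b{b} -> c{cache_index(b)}")   # b0->c0, ..., b4->c0, b5->c1, ...
```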

Direct-mapped cache
[Figure: direct-mapped cache and memory, 1 block = 4 bytes]

Fully-associative cache
[Figure: fully-associative cache and memory, 1 block = 4 bytes]

Set-associative cache
[Figure: set-associative cache and memory, 1 block = 4 bytes]

Determine if a block is in the cache
For each block in the cache:
– Valid bit: indicates whether the block contains valid data.
– Tag: identifies which memory block is stored in that cache block.
Example:
– If the valid bit is false, no block from memory is stored in that block of the cache.
– If the valid bit is true, the high-order address bits of the data stored in the block are stored in the tag.
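
A sketch of how an address splits into tag, index, and offset fields, assuming the toy sizes used earlier (8-block cache, 256-byte blocks):

```python
OFFSET_BITS = 8   # 256-byte blocks (from the earlier example)
INDEX_BITS = 3    # 8 cache blocks

def split_address(addr: int):
    """Split a memory address into (tag, index, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print(split_address(0b11_010_00000101))  # tag=3, index=2, offset=5
```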

Example: direct-mapped
[Figure: direct-mapped cache with a valid bit and tag per block, and the memory blocks they map to]

Example: fully-associative
[Figure: fully-associative cache with a valid bit and tag per block, and the memory blocks they can hold]

Example: set-associative cache
[Figure: set-associative cache with a valid bit and tag per block, and the memory blocks each set can hold]

Access a direct-mapped cache
[Figure: the memory address is split into tag, cache index, and offset; the index selects one cache entry, the stored tag is compared (=) with the address tag, and hit = valid AND tags equal]

Access a fully-associative cache
[Figure: the address tag is compared against the tags of all cache entries in parallel; hit = valid AND tag match in any entry]

Access a set-associative cache
[Figure: the cache index selects a set; each way in the set has its own valid bit and tag]

Access a set-associative cache
[Figure: the address tag is compared (==) with the tag of each way in the selected set; hit0 and hit1 are each valid AND tag-equal, and the cache hits if any way hits]
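
A compact sketch of the lookup logic these figures describe, written for a generic set-associative cache (a direct-mapped cache is the 1-way case, a fully-associative cache the single-set case); the geometry here is an assumption:

```python
# Generic set-associative lookup (direct-mapped = 1 way; fully associative = 1 set).
NUM_SETS, NUM_WAYS, OFFSET_BITS = 4, 2, 8   # assumed toy geometry

# cache[set][way] = (valid, tag); all entries start invalid
cache = [[(False, 0)] * NUM_WAYS for _ in range(NUM_SETS)]

def lookup(addr: int) -> bool:
    block = addr >> OFFSET_BITS
    index = block % NUM_SETS          # which set
    tag = block // NUM_SETS           # remaining high bits
    # hit = OR over ways of (valid AND tag match), as in the figures
    return any(valid and t == tag for valid, t in cache[index])

cache[1][0] = (True, 5)               # pretend the block with tag 5 sits in set 1
print(lookup((5 * NUM_SETS + 1) << OFFSET_BITS))  # True: tag 5, index 1
print(lookup(0))                                  # False: set 0 is empty
```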

Block size vs. miss rate
[Figure: miss rate as a function of block size]

Handling Cache Misses
If an instruction is not in the cache, we have to wait for the memory to respond and write the data into the cache (multiple cycles). This causes a processor stall.
Steps to handle a miss:
– Send PC – 4 to memory (the PC has already been incremented).
– Read from memory into the cache and wait for the result.
– Update the cache information (tag + valid bit).
– Restart the instruction execution.

Handling Writes
Write-through
– When data is written, both the cache and the memory are updated.
– Keeps memory consistent with the cache, but slow, because writing to memory is slower.
– Improved by using a write buffer that holds data waiting to be written to memory; the processor can then continue execution.
Write-back
– When data is written, only the cache is updated, so memory is inconsistent with the cache.
– Faster; but once a block is removed from the cache, it must be written back to memory.
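
A minimal sketch contrasting the two policies (a dictionary-backed toy model, not a real cache):

```python
# Toy model of write policies; `cache` and `memory` are plain dicts.
memory = {0x10: 1}
cache = {}            # addr -> (value, dirty)

def write_through(addr, value):
    cache[addr] = (value, False)   # update the cache...
    memory[addr] = value           # ...and memory immediately

def write_back(addr, value):
    cache[addr] = (value, True)    # update the cache only; mark the block dirty

def evict(addr):
    value, dirty = cache.pop(addr)
    if dirty:                      # write-back: memory updated only on eviction
        memory[addr] = value

write_back(0x10, 42)
print(memory[0x10])   # still 1: memory is stale
evict(0x10)
print(memory[0x10])   # 42: written back on eviction
```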

Performance Improvement
Increase the hit rate / reduce the miss rate:
– Increase the cache size.
– Choose a good block size.
– Use good cache associativity.
– Use a good replacement policy.
Reduce the miss penalty while keeping the hit time low:
– Multilevel caches (a small, fast L1 backed by a larger L2).
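
The effect of a second level shows up directly in the AMAT formula; a sketch with assumed numbers:

```python
# Two-level AMAT: an L1 miss costs the L2 access, an L2 miss costs memory.
# All numbers below are illustrative assumptions.
l1_hit, l1_miss_rate = 1.0, 0.05      # ns, fraction of accesses
l2_hit, l2_miss_rate = 10.0, 0.20     # of the accesses that reach L2
mem_penalty = 100.0                   # ns

l2_amat = l2_hit + l2_miss_rate * mem_penalty   # 10 + 0.2*100 = 30 ns
amat = l1_hit + l1_miss_rate * l2_amat          # 1 + 0.05*30  = 2.5 ns
print(f"AMAT with L2 = {amat} ns vs "
      f"{l1_hit + l1_miss_rate * mem_penalty} ns without")   # 2.5 vs 6 ns
```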

Multilevel Cache
[Figure: CPU, L1 cache, L2 cache, memory]

Processor                  L1 cache    L2 cache
Pentium                    16 KB
Pentium Pro                16 KB       256/512 KB
Pentium MMX                32 KB
Pentium II and III         32 KB
Celeron                    32 KB       128 KB
Pentium III Cumine         32 KB       256 KB
AMD K6 and K6-2            64 KB
AMD K6-3                   64 KB       256 KB
AMD K7 Athlon              128 KB
AMD Duron                  128 KB      64 KB
AMD Athlon Thunderbird     128 KB      256 KB

Virtual Memory
Similar to cache:
– Based on the principle of locality.
– Memory is divided into equal-sized blocks called pages.
– If a requested page is not found in memory, a page fault occurs.
Allows efficient and safe sharing of memory among multiple programs:
– Each program has its own address space.
Virtually extends the memory size:
– A program can be larger than the physical memory.

Virtual Memory
[Figure: programs A, B, and C each have their own virtual address space; address translation maps virtual addresses to physical addresses in main memory, with overflow pages kept in the disk swap space]

Virtual Memory
[Figure: program A's virtual address space mapped onto main memory]
The virtual address space can be larger than the physical address space.

Address Calculation
[Figure: a virtual address = virtual page number || page offset is translated, via the page table, into a physical address = physical page number || page offset; the page offset is unchanged]

Page Table
[Figure: the page table register points to the page table; the virtual page number indexes the table, whose entries hold a valid bit and a physical page number (e.g. virtual page 0000… maps to physical page …0011, and 0011… to …1111); the physical page number is concatenated with the page offset to form the physical address]
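
A sketch of this translation, assuming 12-bit page offsets and a simple dictionary-backed page table (the two entries mirror the figure):

```python
OFFSET_BITS = 12                      # assumed 4 KB pages

# page_table[vpn] = (valid bit, physical page number)
page_table = {0b0000: (True, 0b0011), 0b0011: (True, 0b1111)}

def translate(virtual_addr: int) -> int:
    vpn = virtual_addr >> OFFSET_BITS          # virtual page number
    offset = virtual_addr & ((1 << OFFSET_BITS) - 1)
    valid, ppn = page_table.get(vpn, (False, 0))
    if not valid:
        raise RuntimeError("page fault")       # handled by the OS (next slide)
    return (ppn << OFFSET_BITS) | offset       # the offset is unchanged

print(hex(translate(0x0_123)))   # vpn 0 -> ppn 3: prints 0x3123
```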

Page fault
When the valid bit of the requested page is 0, a page fault occurs.
Handling a page fault:
– Get the requested page from disk (using information in the page table).
– Find an available page frame in memory.
– If there is one, put the requested page in it and update the entry in the page table.
– If there is none, find a page to be replaced (according to the page replacement policy), replace it, and update both entries in the page table.
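
A sketch of that handler over toy structures; choose_victim and load_page_from_disk are hypothetical stand-ins for the real policy and disk transfer:

```python
NUM_FRAMES = 4
page_table = {}                        # vpn -> (valid bit, physical page number)
free_frames = list(range(NUM_FRAMES))  # available physical page frames
resident = {}                          # ppn -> vpn, to find a victim's entry

def choose_victim() -> int:
    return next(iter(resident))        # placeholder for a real policy (e.g. LRU)

def load_page_from_disk(vpn: int, ppn: int):
    pass                               # stand-in for the actual disk transfer

def handle_page_fault(vpn: int):
    if free_frames:                              # an available frame exists
        ppn = free_frames.pop()
    else:                                        # none: replace a page
        ppn = choose_victim()
        page_table[resident[ppn]] = (False, 0)   # invalidate the victim's entry
    load_page_from_disk(vpn, ppn)
    page_table[vpn] = (True, ppn)                # update the faulting page's entry
    resident[ppn] = vpn

handle_page_fault(7)
print(page_table)    # {7: (True, 3)}
```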

Page Replacement
Page replacement policy:
– Least recently used (LRU): replace the page that has not been used for the longest time.
Updating data in the backing store:
– If the replaced page was changed (written to), it must be written back to disk.
– Write-back is more efficient here than write-through.
– If the replaced page was not changed, no update is necessary.
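
A minimal LRU bookkeeping sketch using Python's OrderedDict (one possible implementation, not the only one):

```python
from collections import OrderedDict

class LRUTracker:
    """Tracks page recency; the least recently used page is evicted first."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pages = OrderedDict()        # insertion order = recency order

    def access(self, vpn: int):
        if vpn in self.pages:
            self.pages.move_to_end(vpn)   # most recently used goes last
        elif len(self.pages) >= self.capacity:
            victim, _ = self.pages.popitem(last=False)  # evict the LRU page
            print(f"evict page {victim}")
        self.pages[vpn] = True

lru = LRUTracker(capacity=2)
for vpn in [1, 2, 1, 3]:                  # accessing 3 evicts 2, not 1
    lru.access(vpn)
```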

Other information in page tables
Use/reference bit
– Used to approximate the LRU policy.
Dirty bit
– Used to decide whether a replaced page must be written back to disk.

Translation-lookaside buffer (TLB)
A cache that stores recently used page table entries, for efficiency.
When the operating system switches from process A to process B (a context switch), A's page table entries must be replaced by B's in the TLB.
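
A sketch of translation with a TLB in front of the page table (toy dictionaries again; a full flush stands in for the context-switch behavior):

```python
tlb = {}                                  # vpn -> ppn, small and fast
page_table = {0: 3, 3: 15}                # vpn -> ppn (all pages assumed valid)
OFFSET_BITS = 12

def translate(vaddr: int) -> int:
    vpn, offset = vaddr >> OFFSET_BITS, vaddr & ((1 << OFFSET_BITS) - 1)
    if vpn in tlb:                        # TLB hit: no page table walk
        ppn = tlb[vpn]
    else:                                 # TLB miss: consult the page table
        ppn = page_table[vpn]
        tlb[vpn] = ppn                    # cache the translation
    return (ppn << OFFSET_BITS) | offset

def context_switch():
    tlb.clear()                           # simplest policy: flush on switch

print(hex(translate(0x3_004)))            # 0xf004, via a page table walk
print(hex(translate(0x3_004)))            # same result, now via the TLB
```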

[Figure: the disk holds the swap space with pages of A, B, and C; memory holds parts of A, B, and C plus their page tables; the TLB caches the currently used page table entries, and the cache holds the currently used data and program, next to the CPU]

Three C’s: classifying cache misses as compulsory, capacity, or conflict misses.

Effects of the three C’s
[Figure: miss rate broken down by miss type, for one-way, two-way, and four- and eight-way set associativity]
Compulsory misses are too small to be seen in this graph.

Design factors