M. Usha, Professor/CSE, Sona College of Technology


CACHE MEMORIES
M. Usha, Professor/CSE, Sona College of Technology
11-12-2007, AU FDP on Computer Architecture

AGENDA
BASIC CONCEPTS
PERFORMANCE CONSIDERATIONS

Some Basic Concepts

LEARNING OBJECTIVES
Define locality of reference: spatial and temporal
Define: hit, miss, write-through, write-back
Discuss mapping functions: direct, fully associative, and set-associative
Define: dirty bit, valid bit, cache coherence, k-way set-associative cache

Cache
What is a cache? Why do we need it?
Locality of reference (very important): temporal and spatial
Cache block (cache line): a set of contiguous address locations of some size
Page 315

Cache
[Figure 5.14. Use of a cache memory: processor, cache, main memory]
Replacement algorithm
Hit / miss
Write-through / write-back
Load-through (early restart)

Direct Mapping
Block j of main memory maps onto block (j modulo 128) of the cache.
Main memory address (16 bits) = Tag (5) | Block (7) | Word (4):
  Word, 4 bits: selects one of 16 words (each block has 16 = 2^4 words)
  Block, 7 bits: selects one of the 128 = 2^7 cache blocks
  Tag, 5 bits: compared with the tag bits stored at that cache block; identifies which of the 32 (= 4096/128) memory blocks that map there is currently resident
[Figure 5.15. Direct-mapped cache: main memory blocks 0..4095; cache blocks 0..127, each with a tag]

Direct Mapping
Main memory address = Tag (5) | Block (7) | Word (4)
Example: 11101,1111111,1100
  Tag: 11101
  Block: 1111111 = 127, block 127 of the cache
  Word: 1100 = 12, the 12th word of block 127 in the cache
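The field extraction above can be sketched in a few lines of Python; the field widths (5/7/4) come from the slides, while the helper name is ours:

```python
# Decompose a 16-bit address for the direct-mapped cache on this slide:
# 5 tag bits | 7 block bits | 4 word bits.
WORD_BITS, BLOCK_BITS = 4, 7

def split_direct(addr):
    word = addr & ((1 << WORD_BITS) - 1)                   # low 4 bits
    block = (addr >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)  # next 7 bits
    tag = addr >> (WORD_BITS + BLOCK_BITS)                 # top 5 bits
    return tag, block, word

# The slide's example address 11101,1111111,1100:
addr = 0b1110111111111100
print(split_direct(addr))  # (29, 127, 12) -> tag 11101, block 127, word 12
```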

Associative Mapping
Main memory address (16 bits) = Tag (12) | Word (4):
  Word, 4 bits: selects one of 16 words (each block has 16 = 2^4 words)
  Tag, 12 bits: identifies which of the 4096 = 2^12 main memory blocks is resident; a memory block may be placed in any cache block
[Figure 5.16. Associative-mapped cache: main memory blocks 0..4095; cache blocks 0..127, each with a 12-bit tag]

Associative Mapping
Main memory address = Tag (12) | Word (4)
Example: 111011111111,1100
  Tag: 111011111111
  Word: 1100 = 12, the 12th word of a block in the cache

Set-Associative Mapping
Main memory address (16 bits) = Tag (6) | Set (6) | Word (4):
  Word, 4 bits: selects one of 16 words (each block has 16 = 2^4 words)
  Set, 6 bits: selects one of the 64 = 2^6 sets (128 blocks / 2 blocks per set)
  Tag, 6 bits: compared with the tags of both blocks in the set to check whether the desired block is present (4096/64 = 2^6 memory blocks map to each set)
[Figure 5.17. Set-associative-mapped cache with two blocks per set]

Set-Associative Mapping
Main memory address = Tag (6) | Set (6) | Word (4)
Example: 111011,111111,1100
  Tag: 111011
  Set: 111111 = 63, set 63 of the cache
  Word: 1100 = 12, the 12th word within set 63
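The three mappings differ only in how the same 16-bit address is partitioned. A generic field splitter (our illustration, using the widths from the slides) makes the comparison explicit:

```python
# The same 16-bit address interpreted under the three mappings on these
# slides: direct (5|7|4), fully associative (12|4), 2-way set-assoc (6|6|4).
def fields(addr, widths):
    """Split addr into fields, most significant first, given bit widths."""
    out, shift = [], sum(widths)
    for w in widths:
        shift -= w
        out.append((addr >> shift) & ((1 << w) - 1))
    return tuple(out)

addr = 0b1110111111111100          # the example address used on the slides
print(fields(addr, (5, 7, 4)))     # direct:           (29, 127, 12)
print(fields(addr, (12, 4)))       # associative:      (3839, 12)
print(fields(addr, (6, 6, 4)))     # 2-way set-assoc:  (59, 63, 12)
```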

Replacement Algorithms
Difficult to determine which blocks to evict
Least Recently Used (LRU): the cache controller tracks references to all blocks as computation proceeds; counters are updated or cleared when a hit or miss occurs
FIFO
Random: simple to implement and quite effective in practice
11-12-2007 AU FDP on Computer Architecture
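A minimal LRU policy can be sketched with Python's OrderedDict; this is our toy illustration of the policy, not the counter-based controller the slide describes:

```python
from collections import OrderedDict

class LRUCache:
    """Toy fully associative cache with LRU replacement."""
    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.blocks = OrderedDict()          # block tag -> data, oldest first

    def access(self, tag):
        if tag in self.blocks:               # hit: mark as most recently used
            self.blocks.move_to_end(tag)
            return True
        if len(self.blocks) >= self.num_blocks:
            self.blocks.popitem(last=False)  # miss: evict least recently used
        self.blocks[tag] = None
        return False

cache = LRUCache(2)
hits = [cache.access(t) for t in [1, 2, 1, 3, 2]]
print(hits)  # [False, False, True, False, False] -- 3 evicts 2, so 2 misses
```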

EXAMPLE
SUM = 0
FOR j = 0 to 9 do
  SUM = SUM + A(0,j)
END
AVE = SUM/10
FOR i = 9 downto 0 do
  A(0,i) = A(0,i)/AVE
END
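The pseudocode translates directly into Python; the array contents here are made-up illustrative data:

```python
# Direct translation of the slide's pseudocode: sum row 0 of a 4x10 array,
# then normalize that row by its average. The values are illustrative only.
A = [[float(10 * r + c) for c in range(10)] for r in range(4)]

total = 0.0
for j in range(10):          # FOR j = 0 to 9
    total += A[0][j]
ave = total / 10             # AVE = SUM/10
for i in range(9, -1, -1):   # FOR i = 9 downto 0
    A[0][i] = A[0][i] / ave

print(ave, A[0][0], A[0][9])  # 4.5 0.0 2.0
```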

MAIN MEMORY
4 x 10 array, stored in COLUMN-MAJOR order
Word-addressable, 16-bit addresses
Array occupies 7A00 - 7A27 hex

CACHE
8 blocks, each of one word of 16 bits
Consider the access pattern under direct mapping (DM), fully associative mapping (FAM), and set-associative mapping (SAM)
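The direct-mapped case can be simulated quickly. This sketch follows the slides' setup (8 one-word blocks, column-major 4x10 array at 0x7A00, the two loops from the example); since each block is one word, the cache block index is simply the address mod 8:

```python
# Direct-mapped simulation of the slides' example: 8 one-word blocks,
# A(0,j) lives at 0x7A00 + 4*j (column-major 4x10 array, word-addressable).
BASE, ROWS = 0x7A00, 4

def addr(i, j):
    return BASE + ROWS * j + i

cache = [None] * 8                               # tag held by each block
hits = misses = 0
trace = [addr(0, j) for j in range(10)]          # first loop:  j = 0..9
trace += [addr(0, i) for i in range(9, -1, -1)]  # second loop: i = 9..0
for a in trace:
    block, tag = a % 8, a // 8
    if cache[block] == tag:
        hits += 1
    else:
        misses += 1
        cache[block] = tag
print(hits, misses)  # 2 18 -- all accesses conflict in blocks 0 and 4
```

Only the first two accesses of the second loop hit (they reuse the last two words loaded by the first loop); every other access evicts a block that is needed again, which is exactly the conflict behaviour the exercise is meant to expose.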

Performance Considerations

Learning objectives
Describe parallelism in memory organization: interleaving
List and explain ways to improve the hit rate
Discuss caches on the processor chip
Describe enhancement techniques: write buffer, prefetching, lockup-free cache

Overview
Two key factors: performance and cost, i.e. the price/performance ratio.
Performance depends on how fast machine instructions can be brought into the processor for execution and how fast they can be executed.
For the memory hierarchy, it is beneficial if transfers to and from a slower unit can proceed at a rate close to that of the faster unit. This is not possible if both the slow and the fast units are accessed in the same manner, but it can be achieved when parallelism is used in the organization of the slower unit.

TWO APPROACHES
SEPARATE MODULES: the k high-order bits choose the module and the m low-order bits choose the word within it; DMA accesses to other modules can proceed simultaneously.
INTERLEAVING: the k low-order bits select the module and the m high-order bits select a location within the module, i.e. successive words are available in successive modules.

Interleaving
If the main memory is structured as a collection of physically separate modules, each with its own ABR (address buffer register) and DBR (data buffer register), memory access operations may proceed in more than one module at the same time.
[Figure 5.25. Addressing multiple-module memory systems: (a) consecutive words in a module, MM address = k-bit module | m-bit address in module; (b) consecutive words in consecutive modules, MM address = m-bit address in module | k-bit module]
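The two address splits can be sketched as follows; the choice of 2^k = 8 modules and a 16-bit address is our example assumption, not from the slides:

```python
# Address decomposition for the two schemes in Figure 5.25, assuming
# 2^k = 8 modules and 16-bit addresses (illustrative values).
K, ADDR_BITS = 3, 16

def consecutive_in_module(addr):    # (a) high-order k bits pick the module
    return addr >> (ADDR_BITS - K), addr & ((1 << (ADDR_BITS - K)) - 1)

def interleaved(addr):              # (b) low-order k bits pick the module
    return addr & ((1 << K) - 1), addr >> K

# Successive addresses hit successive modules only under interleaving:
print([interleaved(a)[0] for a in range(6)])            # [0, 1, 2, 3, 4, 5]
print([consecutive_in_module(a)[0] for a in range(6)])  # [0, 0, 0, 0, 0, 0]
```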

Hit Rate and Miss Penalty
Hit rate: the success rate in accessing information at a given level of the memory hierarchy; miss rate = 1 - hit rate.
Ideally, the entire memory hierarchy would appear to the processor as a single memory unit with the access time of the on-chip cache and the size of a magnetic disk; how closely this ideal is approached depends on the hit rates (which must be well above 0.9).
A miss causes extra time to bring the desired information into the cache: the MISS PENALTY.
11-12-2007 AU FDP on Computer Architecture

Hit Rate and Miss Penalty (cont.)
tave: average access time experienced by the processor
h: hit rate
M: miss penalty, the time to access information in main memory
C: the time to access information in the cache
tave = hC + (1-h)M
Example: Assume that 30 percent of the instructions in a typical program perform a read or write, so there are 130 memory accesses for every 100 instructions executed.
h = 0.95 for instructions, h = 0.9 for data
C = 1 clock cycle, M = 17 clock cycles; without a cache, each memory access takes 10 cycles
Time without cache: 130 x 10 = 1300
Time with cache: 100(0.95x1 + 0.05x17) + 30(0.9x1 + 0.1x17) = 258
The computer with the cache performs about five times better: 1300/258 = 5.04
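The arithmetic behind the 5.04 figure can be checked directly:

```python
# Check the slide's speedup: 130 accesses per 100 instructions, 1-cycle
# cache access, 17-cycle miss penalty, 10 cycles per access without a cache.
t_no_cache = 130 * 10
t_cache = 100 * (0.95 * 1 + 0.05 * 17) + 30 * (0.9 * 1 + 0.1 * 17)
print(t_no_cache, t_cache, round(t_no_cache / t_cache, 2))  # 1300 258.0 5.04
```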

How to Improve the Hit Rate?
Use a larger cache: higher cost.
Increase the block size while keeping the total cache size constant; however, if the block size is too large, some items may not be referenced before the block is replaced, and the miss penalty increases.
Load-through approach: forward the requested word to the processor as soon as it arrives, before the rest of the block is loaded.

Caches on the Processor Chip
On-chip vs. off-chip
Two separate caches for instructions and data vs. a single cache for both
Which has the better hit rate? The single cache.
What is the advantage of separate caches? Parallelism (simultaneous instruction and data access), hence better performance.
Level 1 and Level 2 caches:
  L1 cache: smaller and faster; may access more than one word simultaneously and let the processor use them one at a time.
  L2 cache: larger and slower.
Average access time: tave = h1*C1 + (1-h1)*h2*C2 + (1-h1)*(1-h2)*M
where h1 and h2 are the hit rates of L1 and L2, C1 and C2 their access times, and M the time to access information in main memory.
11-12-2007 AU FDP on Computer Architecture
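Plugging numbers into the two-level formula makes its shape concrete; the values below (h1 = 0.95, h2 = 0.9, C1 = 1, C2 = 10, M = 100) are our illustrative assumptions, not from the slides:

```python
# Two-level average access time:
#   tave = h1*C1 + (1-h1)*h2*C2 + (1-h1)*(1-h2)*M
# The numbers are illustrative assumptions, not from the slides.
h1, h2 = 0.95, 0.90          # L1 and L2 hit rates
C1, C2, M = 1, 10, 100       # cycles: L1 access, L2 access, main memory

tave = h1 * C1 + (1 - h1) * h2 * C2 + (1 - h1) * (1 - h2) * M
print(round(tave, 3))  # 1.9 -- most of the cost of M is hidden by L2
```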

Other Enhancements
Write buffer: the processor does not need to wait for a memory write to complete.
Prefetching: bring data into the cache before they are needed.
Lockup-free cache: the processor can access the cache while a miss is being serviced.

Thank you