FAMU-FSU College of Engineering 1 Computer Architecture EEL 4713/5764, Fall 2006 Dr. Linda DeBrunner Module #18—Cache Memory Organization.

Presentation transcript:

Slide 2 (Mar. 2006, Computer Architecture, Memory System Design): Part V Memory System Design

Slide 3: Part V Memory System Design
Topics in This Part
- Chapter 17 Main Memory Concepts
- Chapter 18 Cache Memory Organization
- Chapter 19 Mass Memory Concepts
- Chapter 20 Virtual Memory and Paging
Design problem: we want a memory unit that
- can keep up with the CPU’s processing speed
- has enough capacity for programs and data
- is inexpensive, reliable, and energy-efficient

Slide 4: 18 Cache Memory Organization
Processor speed is improving at a faster rate than memory speed, so the processor-memory speed gap has been widening.
Cache is to main memory as desk drawer is to file cabinet.
Topics in This Chapter
- 18.1 The Need for a Cache
- 18.2 What Makes a Cache Work?
- 18.3 Direct-Mapped Cache
- 18.4 Set-Associative Cache
- 18.5 Cache and Main Memory
- 18.6 Improving Cache Performance

Slide 5: 18.1 The Need for a Cache
Figure caption: Cache memories act as intermediaries between the superfast processor and the much slower main memory.
One level of cache with hit rate $h$:
$$C_{\text{eff}} = hC_{\text{fast}} + (1 - h)(C_{\text{slow}} + C_{\text{fast}}) = C_{\text{fast}} + (1 - h)C_{\text{slow}}$$
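The one-level formula can be checked numerically. The following is a minimal sketch: the function name and the 1-cycle and 50-cycle latencies are illustrative assumptions, not values from the slides.

```python
def effective_access_time(hit_rate, c_fast, c_slow):
    """Return C_eff = C_fast + (1 - h) * C_slow for a single cache level."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return c_fast + (1.0 - hit_rate) * c_slow


# Assumed latencies for illustration: 1-cycle cache, 50-cycle main memory.
for h in (0.80, 0.90, 0.99):
    c_eff = effective_access_time(h, c_fast=1, c_slow=50)
    print(f"h = {h:.2f}: C_eff = {c_eff:.2f} cycles")
```

Every access pays C_fast; only the missing fraction 1 - h pays C_slow on top of it, which is why the two forms of the formula agree.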

Slide 6: Performance of a Two-Level Cache System
Example 18.1
A system with L1 and L2 caches has a CPI of 1.2 with no cache miss. There are 1.1 memory accesses on average per instruction. What is the effective CPI with cache misses factored in? What are the effective hit rate and miss penalty overall if the L1 and L2 caches are modeled as a single cache?
Level   Local hit rate   Miss penalty
L1      95%              8 cycles
L2      80%              60 cycles
Figure labels: 8 cycles, 60 cycles; 95%, 4%, 1% (the fractions of accesses satisfied by L1, L2, and main memory).
Solution
$$C_{\text{eff}} = C_{\text{fast}} + (1 - h_1)\,[C_{\text{medium}} + (1 - h_2)\,C_{\text{slow}}]$$
Because $C_{\text{fast}}$ is included in the CPI of 1.2, we must account for the rest:
$$\text{CPI} = 1.2 + 1.1\,(1 - 0.95)\,[8 + (1 - 0.8)\,60] = 1.2 + 1.1 \times 0.05 \times 20 = 2.3$$
Overall: hit rate 99% (95% + 80% of 5%), miss penalty 60 cycles.
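The arithmetic of Example 18.1 can be reproduced with a short sketch. The function and parameter names are hypothetical; the numbers are the ones given in the example.

```python
def effective_cpi(base_cpi, accesses_per_instr, h1, l1_miss_penalty,
                  h2, l2_miss_penalty):
    """Base CPI plus the average stall cycles caused by cache misses."""
    # Stall cycles per memory access: an L1 miss costs the L1 miss penalty,
    # and the fraction of those that also miss in L2 costs the L2 penalty too.
    stall_per_access = (1 - h1) * (l1_miss_penalty + (1 - h2) * l2_miss_penalty)
    return base_cpi + accesses_per_instr * stall_per_access


cpi = effective_cpi(1.2, 1.1, h1=0.95, l1_miss_penalty=8,
                    h2=0.80, l2_miss_penalty=60)
overall_hit_rate = 0.95 + (1 - 0.95) * 0.80   # L1 hits plus L2 hits

print(f"effective CPI    = {cpi:.2f}")               # 2.30
print(f"overall hit rate = {overall_hit_rate:.2f}")  # 0.99
```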

Slide 7: Cache Memory Design Parameters
- Cache size (in bytes or words). A larger cache can hold more of the program’s useful data but is more costly and likely to be slower.
- Block or cache-line size (unit of data transfer between cache and main). With a larger cache line, more data is brought into the cache with each miss. This can improve the hit rate but may also bring in low-utility data.
- Placement policy. Determining where an incoming cache line is stored. More flexible policies imply higher hardware cost and may or may not have performance benefits (due to more complex data location).
- Replacement policy. Determining which of several existing cache blocks (into which a new cache line can be mapped) should be overwritten. Typical policies: choosing a random or the least recently used block.
- Write policy. Determining if updates to cache words are immediately forwarded to main (write-through) or modified blocks are copied back to main if and when they must be replaced (write-back or copy-back).

Slide 8: 18.2 What Makes a Cache Work?
Temporal locality and spatial locality.
Figure caption: Assuming no conflict in address mapping, the cache will hold a small program loop in its entirety, leading to fast execution.

Slide 9: Desktop, Drawer, and File Cabinet Analogy
Figure caption: Items on a desktop (register) or in a drawer (cache) are more readily accessible than those in a file cabinet (main memory).
Once the “working set” is in the drawer, very few trips to the file cabinet are needed.

Slide 10: Temporal and Spatial Localities
Figure: plot of addresses versus time, with the working set marked. From Peter Denning’s CACM paper, July 2005 (Vol. 48, No. 7).
- Temporal: Accesses to the same address are typically clustered in time.
- Spatial: When a location is accessed, nearby locations tend to be accessed also.
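The effect of the two kinds of locality on hit rate can be illustrated with a toy simulation. This is a sketch under stated assumptions: the cache geometry (eight 4-word lines, direct-mapped), the function name, and the three address traces are all made up for illustration.

```python
def hit_rate(addresses, num_lines=8, words_per_line=4):
    """Hit rate of a tiny direct-mapped cache on a trace of word addresses."""
    tags = [None] * num_lines             # None marks an empty (invalid) line
    hits = 0
    for addr in addresses:
        block = addr // words_per_line    # memory block holding this word
        index = block % num_lines         # the one line this block may occupy
        tag = block // num_lines
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag             # miss: the whole line is fetched
    return hits / len(addresses)


sequential = list(range(32))              # spatial locality: neighboring words
loop = list(range(16)) * 10               # temporal locality: repeated reuse
strided = [4 * i for i in range(32)]      # one word per line, never reused

print(hit_rate(sequential))   # 0.75  (1 miss, then 3 hits per 4-word line)
print(hit_rate(loop))         # 0.975 (only the first pass misses)
print(hit_rate(strided))      # 0.0   (every access touches a new line)
```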

Slide 11: Caching Benefits Related to Amdahl’s Law
Example 18.2
In the drawer and file cabinet analogy, assume a hit rate $h$ in the drawer. Formulate the situation shown in the desktop, drawer, and file cabinet figure in terms of Amdahl’s law.
Solution
Without the drawer, a document is accessed in 30 s. So, fetching 1000 documents, say, would take 30,000 s. The drawer causes a fraction $h$ of the cases to be done 6 times as fast, with access time unchanged for the remaining $1 - h$. Speedup is thus $1/(1 - h + h/6) = 6/(6 - 5h)$. Improving the drawer access time can increase the speedup factor, but as long as the miss rate remains at $1 - h$, the speedup can never exceed $1/(1 - h)$. Given $h = 0.9$, for instance, the speedup is 4, with the upper bound being 10 for an extremely short drawer access time.
Note: Some would place everything on their desktop, thinking that this yields even greater speedup. This strategy is not recommended!
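The speedup formula of Example 18.2 can be evaluated directly. A minimal sketch follows; the function names are hypothetical, and the factor of 6 is the drawer speed advantage stated in the example.

```python
def drawer_speedup(h, drawer_factor=6.0):
    """Amdahl's law: a fraction h of accesses runs drawer_factor times faster."""
    return 1.0 / ((1.0 - h) + h / drawer_factor)


def speedup_bound(h):
    """Limit of the speedup as the drawer access time approaches zero."""
    return 1.0 / (1.0 - h)


print(f"{drawer_speedup(0.9):.2f}")   # 4.00
print(f"{speedup_bound(0.9):.2f}")    # 10.00
```

Raising `drawer_factor` moves the first result toward the second, but never past it while the miss rate stays at 1 - h.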

Slide 12: Compulsory, Capacity, and Conflict Misses
- Compulsory misses: With on-demand fetching, first access to any item is a miss. Some “compulsory” misses can be avoided by prefetching.
- Capacity misses: We have to oust some items to make room for others. This leads to misses that are not incurred with an infinitely large cache.
- Conflict misses: Occasionally, there is free room, or space occupied by useless data, but the mapping/placement scheme forces us to displace useful items to bring in other items. This may lead to misses in the future.
Given a fixed-size cache, dictated, e.g., by cost factors or availability of space on the processor chip, compulsory and capacity misses are pretty much fixed. Conflict misses, on the other hand, are influenced by the data mapping scheme, which is under our control.
We study two popular mapping schemes: direct and set-associative.
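One way to make the three categories concrete is to classify each miss of a direct-mapped cache by comparing it against a reference cache. The sketch below assumes a common convention that the slides do not spell out: a miss on a never-seen block is compulsory, a miss that a same-size fully associative LRU cache would also incur is a capacity miss, and any remaining miss is a conflict miss. The function name and the trace are hypothetical.

```python
from collections import OrderedDict


def classify_misses(blocks, num_lines):
    """Classify the misses of a direct-mapped cache with num_lines lines.

    blocks is a trace of block (cache-line) numbers.
    """
    seen = set()                   # blocks referenced at least once
    direct = [None] * num_lines    # the direct-mapped cache under study
    full = OrderedDict()           # same-size fully associative LRU reference
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}

    for b in blocks:
        # Update the reference cache and remember whether it hit.
        full_hit = b in full
        if full_hit:
            full.move_to_end(b)            # mark as most recently used
        else:
            if len(full) == num_lines:
                full.popitem(last=False)   # evict the least recently used
            full[b] = True

        # Block b may only live in line b % num_lines of the studied cache.
        index = b % num_lines
        if direct[index] != b:
            direct[index] = b
            if b not in seen:
                counts["compulsory"] += 1
            elif not full_hit:
                counts["capacity"] += 1
            else:
                counts["conflict"] += 1
        seen.add(b)
    return counts


# Blocks 0 and 8 compete for line 0 of an 8-line cache that is otherwise empty.
print(classify_misses([0, 8, 0, 8, 0, 8], num_lines=8))
# {'compulsory': 2, 'capacity': 0, 'conflict': 4}
```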

Slide 13: 18.3 Direct-Mapped Cache
Figure caption: Direct-mapped cache holding 32 words within eight 4-word lines. Each line is associated with a tag and a valid bit.

Slide 14: Accessing a Direct-Mapped Cache
Example 18.4
Show cache addressing for a byte-addressable memory with 32-bit addresses. Cache line width $2^W = 16$ B. Cache size $2^L = 4096$ lines (64 KB).
Solution
Byte offset in line is $\log_2 16 = 4$ b. Cache line index is $\log_2 4096 = 12$ b. This leaves $32 - 12 - 4 = 16$ b for the tag.
Figure caption: Components of the 32-bit address in an example direct-mapped cache with byte addressing.
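The address breakdown of this example can be sketched with shifts and masks. The constant and function names, and the sample address 0x1234ABCD, are assumptions made for illustration; the field widths are the ones derived in the solution.

```python
OFFSET_BITS = 4    # log2(16-byte line)
INDEX_BITS = 12    # log2(4096 lines)


def split_address(addr):
    """Split a 32-bit byte address into (tag, line index, byte offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)   # remaining 16 high-order bits
    return tag, index, offset


tag, index, offset = split_address(0x1234ABCD)
print(hex(tag), hex(index), hex(offset))   # 0x1234 0xabc 0xd
```

With 4-bit and 12-bit fields, the split falls on hexadecimal digit boundaries, which makes the result easy to read off the address.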

Slide 15: 18.4 Set-Associative Cache
Figure caption: Two-way set-associative cache holding 32 words of data within 4-word lines and 2-line sets.

Slide 16: Accessing a Set-Associative Cache
Example 18.5
Show cache addressing scheme for a byte-addressable memory with 32-bit addresses. Cache line width $2^W = 16$ B. Set size $2^S = 2$ lines. Cache size $2^L = 4096$ lines (64 KB).
Solution
Byte offset in line is $\log_2 16 = 4$ b. Cache set index is $\log_2(4096/2) = 11$ b. This leaves $32 - 11 - 4 = 17$ b for the tag.
Figure caption: Components of the 32-bit address in an example two-way set-associative cache.
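The same field-width calculation generalizes to any power-of-two associativity. The following sketch uses hypothetical function and parameter names; only the two-way configuration comes from the example, and the other calls are added for comparison.

```python
def log2_exact(n):
    """Base-2 logarithm of a positive power of two, as an integer."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a positive power of two")
    return n.bit_length() - 1


def address_fields(addr_bits, line_bytes, total_lines, ways):
    """Return (tag bits, set-index bits, byte-offset bits)."""
    offset_bits = log2_exact(line_bytes)
    set_bits = log2_exact(total_lines // ways)   # number of sets = lines / ways
    tag_bits = addr_bits - set_bits - offset_bits
    return tag_bits, set_bits, offset_bits


print(address_fields(32, 16, 4096, ways=2))      # (17, 11, 4)
print(address_fields(32, 16, 4096, ways=1))      # (16, 12, 4), direct-mapped
print(address_fields(32, 16, 4096, ways=4096))   # (28, 0, 4), fully associative
```

Each doubling of the set size removes one bit from the set index and adds it to the tag.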

Slide 17: 18.5 Cache and Main Memory
The writing problem:
- Write-through slows down the cache to allow main memory to catch up.
- Write-back or copy-back is less problematic, but still hurts performance due to two main memory accesses in some cases.
Solution: Provide write buffers for the cache so that it does not have to wait for main memory to catch up.
- Harvard architecture: separate instruction and data memories
- von Neumann architecture: one memory for instructions and data
- Split cache: separate instruction and data caches (L1)
- Unified cache: holds instructions and data (L1, L2, L3)
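The difference in main memory traffic between the two write policies can be illustrated with a toy model. This sketch makes several assumptions that are not in the slides: a direct-mapped cache of eight 4-word lines, write-allocate on a miss, no write buffer, and a final flush of dirty lines. The function name and the trace are hypothetical.

```python
def memory_writes(write_trace, policy, num_lines=8, words_per_line=4):
    """Count main-memory write operations for a trace of word-address stores.

    Write-through counts word writes; write-back counts line copy-backs.
    """
    if policy == "write-through":
        return len(write_trace)          # every store is forwarded to main
    if policy != "write-back":
        raise ValueError(f"unknown policy: {policy}")

    resident = [None] * num_lines        # block number held by each line
    dirty = [False] * num_lines          # True if the line has been modified
    copy_backs = 0
    for addr in write_trace:
        block = addr // words_per_line
        index = block % num_lines
        if resident[index] != block:     # miss: the old line must be replaced
            if dirty[index]:
                copy_backs += 1          # modified line is copied back first
            resident[index] = block
        dirty[index] = True              # the store modifies the cached line
    return copy_backs + sum(dirty)       # flush remaining dirty lines at the end


trace = [0, 1, 2, 3] * 25                # 100 stores to the same 4-word line
print(memory_writes(trace, "write-through"))   # 100
print(memory_writes(trace, "write-back"))      # 1
```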

Slide 18: Faster Main-Cache Data Transfers
Figure caption: A 256 Mb DRAM chip organized as a 32M × 8 memory module: four such chips could form a 128 MB main memory unit.

Slide 19: 18.6 Improving Cache Performance
For a given cache size, the following design issues and tradeoffs exist:
- Line width ($2^W$). Too small a value for W causes a lot of main memory accesses; too large a value increases the miss penalty and may tie up cache space with low-utility items that are replaced before being used.
- Set size or associativity ($2^S$). Direct mapping (S = 0) is simple and fast; greater associativity leads to more complexity, and thus slower access, but tends to reduce conflict misses. More on this later.
- Line replacement policy. Usually LRU (least recently used) algorithm or some approximation thereof; not an issue for direct-mapped caches. Somewhat surprisingly, random selection works quite well in practice.
- Write policy. Modern caches are very fast, so write-through is seldom a good choice. We usually implement write-back or copy-back, using write buffers to soften the impact of main memory latency.
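The associativity and replacement-policy tradeoffs can be explored with a small miss-counting simulator. This is a sketch, not a model of any real cache: the function name, the 8-line cache size, the seed, and both traces are assumptions chosen to make the effects visible.

```python
import random


def count_misses(blocks, total_lines, ways, policy="lru", seed=0):
    """Count misses of a set-associative cache on a trace of block numbers."""
    if policy not in ("lru", "random"):
        raise ValueError(f"unknown policy: {policy}")
    rng = random.Random(seed)
    num_sets = total_lines // ways
    sets = [[] for _ in range(num_sets)]   # each list is ordered LRU to MRU
    misses = 0
    for b in blocks:
        lines = sets[b % num_sets]
        if b in lines:
            lines.remove(b)                # hit: re-appended below as MRU
        else:
            misses += 1
            if len(lines) == ways:         # set is full: choose a victim
                victim = 0 if policy == "lru" else rng.randrange(ways)
                lines.pop(victim)
        lines.append(b)
    return misses


# Two blocks that collide in a direct-mapped cache of 8 lines.
conflict_trace = [0, 8] * 4
for ways in (1, 2, 4, 8):
    print(ways, count_misses(conflict_trace, total_lines=8, ways=ways))
# 1 8
# 2 2
# 4 2
# 8 2

# A loop over 9 blocks in a fully associative 8-line cache defeats LRU:
# the block needed next is always the one that was just evicted.
cyclic_trace = list(range(9)) * 5
print(count_misses(cyclic_trace, 8, ways=8, policy="lru"))      # 45
print(count_misses(cyclic_trace, 8, ways=8, policy="random"))   # depends on seed
```

On the cyclic trace, random replacement can only match or beat LRU, because LRU misses on every access; this is one small illustration of why random selection can hold up well in practice.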

Slide 20: Effect of Associativity on Cache Performance
Figure caption: Performance improvement of caches with increased associativity.

Slide 21: Before our next class meeting…
- Homework #10 due on Thursday, Nov. 16 (no electronic submissions)
- Short Paper #3?