Cache Basics
CS 524 (Wi 2003/04) - Asim, LUMS
Adapted from a presentation by Beth Richardson


Cache Diagram
[Diagram: processors 1-3, each with its own cache, connected through a network to shared memory banks Memory 1, Memory 2, and Memory 3.]

Cache Historical Note
Cache hardware first appeared on production computers in the late 1960s. Before that, the processor talked directly to main memory (CPU <-> Memory). Processors designed without cache were simpler because every memory access took the same amount of time.

Main Memory Improvements (1)
A hardware improvement called interleaving reduces main memory access time.
Interleaving Defined
Main memory is divided into partitions, or segments, called memory banks. Consecutive data elements are spread across the banks. Each bank supplies one data element per bank cycle, so multiple data elements are read in parallel, one from each bank.

Main Memory Improvements (2)
Interleaving Problem
Interleaving assumes that memory is accessed sequentially. If we have 2-way interleaving but the code accesses every other location, there is no benefit: every reference falls in the same bank (see the sketch below). Even setting that problem aside, the bank cycle time is 4 to 8 times the CPU clock cycle time, so main memory cannot keep a fast CPU busy with data. A main memory that is both large and has a cycle time comparable to the processor's is not affordable.
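A minimal sketch of the bank mapping (2-way interleaving assumed; the addresses and bank count are illustrative, not from the slides):

    #include <stdio.h>

    #define NUM_BANKS 2   /* 2-way interleaving (assumed) */

    /* Consecutive word addresses map to consecutive banks. */
    static int bank_of(unsigned word_addr) {
        return word_addr % NUM_BANKS;
    }

    int main(void) {
        /* Sequential access alternates banks 0,1,0,1,... so both
           banks stream data in parallel. */
        for (unsigned a = 0; a < 4; a++)
            printf("sequential: addr %u -> bank %d\n", a, bank_of(a));
        /* Stride-2 access hits bank 0 every time, so the second
           bank sits idle and interleaving gives no benefit. */
        for (unsigned a = 0; a < 8; a += 2)
            printf("stride-2:   addr %u -> bank %d\n", a, bank_of(a));
        return 0;
    }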

Purpose of Cache
The purpose of cache is to reduce the memory access time seen by the processor. There is an overhead associated with cache, but the benefits outweigh the costs.
[Diagram: CPU logic, cache, and main memory.]

Today's Computers
Today almost every computer, large or small, has a cache, so the CPU must be able to handle variable memory access times.
[Diagram: CPU and registers, backed by cache, main memory, and disk.]

Registers
Purpose of Registers: Registers are the sources and destinations of CPU operations.
Description of Registers: Each register holds one data element and is 32 or 64 bits wide. Registers are on-chip and built from SRAM.
Speed of Registers: Register access times are comparable to processor speeds.

Memory Hierarchy
The different memory subsystems in the hierarchy have different speeds, sizes, and costs. Memory technology imposes a trade-off: smaller memory is faster, and slower memory is cheaper. The hierarchy is therefore built so that the fastest memory is closest to the CPU and slower memories are farther away.

Memory Hierarchy (2)
[Diagram: CPU registers, then cache, memory, disk, and tape. Levels near the CPU have shorter access times and higher cost; levels farther away have longer access times and lower cost.]

Memory Hierarchy (Cont.)
It is called a hierarchy because every level is a subset of a level farther away: all data in one level is also found in the level below. Performance is the reason for having a memory hierarchy.
Analogy for Memory Hierarchy: The library books on your desk are a subset of the books in the LUMS library, which in turn is a subset of the books in the Library of Congress.

Principle of Locality
Temporal Locality: When an item is referenced, it tends to be referenced again soon.
Spatial Locality: When an item is referenced, items whose addresses are nearby tend to be referenced soon (library analogy).
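A small loop (illustrative, not from the slides) shows both kinds at once:

    /* Summing an array exhibits both kinds of locality. */
    double sum_array(const double *a, int n) {
        double sum = 0.0;      /* sum and i are reused on every
                                  iteration: temporal locality   */
        for (int i = 0; i < n; i++)
            sum += a[i];       /* a[0], a[1], ... are adjacent in
                                  memory: spatial locality       */
        return sum;
    }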

Cache Line or Block
The overhead of the cache can be reduced by fetching a chunk, or block, of data elements. When a main memory access is made, a whole cache line (or block) of data is brought into the cache instead of a single data element. A cache line is defined as a number of bytes; for example, a line might be 32 bytes or 128 bytes. This takes advantage of spatial locality: the additional elements in the cache line will most likely be needed soon.
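A minimal sketch of which bytes travel together, assuming a 32-byte line (the address is arbitrary):

    #include <stdio.h>
    #include <stdint.h>

    #define LINE_SIZE 32u   /* assumed line size in bytes */

    int main(void) {
        uint32_t addr = 0x1234;
        /* Every byte sharing this line base is fetched in one go. */
        uint32_t line_base = addr & ~(LINE_SIZE - 1);
        uint32_t offset    = addr &  (LINE_SIZE - 1);
        printf("addr 0x%x -> line base 0x%x, offset %u\n",
               addr, line_base, offset);
        return 0;
    }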

Cache Line Size
How large should computer designers make the cache line? The cache miss rate falls as the line size increases, but there is a point of diminishing returns: when the line becomes too large, the time to transfer it on each miss grows and outweighs the reduced miss rate.

Cache Hit/Miss
A cache hit occurs when the data element requested by the processor IS in the cache. You want to maximize cache hits.
Cache Hit Rate: The fraction of memory accesses in which the requested data IS found in the cache.
A cache miss occurs when the data element requested by the processor IS NOT in the cache. You want to minimize cache misses.
Cache Miss Rate: Defined as 1 - hit rate.
Miss Penalty (miss time): The time needed to retrieve the data from a lower level (downstream) of the memory hierarchy.
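These terms combine into the standard average memory access time (AMAT) formula; the numbers below are illustrative, not taken from the slides:

    AMAT = hit time + miss rate × miss penalty

For example, with a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty: AMAT = 1 + 0.05 × 100 = 6 cycles.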

Two Levels of Cache
An on-chip cache performs fastest, but the designer must trade off die area against cache size, so the on-chip cache is small. When the on-chip cache misses, the time to access the much slower main memory is very large: a cache miss is very costly. To reduce this cost, computer designers add a larger, slower off-chip cache, which speeds up the handling of on-chip cache misses.

Two Levels of Cache (2)
The on-chip cache is called the first-level, L1, or primary cache. The off-chip cache is called the second-level, L2, or secondary cache. L1 cache misses that hit in L2 are handled quickly; misses that must go past L2 carry a much larger performance penalty. Caches closer to the CPU are called upstream; caches farther from the CPU are called downstream.
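The benefit of the second level can be seen by extending the AMAT formula (all numbers illustrative):

    AMAT = L1 hit time + L1 miss rate × (L2 hit time + L2 local miss rate × main memory time)

With a 1-cycle L1 hit, 5% L1 miss rate, 10-cycle L2 hit, 25% L2 local miss rate, and 100-cycle memory time: AMAT = 1 + 0.05 × (10 + 0.25 × 100) = 2.75 cycles, versus 6 cycles with L1 alone in the earlier example.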

Memory Hierarchy
[Diagram: CPU and registers, then L1 cache, L2 cache, main memory, and disk.]

Split or Unified Cache
Unified Cache: A single combined instruction-and-data cache.
Split Cache: The cache is split into two parts: one for instructions (the instruction cache) and one for data (the data cache). The two caches are independent of each other and can have independent properties.
Disadvantage of a Unified Cache: When data accesses and instruction accesses conflict with each other, the cache may thrash.

Cache Mapping
Cache Mapping Defined: Cache mapping determines which cache location is used to store a copy of a data element from main memory. There are two mapping strategies: direct mapped cache and set associative cache.
Direct Mapped Cache: Each main memory address maps to exactly one cache location:
cache address = main memory address MOD (size of cache)
Each cache line can therefore occupy only one unique place in the cache (see the sketch below).
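A minimal sketch of the address breakdown for a direct mapped cache, assuming 32-byte lines and 32 lines (a 1 KB cache; all sizes illustrative):

    #include <stdio.h>
    #include <stdint.h>

    #define LINE_SIZE 32u   /* bytes per line (assumed)   */
    #define NUM_LINES 32u   /* 1 KB cache / 32-byte lines */

    int main(void) {
        uint32_t addr   = 0x12345678;
        uint32_t offset = addr % LINE_SIZE;               /* byte within line */
        uint32_t index  = (addr / LINE_SIZE) % NUM_LINES; /* the one slot the
                                                             line may occupy  */
        uint32_t tag    = addr / (LINE_SIZE * NUM_LINES); /* distinguishes the
                                                             many addresses
                                                             sharing one slot */
        printf("tag 0x%x, index %u, offset %u\n", tag, index, offset);
        return 0;
    }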

Direct Mapped Cache Diagram
[Diagram: main memory blocks mapping many-to-one onto cache locations.]

Set Associative Cache
N-way Set Associative Cache: Think of the cache as divided into N vertical strips, or ways (usually N is 2 or 4). A cache line maps to one set and may be placed in any one of the N strips of that set.
[Diagram: cache divided into N ways.]
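A minimal lookup sketch for an N-way set associative cache (the structure and sizes are assumptions for illustration):

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_WAYS  2u    /* associativity (assumed)  */
    #define NUM_SETS  16u   /* number of sets (assumed) */
    #define LINE_SIZE 32u   /* bytes per line (assumed) */

    struct line { bool valid; uint32_t tag; };
    static struct line cache[NUM_SETS][NUM_WAYS];

    /* A hit may be found in any of the N ways of the block's set. */
    bool lookup(uint32_t addr) {
        uint32_t index = (addr / LINE_SIZE) % NUM_SETS;
        uint32_t tag   = addr / (LINE_SIZE * NUM_SETS);
        for (unsigned w = 0; w < NUM_WAYS; w++)
            if (cache[index][w].valid && cache[index][w].tag == tag)
                return true;
        return false;
    }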

Cache Block Replacement
With Direct Mapped Cache: A cache line maps to only one unique place in the cache, so the new cache line simply replaces the block at that address.
With Set Associative Cache: There is a choice of which block in the set to evict. We'll look at three strategies: Random, LRU, and FIFO.
Random: The victim is chosen uniformly at random from the blocks in the set. The advantage of random replacement is that it is simple and inexpensive to implement.

Cache Block Replacement (2)
LRU (Least Recently Used): The block that gets replaced is the one that has not been used for the longest time. The principle of temporal locality tells us that recently used data are likely to be used again soon, so an advantage of LRU is that it exploits temporal locality. A disadvantage of LRU is that it is expensive to keep track of cache access patterns. In empirical studies there was little performance difference between LRU and Random.
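A minimal sketch of LRU bookkeeping using per-way timestamps (illustrative; real hardware usually uses cheaper approximations, which is the expense the slide refers to):

    #include <stdint.h>

    #define NUM_WAYS 4u

    struct way { uint32_t tag; uint64_t last_used; };

    /* On every hit, stamp the way with the current access count. */
    void touch(struct way *w, uint64_t now) { w->last_used = now; }

    /* Victim = the way with the oldest stamp, i.e. least recently used. */
    unsigned lru_victim(const struct way set[NUM_WAYS]) {
        unsigned victim = 0;
        for (unsigned w = 1; w < NUM_WAYS; w++)
            if (set[w].last_used < set[victim].last_used)
                victim = w;
        return victim;
    }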

Cache Block Replacement (3)
FIFO (First In First Out): Replace the block that has been in the set the longest, i.e. the one fetched earliest, regardless of the access pattern. In empirical studies, Random replacement generally outperformed FIFO.

Cache Thrashing
Thrashing Definition: Cache thrashing occurs when a frequently used cache line is repeatedly displaced by another frequently used cache line. It can happen in both instruction and data caches. The CPU cannot find the data element it wants in the cache and must make another main memory cache line access; the same data elements are repeatedly fetched into and displaced from the cache.

Cache Thrashing (2)
Why does thrashing happen?
The code has too many variables and arrays for the needed data elements to fit in cache, so cache lines are discarded and later retrieved. Typical causes: the arrays are dimensioned too large to fit in cache, or the arrays are accessed with indirect addressing, e.g. a(k(j)); the sketch below shows a classic case.
How to Reduce Thrashing
The computer designer can reduce cache thrashing by increasing the cache's set associativity.
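A classic illustrative case (an assumption, not an example from the slides): in a direct mapped cache, two arrays whose spacing is a multiple of the cache size map to the same slots and evict each other on every iteration.

    /* Assume a direct mapped cache of CACHE_BYTES; if the linker places
       a and b exactly CACHE_BYTES apart, a[i] and b[i] share one slot. */
    #define CACHE_BYTES (1 << 15)               /* 32 KB cache (assumed) */
    #define N (CACHE_BYTES / sizeof(float))

    static float a[N];                          /* layout assumption:    */
    static float b[N];                          /* b starts CACHE_BYTES
                                                   after a               */
    float dot(void) {
        float sum = 0.0f;
        for (unsigned i = 0; i < N; i++)
            sum += a[i] * b[i];  /* each a[i] load evicts b[i]'s line and
                                    vice versa: thrashing on every access */
        return sum;
    }

With a 2-way set associative cache, both conflicting lines can live in the same set at once, which is why increasing associativity reduces this kind of thrashing.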