CSCI206 - Computer Organization & Programming


Multilevel Caches / Cache Performance
zyBook: 12.4

Conflicting Cache Requirements

We want a low miss rate:
- large size (fewer capacity misses)
- large blocks (fewer compulsory misses)
- high associativity (fewer conflict misses)

We also want a fast hit time:
- small size (indexing a large memory takes time)
- low associativity (less tag searching)

Multilevel Caches Are a Compromise

Modern CPUs typically use a small, fast L1 cache:
- small (tens of KB)
- low associativity (often 4- to 8-way)
- making the L1 larger gives diminishing returns, because hit time grows with size

Instead, we can reduce the miss penalty with an L2 cache:
- the L1 serves the CPU; the L2 serves L1 misses
- larger than the L1 (its hit time can be larger than the L1's)
- higher associativity (8-16 way)
- slower than the L1, but still much faster than main memory

[Diagram: memory hierarchy, CPU → L1 → L2 → main memory; moving away from the CPU, speed decreases and size increases]

Figure from "Computer Architecture, Fifth Edition: A Quantitative Approach" by John Hennessy and David Patterson (Morgan Kaufmann)

Review: AMAT (Average Memory Access Time)

AMAT = hit time + miss rate × miss penalty

Caching helps reduce this!
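With hypothetical numbers (not from the slides): a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty give

AMAT = 1 + 0.05 × 100 = 6 cycles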

Multilevel AMAT

With two levels, the L1 miss penalty is itself the AMAT of an L2 access:

AMAT = L1 hit time + L1 miss rate × (L2 hit time + L2 miss rate × L2 miss penalty)
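Continuing the hypothetical numbers above, suppose an L2 with a 10-cycle hit time and a 20% local miss rate sits in front of the 100-cycle main memory:

AMAT = 1 + 0.05 × (10 + 0.20 × 100) = 1 + 0.05 × 30 = 2.5 cycles

versus 6 cycles without the L2, so the second level cuts the average access time by more than half.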

Matrix Multiply: C = A × B

Matrix multiply is one of the most important scientific calculations. A 32 × 32 matrix of double-precision floats is 32 × 32 × 8 bytes = 8 KB.

/* Naive multiply of n x n column-major matrices. */
/* For each row i of A */
for (int i = 0; i < n; ++i) {
    /* For each column j of B */
    for (int j = 0; j < n; ++j) {
        /* Compute C(i,j) */
        double cij = C[i+j*n];
        for (int k = 0; k < n; k++) {
            cij += A[i+k*n] * B[k+j*n];
        }
        C[i+j*n] = cij;
    }
}

Naïve: C = A × B (see the video)

Blocking

Many algorithms can be "blocked": modified to operate on smaller blocks (subsets) of the data (a sketch appears below).
- break the n × n problem into a set of k × k sub-matrices, where the k × k working set fits in the L1 cache
- this greatly reduces cache misses

[Figure: blocking]
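The slides don't show the blocked code, but a minimal sketch of a blocked version of the multiply above might look like this. The names dgemm_blocked, BLOCK, and min are illustrative, not from the course; BLOCK = 32 is a hypothetical tile size (three 32 × 32 double tiles = 24 KB, which fits a 32 KB L1).

#define BLOCK 32  /* hypothetical tile size; pick so three tiles fit in L1 */

static int min(int a, int b) { return a < b ? a : b; }

/* Blocked C += A * B on the same n x n column-major matrices as above. */
void dgemm_blocked(int n, const double *A, const double *B, double *C)
{
    for (int sj = 0; sj < n; sj += BLOCK)
        for (int si = 0; si < n; si += BLOCK)
            for (int sk = 0; sk < n; sk += BLOCK)
                /* multiply one BLOCK x BLOCK tile:
                   C[si.., sj..] += A[si.., sk..] * B[sk.., sj..] */
                for (int j = sj; j < min(sj + BLOCK, n); ++j)
                    for (int i = si; i < min(si + BLOCK, n); ++i) {
                        double cij = C[i+j*n];
                        for (int k = sk; k < min(sk + BLOCK, n); ++k)
                            cij += A[i+k*n] * B[k+j*n];
                        C[i+j*n] = cij;
                    }
}

The three outer loops walk over tiles; the three inner loops are the naïve multiply restricted to one tile, so all data touched in the inner loops stays cache-resident until it has been fully reused.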

Matrix Multiply Speed

[performance graph not reproduced in the transcript]

Instruction Caching

- Fetching instructions requires a memory access.
- Instruction addresses are typically in the text (code) segment; data usually comes from the data, heap, and stack segments.
- We cache instructions too, but in a shared cache instruction fetches can conflict with data accesses.

Unified vs. Split Caches

- Unified cache: a single cache holds both instructions and data.
- Split caches: separate caches for instruction fetch and data memory. Typically both caches are the same size, e.g., a 16 KB instruction cache and a 16 KB data cache.

On modern processors:
- split caches are used for L1
- a unified cache is used for L2 / L3

Multicore

- In a multicore system, main memory is shared between the cores.
- Each CPU has its own private L1 cache.
- L2 / L3 may be shared or per-core.
- Private caches create a coherence problem.

Coherence Example

1. CPU A reads data X; it is cached in A's private L1.
2. CPU B reads data X; it is cached in B's private L1.
3. CPU A changes X; the new value is written to A's private L1.
4. CPU B still has the old value of X cached.

Coherence Protocol

Extra hardware keeps the private L1 caches coherent:
- Multiple CPUs can cache the same block for reading.
- When a CPU writes to memory, that block must be invalidated in all other CPUs' caches.
- When a CPU has a read miss, all other L1 caches must be checked for that block (it may be dirty). If a dirty copy is found, it is written back to memory so the latest value is used.
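The slides don't name a specific protocol, but the rules above match an invalidation-based protocol such as MSI. As a minimal sketch (the state names and functions are illustrative, not from the course), each cache line's coherence state could evolve like this:

/* Sketch of per-line state in an invalidation-based (MSI-style) protocol. */
typedef enum { INVALID, SHARED, MODIFIED } LineState;

/* Local CPU events */
LineState on_cpu_read(LineState s)  { return (s == INVALID) ? SHARED : s; }  /* miss fetches the block */
LineState on_cpu_write(LineState s) { (void)s; return MODIFIED; }            /* invalidate all other copies */

/* Events observed from other cores (e.g., snooped on a shared bus) */
LineState on_remote_write(LineState s) { (void)s; return INVALID; }          /* our copy is now stale */
LineState on_remote_read(LineState s)  { return (s == MODIFIED) ? SHARED : s; } /* write back dirty data first */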