Caltech CS184b Winter 2001 -- DeHon. CS184b: Computer Architecture [Single Threaded Architecture: abstractions, quantification, and optimizations], Day 12: Cache Introduction

Slide 1: CS184b: Computer Architecture [Single Threaded Architecture: abstractions, quantification, and optimizations] -- Day 12: February 15, 2000 -- Cache Introduction

Slide 2: Today
– Issue
– Structure
– Idea
– Cache Basics

Slide 3: Memory and Processors
Memory is used to compactly store:
– the state of the computation
– the description of the computation (instructions)
Memory access latency impacts performance:
– timing of loads and stores
– timing of instruction fetch

Slide 4: Issues
We need big memories:
– to hold large programs (many instructions)
– to hold large amounts of state
Big memories are slow.
Memory takes up area:
– we want dense memories
– the densest memories are not fast; fast memories are not dense
The memory capacity needed does not fit on the die:
– inter-die communication is slow

Slide 5: Problem
The desire to contain the whole problem implies a large memory.
A large memory implies slow memory access.
Programs need frequent memory access:
– e.g., ~20% of operations are loads
– a fetch is required for every instruction
Is memory the performance bottleneck? Do programs run slowly because of it?

Slide 6: Opportunity
Architecture mantra: exploit structure in typical problems.
What structure exists?

Slide 7: Memory Locality
What percentage of accesses are to unique addresses -- addresses distinct from the last N unique addresses? [Huang+Shen, Intrinsic BW, ASPLOS 7]

Slide 8: Hierarchy/Structure Summary
A "memory hierarchy" arises from area/bandwidth tradeoffs:
– smaller/cheaper to store words/blocks (saves routing and control)
– smaller/cheaper to handle long retiming in larger arrays (reduces interconnect)
– high bandwidth out of registers/shallow memories
[from CS184a]

Slide 9: Opportunity
Small memories are fast.
Access to memory is not random:
– temporal locality
– short and long retiming distances
So: put commonly/frequently used data (and instructions) in a small memory.

Slide 10: Memory System Idea
Don't build a single, flat memory. Build a hierarchy of speeds/sizes/densities:
– commonly accessed data in fast/small memory
– infrequently used data in large/dense/cheap memory
Goal: achieve the speed of the small memory with the density of the large memory.

Slide 11: Hierarchy Management
Two approaches:
– explicit data movement (e.g., register files, overlays)
– transparent/automatic movement, invisible to the programming model

Slide 12: Opportunity: Model
The model is simple: read data and operate upon it; timing is not visible.
So we can vary timing:
– make the common case fast (serve it from the small memory)
– keep all cases correct (they can be answered from the larger/slower memory)

Slide 13: Cache Basics
A small memory (the cache) holds commonly used data.
A read goes to the cache first:
– if the cache holds the data, return the value
– else, get the value from bulk (slow) memory
Stall execution to cover the miss latency (or overlap it: keep the pipeline full, scoreboarding).
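The read path described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's code; the class and field names are invented here.

```python
# Minimal sketch of a cache read path: hit serves from the small fast store,
# miss fetches from bulk memory and fills the cache.
class SimpleCache:
    def __init__(self, backing):
        self.backing = backing   # slow bulk memory, modeled as addr -> value
        self.lines = {}          # cached addr -> value
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        if addr in self.lines:           # hit: answer from the cache
            self.hits += 1
        else:                            # miss: "stall" and fetch from bulk memory
            self.misses += 1
            self.lines[addr] = self.backing[addr]
        return self.lines[addr]
```

A second read of the same address is a hit, which is exactly the temporal locality the following slides exploit.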

Slide 14: Cache Questions
How do we manage the contents -- decide what goes in (is kept in) the cache?
How do we know what we have in the cache?
How do we make sure the cache and bulk memory stay consistent?

Slide 15: Cache Contents
Ideal: the cache should hold the N items that maximize the fraction of memory references satisfied in the cache.
Problem: we don't know the future -- we don't know what values will be needed:
– partially a limitation of the model
– partially data dependent
– halting problem (can't even say whether a piece of code will execute)

Slide 16: Cache Contents
Look for heuristics that keep the most likely set of data in the cache.
Structure: temporal locality -- high probability that recently used data will be accessed again.
Heuristic goal: keep the last N references in the cache.

Slide 17: Temporal Locality Heuristic
Move data into the cache on access (load, store).
Remove "old" data from the cache to make space.

Slide 18: "Ideal" Locality Cache
Stores the N most recent things:
– can store any N things
– knows which N things were accessed
– knows when each was last used
[diagram: cache entries of (data, addr, ref cycle)]

Slide 19: "Ideal" Locality Cache
[diagram: (data, addr, ref cycle) entries with an address comparator on each]
Match the address:
– if matched, update the entry's ref cycle
– else drop the oldest entry, read from memory, and store in the newly freed slot
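The behavior on this slide amounts to a fully associative LRU cache. A software sketch, assuming N entries each tagged with the cycle of its last reference (the names here are illustrative):

```python
# Sketch of the "ideal" locality cache: any N entries, each stamped with the
# cycle of its last reference; on a miss the oldest entry is evicted.
class IdealLocalityCache:
    def __init__(self, capacity, backing):
        self.capacity = capacity
        self.backing = backing        # bulk memory, modeled as addr -> value
        self.entries = {}             # addr -> (data, last_ref_cycle)
        self.cycle = 0

    def load(self, addr):
        self.cycle += 1
        if addr in self.entries:      # address matched: just update the cycle
            data, _ = self.entries[addr]
        else:                         # miss: drop the oldest, read from memory
            if len(self.entries) >= self.capacity:
                oldest = min(self.entries, key=lambda a: self.entries[a][1])
                del self.entries[oldest]
            data = self.backing[addr]
        self.entries[addr] = (data, self.cycle)
        return data
```

With capacity 2 and the reference string 1, 2, 1, 3, the load of 3 evicts address 2 (the oldest by last reference), not 1.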

Slide 20: Problems with the "Ideal" Locality Cache
Needs O(N) comparisons.
Must find the oldest entry (also O(N)?).
Expensive.

Slide 21: Relaxing the "Ideal"
Keeping usage information (and comparing it) is expensive. Relax:
– keep only a few bits of age
– or don't bother: pick the victim randomly
– things have an expected lifetime in the cache, so old things are more likely victims than new things
– if we evict the wrong thing, it will simply be brought back
– very simple/cheap to implement

Slide 22: Fully Associative Memory
Store both the address and the data. Can store any N addresses:
– approaches the ideal of the "best" N things
[diagram: (data, addr) entries, each with its own address comparator]

Slide 23: Relaxing the "Ideal"
A comparison for every address is expensive. Reduce comparisons:
– deterministically map each address to a small portion of the memory
– only compare addresses against that portion

Slide 24: Direct Mapped
The extreme is a "direct mapped" cache. The memory slot is f(addr) -- usually a few low bits of the address.
Go directly to that slot and check whether the data you want is there.
[diagram: the low address bits index the array; the stored tag is compared against the high address bits to signal a hit]
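The slot function f(addr) can be sketched as follows. The geometry here (32 lines of 16-byte blocks) is a hypothetical example, not from the slides.

```python
# Sketch of direct-mapped address decomposition and lookup.
NUM_LINES = 32    # illustrative cache geometry
BLOCK_SIZE = 16

def split_address(addr):
    block = addr // BLOCK_SIZE    # drop the offset within the block
    index = block % NUM_LINES     # low bits: which slot to check
    tag = block // NUM_LINES      # high bits: stored and compared on access
    return index, tag

def lookup(cache, addr):
    """cache is a list of NUM_LINES entries, each (tag, data) or None."""
    index, tag = split_address(addr)
    entry = cache[index]
    return entry is not None and entry[0] == tag   # hit iff the tags match
```

Note that addresses 0 and NUM_LINES * BLOCK_SIZE = 512 decompose to the same index with different tags, which is exactly the conflict discussed on the next slide.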

Slide 25: Direct Mapped Cache
Benefits:
– simple
– fast
Costs:
– multiple addresses will need the same slot
– conflicts mean we don't really have the most recent N things
– can have conflicts between commonly used items

Slide 26: Set-Associative Cache
Between the extremes: set-associative.
Think of it as M direct-mapped caches, with one comparison for each:
– look up in all M caches
– compare and see if any has the target data
– can hold M things that map to the same index
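The M-way lookup described above can be sketched like this; the parameters (2 ways, 16 sets, 16-byte blocks) are illustrative choices, not values from the course.

```python
# Sketch of a set-associative lookup: the index selects a set, and the tag is
# compared against all M ways (in hardware, in parallel; here, a loop).
WAYS = 2          # M
NUM_SETS = 16
BLOCK_SIZE = 16

def lookup(sets, addr):
    """sets is a list of NUM_SETS lists, each holding up to WAYS (tag, data)."""
    block = addr // BLOCK_SIZE
    index = block % NUM_SETS      # low bits select the set
    tag = block // NUM_SETS       # high bits are compared in every way
    for way_tag, data in sets[index]:   # one comparator per way
        if way_tag == tag:
            return data           # hit in this way
    return None                   # miss in all M ways
```

Two addresses that would conflict in a direct-mapped cache (e.g. 0 and NUM_SETS * BLOCK_SIZE = 256, both in set 0) can coexist here, one per way.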

Slide 27: Two-Way Set Associative
[diagram: two direct-mapped banks; the low address bits index both in parallel, and the high address bits are compared against each bank's stored tag]

Slide 28: Two-Way Set Associative
[Hennessy and Patterson, Figure 5.8]

Slide 29: Set Associative
More expensive than direct mapped, but the expense is tunable.
Slower than direct mapped: have to mux in the correct answer.
Can better approximate holding the N most recently/frequently used things.

Slide 30: Classifying Misses
Compulsory:
– first reference
– (any cache would have these)
Capacity:
– misses due to cache size
– (a fully associative cache of the same size would also have these)
Conflict:
– misses because of limited places to put a block
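The three categories can be separated mechanically by simulating three caches side by side: an infinite cache (first references are compulsory), a fully associative LRU cache of the same size (its extra misses are capacity), and the real direct-mapped cache (whatever remains is conflict). A rough sketch, treating each address as a block number:

```python
# Sketch of "3C" miss classification for a direct-mapped cache of `lines` slots.
def classify_misses(trace, lines):
    seen = set()     # infinite cache: first references are compulsory misses
    lru = []         # fully associative LRU of `lines` entries
    direct = {}      # slot -> addr, direct-mapped, same total size
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for addr in trace:
        dm_hit = direct.get(addr % lines) == addr
        fa_hit = addr in lru
        # update the reference caches
        if fa_hit:
            lru.remove(addr)
        lru.append(addr)
        if len(lru) > lines:
            lru.pop(0)               # evict least recently used
        direct[addr % lines] = addr
        # classify the direct-mapped cache's miss, if any
        if not dm_hit:
            if addr not in seen:
                counts["compulsory"] += 1
            elif not fa_hit:
                counts["capacity"] += 1
            else:
                counts["conflict"] += 1
        seen.add(addr)
    return counts
```

For the trace 0, 4, 0, 4 on a 4-line direct-mapped cache, both addresses map to slot 0: the first two misses are compulsory and the next two are pure conflict misses, since a fully associative cache of the same size would have held both.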

Slide 31: Set Associativity
[Hennessy and Patterson, Figure 5.10]

Slide 32: Absolute Miss Rates
[Hennessy and Patterson, Figure 5.10]

Slide 33: Policy on Writes
Keep memory consistent at all times? Or let the cache and memory together hold the values?
Write through: all writes go to both memory and the cache.
Write back: writes go to the cache; memory is updated only on eviction.

Slide 34: Write Policy
Write through:
– easy to implement
– eviction is trivial (just overwrite)
– every write is slow (main-memory time)
Write back:
– fast (writes go to the cache)
– eviction is slow/complicated
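The contrast between the two policies can be sketched in a few lines; the helper names are hypothetical and the cache is modeled as a plain dict with a dirty set.

```python
# Sketch contrasting write-through and write-back (illustrative helper names).
def write_through(cache, memory, addr, value):
    cache[addr] = value
    memory[addr] = value       # every write also goes to bulk memory (slow)

def write_back(cache, dirty, addr, value):
    cache[addr] = value
    dirty.add(addr)            # memory is updated only when this line is evicted

def evict_write_back(cache, dirty, memory, addr):
    if addr in dirty:          # eviction is slow only for dirty lines
        memory[addr] = cache[addr]
        dirty.discard(addr)
    del cache[addr]            # clean lines can just be overwritten
```

With write back, memory is stale between the write and the eviction, which is exactly the consistency question raised on the previous slide.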

Slide 35: Cache Equation...
Assume hits are satisfied in 1 cycle:
CPI = Base CPI + (Refs/Instr) × (Miss Rate) × (Miss Latency)

Slide 36: Cache Numbers
CPI = Base CPI + (Refs/Instr) × (Miss Rate) × (Miss Latency)
From Chapter 2 and experience:
– loads and stores make up ~30% of operations
– miss rates: roughly 1-10%
– main memory latency: ~50 ns
– cycle times: ~1 ns, and shrinking

Slide 37: Cache Numbers
No cache:
– CPI = Base + 0.3 × 50 = Base + 15
Cache at CPU cycle speed, 10% miss rate:
– CPI = Base + 0.3 × 0.1 × 50 = Base + 1.5
Cache at CPU cycle speed, 1% miss rate:
– CPI = Base + 0.3 × 0.01 × 50 = Base + 0.15
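The arithmetic on this slide follows directly from the cache equation; a quick check of the three cases (50-cycle miss latency, 30% of instructions referencing memory):

```python
# CPI added by the memory system, per the cache equation on slide 35.
def memory_cpi(refs_per_instr, miss_rate, miss_latency):
    return refs_per_instr * miss_rate * miss_latency

print(memory_cpi(0.3, 1.0, 50))    # no cache: every reference misses
print(memory_cpi(0.3, 0.1, 50))    # 10% miss rate
print(memory_cpi(0.3, 0.01, 50))   # 1% miss rate
```

This reproduces the +15, +1.5, and +0.15 CPI penalties on the slide, and shows why driving the miss rate down by 10× buys a 10× reduction in the memory system's CPI contribution.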

Slide 38: Big Ideas
Structure:
– temporal locality
Model:
– optimization that preserves the model
– simple model, sophisticated implementation
– details hidden

Slide 39: Big Ideas
Balance competing factors:
– speed of the cache vs. miss rate
Get the best of both worlds:
– multi-level hierarchy
– the speed of the small memory with the capacity/density of the large