CS2100 Computer Organisation Cache II (AY2015/6) Semester 1.


Road Map: Part II
Performance | Assembly Language | Processor: Datapath | Processor: Control | Pipelining | Cache

Cache Part II:
- Types of Cache Misses
- Direct-Mapped Cache
- Set Associative Cache
- Fully Associative Cache
- Block Replacement Policy
- Cache Framework
- Improving Miss Penalty
- Multi-Level Caches

Cache Misses: Classifications

Compulsory / Cold Miss
- First time a memory block is accessed
- Cold fact of life: not much can be done
- Solution: increase cache block size

Conflict Miss
- Two or more distinct memory blocks map to the same cache block
- Big problem in direct-mapped caches
- Solution 1: increase cache size (inherent restriction on cache size due to SRAM technology)
- Solution 2: set-associative caches (coming next...)

Capacity Miss
- Due to limited cache size
- Will be further clarified under "fully associative caches" later

Block Size Tradeoff (1/2)

Larger block size:
+ Takes advantage of spatial locality
- Larger miss penalty: takes longer to fill up the block
- If block size is too big relative to cache size, there are too few cache blocks, so the miss rate goes up (fewer blocks compromises temporal locality)

Average Access Time = Hit rate × Hit Time + (1 − Hit rate) × Miss penalty

(Graphs: miss penalty grows with block size; miss rate first falls as spatial locality is exploited, then rises; average access time therefore has a minimum at an intermediate block size.)
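The tradeoff can be made concrete with the slide's formula. This is a small sketch; the function name and all the numbers below are illustrative choices of ours, not values from the slides.

```python
# Average access time from the slide's formula:
#   AMAT = hit_rate * hit_time + (1 - hit_rate) * miss_penalty
def average_access_time(hit_rate, hit_time, miss_penalty):
    """All times in the same unit (e.g. cycles)."""
    return hit_rate * hit_time + (1 - hit_rate) * miss_penalty

# A larger block may lower the miss rate yet raise the miss penalty,
# so the average access time can still get worse:
print(average_access_time(0.95, 1, 20))  # 0.95*1 + 0.05*20 = 1.95 cycles
print(average_access_time(0.97, 1, 40))  # 0.97*1 + 0.03*40 = 2.17 cycles
```

The second configuration hits more often but still loses overall, which is exactly the "block size too big" region of the graph.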

Block Size Tradeoff (2/2)
(Graph slide: miss rate vs. block size for several cache sizes.)

SET ASSOCIATIVE CACHE
Another way to organize the cache blocks

Set-Associative (SA) Cache Analogy
Many book titles start with "T" — too many conflicts!
Hmm... how about we give more slots per letter: two slots for books starting with "A", two for "B", etc.?

Set Associative (SA) Cache
N-way Set Associative Cache: a memory block can be placed in a fixed number of locations (N > 1) in the cache.
Key Idea:
- The cache consists of a number of sets; each set contains N cache blocks
- Each memory block maps to a unique cache set
- Within the set, a memory block can be placed in any element of the set

SA Cache Structure
An example of a 2-way set associative cache: each set has two cache blocks.
A memory block maps to a unique set; within the set, it can be placed in either of the cache blocks.
Need to search both blocks to look for the memory block.
(Structure: the Set Index selects a set; each set holds two entries, each with its own Valid bit, Tag, and Data.)

Set-Associative Cache: Mapping

Cache block size = 2^N bytes
Number of cache sets = 2^M

Memory address (32 bits): [ Tag (bits 31 .. N+M) | Set Index (bits N+M−1 .. N) | Offset (bits N−1 .. 0) ]
- Offset = N bits
- Set Index = M bits
- Tag = 32 − (N + M) bits

Cache Set Index = BlockNumber modulo NumberOfCacheSets

Observation: it is essentially unchanged from the direct-mapping formula.

SA Cache Mapping: Example

Given: 4 GB memory, 4 KB cache, 1 block = 4 bytes, 4-way set associative.

Offset, N = 2 bits
Block Number = 32 − 2 = 30 bits (check: number of memory blocks = 2^30)
Number of cache blocks = 4 KB / 4 bytes = 1024 = 2^10
4-way associative, so number of sets = 1024 / 4 = 256 = 2^8
Set Index, M = 8 bits
Cache Tag = 32 − 8 − 2 = 22 bits
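The field extraction above is just bit slicing, which can be sketched directly. The helper name is ours; the parameters match the slide's example (N = 2, M = 8, so a 22-bit tag).

```python
# Split a 32-bit address into (tag, set index, offset) for the slide's
# example: 4 KB cache, 4-byte blocks, 4-way set associative.
def split_address(addr, offset_bits, index_bits):
    offset = addr & ((1 << offset_bits) - 1)
    set_index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, set_index, offset

N, M = 2, 8                       # offset and set-index widths from the slide
print(32 - N - M)                 # 22 tag bits, as computed above
print(split_address(0x1234, N, M))  # (4, 141, 0)
```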

Set Associative Cache Circuitry

(Figure: a 4-way 4 KB cache with 1-word (4-byte) blocks. The 8-bit set index selects one set; the 22-bit tag is compared, together with the valid bits, against all four ways in parallel, and a 4-to-1 multiplexer selects the data from the matching way.)

Note the simultaneous "search" on all elements of a set.

Advantage of Associativity (1/3): Direct Mapped Cache

Example: given a memory access sequence of block numbers (not addresses) that repeatedly alternates between two distinct blocks mapping to the same cache index.
Result:
- Cold misses = 2 (the first two accesses)
- Conflict misses = 6 (the rest of the accesses)

Advantage of Associativity (2/3): 2-way Set Associative Cache

Example: the same memory access sequence, now with both blocks fitting in one set.
Result:
- Cold misses = 2 (the first two accesses)
- Conflict misses = none (all the remaining accesses are hits!)

Advantage of Associativity (3/3)

Rule of Thumb: a direct-mapped cache of size N has about the same miss rate as a 2-way set associative cache of size N/2.

SA Cache Example: Setup

Given:
- Memory access sequence: 4, 0, 8, 36, 0
- 2-way set-associative cache with a total of four 8-byte blocks, i.e. a total of 2 sets
- Indicate hit/miss for each access

Offset, N = 3 bits
Block Number = 32 − 3 = 29 bits
2-way associative, number of sets = 2 = 2^1
Set Index, M = 1 bit
Cache Tag = 32 − 3 − 1 = 28 bits

Example: LOAD #1 (sequence: 4, 0, 8, 36, 0)

Load from 4: Tag = 0, Set Index = 0, Offset = 4.
Check: both blocks in Set 0 are invalid → [Cold Miss]
Result: load from memory and place in Set 0, Block 0: Valid = 1, Tag = 0, W0 = M[0], W1 = M[4].
Running results: Miss

Example: LOAD #2 (sequence: 4, 0, 8, 36, 0)

Load from 0: Tag = 0, Set Index = 0, Offset = 0.
Check: Set 0, Block 0 is [valid and tag matches] → Hit [Spatial Locality]
Running results: Miss, Hit

Example: LOAD #3 (sequence: 4, 0, 8, 36, 0)

Load from 8: Tag = 0, Set Index = 1, Offset = 0.
Check: both blocks in Set 1 are invalid → [Cold Miss]
Result: load from memory and place in Set 1, Block 0: Valid = 1, Tag = 0, W0 = M[8], W1 = M[12].
Running results: Miss, Hit, Miss

Example: LOAD #4 (sequence: 4, 0, 8, 36, 0)

Load from 36: Tag = 2, Set Index = 0, Offset = 4.
Check: Set 0, Block 0 is [valid but tag mismatch]; Set 0, Block 1 is [invalid] → [Cold Miss]
Result: load from memory and place in Set 0, Block 1: Valid = 1, Tag = 2, W0 = M[32], W1 = M[36].
Running results: Miss, Hit, Miss, Miss

Example: LOAD #5 (sequence: 4, 0, 8, 36, 0)

Load from 0: Tag = 0, Set Index = 0, Offset = 0.
Check: Set 0, Block 0 is [valid and tag matches]; Set 0, Block 1 is [valid but tag mismatch] → Hit [Temporal Locality]
Final results: Miss, Hit, Miss, Miss, Hit
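The five loads above can be replayed with a small simulation. This is a sketch of ours, not code from the slides; it only tracks tags per set, which is all the hit/miss outcome depends on.

```python
# Replay the slides' 2-way set-associative example:
# four 8-byte blocks -> 2 sets, accesses 4, 0, 8, 36, 0.
def simulate_2way(addresses, block_size=8, num_sets=2):
    sets = [[] for _ in range(num_sets)]  # per set: tags, LRU order (front = LRU)
    results = []
    for addr in addresses:
        block = addr // block_size
        idx = block % num_sets
        tag = block // num_sets
        if tag in sets[idx]:
            sets[idx].remove(tag)
            sets[idx].append(tag)     # move to most-recently-used position
            results.append("Hit")
        else:
            if len(sets[idx]) == 2:
                sets[idx].pop(0)      # evict LRU (not triggered by this trace)
            sets[idx].append(tag)
            results.append("Miss")
    return results

print(simulate_2way([4, 0, 8, 36, 0]))  # ['Miss', 'Hit', 'Miss', 'Miss', 'Hit']
```

The output matches the hand-traced results of LOAD #1 through LOAD #5.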

FULLY ASSOCIATIVE CACHE How about total freedom? CS Cache II

Fully-Associative (FA) Analogy
Let's not restrict the book by title any more. A book can go into any location on the desk!

Fully Associative (FA) Cache
A memory block can be placed in any location in the cache.
Key Idea: memory block placement is no longer restricted by the cache index / cache set index.
++ Can be placed in any location, BUT
-- Need to search all cache blocks on every memory access

Fully Associative Cache: Mapping

Cache block size = 2^N bytes
Number of cache blocks = 2^M

Memory address (32 bits): [ Tag (bits 31 .. N) | Offset (bits N−1 .. 0) ]
- Offset = N bits
- Tag = 32 − N bits

Observation: the block number serves as the tag in an FA cache.

Fully Associative Cache Circuitry

Example: 4 KB cache size and 16-byte block size, giving a 28-bit tag and a 4-bit offset.
The tags and valid bits of all cache blocks are compared in parallel.
No conflict misses (since data can go anywhere).

Cache Performance

Observations (assuming identical block size throughout):
1. Cold/compulsory misses remain the same irrespective of cache size/associativity.
2. For the same cache size, conflict misses go down with increasing associativity.
3. Conflict misses are 0 for FA caches.
4. For the same cache size, capacity misses remain the same irrespective of associativity.
5. Capacity misses decrease with increasing cache size.

Total Miss = Cold miss + Conflict miss + Capacity miss
Capacity miss (FA) = Total miss (FA) − Cold miss (FA), since Conflict miss = 0

BLOCK REPLACEMENT POLICY
Who should I kick out next...?

Block Replacement Policy (1/3)

Set Associative or Fully Associative Cache:
- Can choose where to place a memory block
- Potentially replacing another cache block if the cache is full
- Need a block replacement policy

Least Recently Used (LRU)
- How: on each cache hit, record which cache block was accessed. When replacing a block, choose the one which has not been accessed for the longest time.
- Why: temporal locality

Block Replacement Policy (2/3)

Least Recently Used policy in action on a 4-way cache:
- Access 4: Hit
- Access 16: Miss (evict Block 0, the least recently used)
- Access 12: Hit
- Access 0: Miss (evict Block 8, the least recently used)
- Access 4: Hit

Block Replacement Policy (3/3)

Drawback of LRU: the recency order is hard to keep track of if there are many choices.

Other replacement policies:
- First in first out (FIFO), and its second-chance variant
- Random replacement (RR)
- Least frequently used (LFU)
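The LRU bookkeeping described above can be sketched with an ordered dictionary as the recency list. The class name and trace are illustrative choices of ours; real hardware uses approximations precisely because of the tracking drawback just mentioned.

```python
from collections import OrderedDict

# LRU bookkeeping for one fully-associative cache of `capacity` blocks.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()          # keys kept in LRU -> MRU order

    def access(self, block):
        """Return True on a hit, False on a miss (which loads the block)."""
        if block in self.blocks:
            self.blocks.move_to_end(block)   # record the access: now MRU
            return True
        if len(self.blocks) == self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used
        self.blocks[block] = True
        return False

cache = LRUCache(4)
print([cache.access(b) for b in [0, 4, 16, 12, 0, 4]])
# [False, False, False, False, True, True]
```

The first four accesses fill the cache; the repeats of 0 and 4 then hit because temporal locality keeps them near the MRU end.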

Additional Example #1

Direct-Mapped Cache: four 8-byte blocks.
Memory accesses: 4, 8, 36, 48, 68, 0, 32

Address → (Tag, Index): 4 → (0, 0); 8 → (0, 1); 36 → (1, 0); 48 → (1, 2); 68 → (2, 0); 0 → (0, 0); 32 → (1, 0)
All seven accesses miss: the blocks holding 4, 36, 68, 0 and 32 all map to index 0 and keep evicting one another.
Final cache contents: Index 0: Tag 1, M[32], M[36]; Index 1: Tag 0, M[8], M[12]; Index 2: Tag 1, M[48], M[52].

Additional Example #2

Fully-Associative Cache: four 8-byte blocks, LRU replacement policy.
Memory accesses: 4, 8, 36, 48, 68, 0, 32

Accesses 4, 8, 36, 48 fill the four blocks (cold misses). Access 68 misses and evicts the LRU block M[0]–M[7]; access 0 misses and evicts M[8]–M[15]; access 32 then hits in the block holding M[32], M[36].
Result: 6 misses, 1 hit.

Additional Example #3

2-way Set-Associative Cache: four 8-byte blocks (2 sets), LRU replacement policy.
Memory accesses: 4, 8, 36, 48, 68, 0, 32

Address → (Tag, Set Index): 4 → (0, 0); 8 → (0, 1); 36 → (2, 0); 48 → (3, 0); 68 → (4, 0); 0 → (0, 0); 32 → (2, 0)
Every access except 8 maps to Set 0, so with only two ways they keep evicting one another: all seven accesses miss.
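The three additional examples can be replayed with one parameterised simulator (a sketch of ours, not code from the slides): direct-mapped is the 1-way case and fully associative is the case where the ways equal the number of blocks.

```python
# One LRU simulator covering all three organisations of the examples.
def simulate(addresses, num_blocks=4, block_size=8, ways=1):
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]   # per set: tags, LRU order (front = LRU)
    hits = 0
    for addr in addresses:
        block = addr // block_size
        idx = block % num_sets
        tag = block // num_sets
        if tag in sets[idx]:
            sets[idx].remove(tag)
            sets[idx].append(tag)          # move to MRU position
            hits += 1
        else:
            if len(sets[idx]) == ways:
                sets[idx].pop(0)           # evict least recently used
            sets[idx].append(tag)
    return hits

trace = [4, 8, 36, 48, 68, 0, 32]
print(simulate(trace, ways=1))  # direct mapped: 0 hits
print(simulate(trace, ways=4))  # fully associative: 1 hit (the final 32)
print(simulate(trace, ways=2))  # 2-way set associative: 0 hits
```

Note that for this particular trace, 2-way associativity does no better than direct mapping: associativity helps only when the conflicting blocks fit within a set.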

Cache Organizations: Summary

(Figure: an 8-block cache drawn four ways — one-way set associative (direct mapped) with 8 sets; two-way set associative with 4 sets; four-way set associative with 2 sets; eight-way set associative (fully associative) with a single set. Each entry holds a tag and its data.)

Cache Framework (1/2)

Block Placement: where can a block be placed in the cache?
- Direct Mapped: only the one block identified by the index
- N-way Set-Associative: any one of the N blocks within the set identified by the index
- Fully Associative: any cache block

Block Identification: how is a block found if it is in the cache?
- Direct Mapped: tag match with only one block
- N-way Set-Associative: tag match for all the blocks within the set
- Fully Associative: tag match for all the blocks within the cache

Cache Framework (2/2)

Block Replacement: which block should be replaced on a cache miss?
- Direct Mapped: no choice
- N-way Set-Associative: based on the replacement policy
- Fully Associative: based on the replacement policy

Write Strategy: what happens on a write?
- Write policy: write-through vs. write-back
- Write miss policy: write allocate vs. write no-allocate

EXPLORATION
What else is there?

Improving Miss Penalty

So far, we tried to improve the Miss Rate:
- Larger block size
- Larger cache
- Higher associativity

What about the Miss Penalty?

Average access time = Hit rate × Hit Time + (1 − Hit rate) × Miss penalty

Multilevel Cache: Idea

Hierarchy: Processor (Reg) → L1 I$ + L1 D$ → Unified L2 $ → Memory → Disk

Options: separate data and instruction caches, or a unified cache.

Sample sizes:
- L1: 32 KB, 32-byte blocks, 4-way set associative
- L2: 256 KB, 128-byte blocks, 8-way set associative
- L3: 4 MB, 256-byte blocks, direct mapped
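With a second level, the miss penalty of L1 is no longer the full memory latency: an L1 miss that hits in L2 pays only the L2 access time. A sketch of the resulting two-level average access time, extending the AMAT formula above; all the numbers are illustrative, not measured figures from the slides.

```python
# Two-level AMAT: an L1 miss costs an L2 access, and an L2 miss
# additionally costs the memory latency.
def amat_two_level(l1_hit_time, l1_miss_rate,
                   l2_hit_time, l2_miss_rate, mem_time):
    l2_penalty = l2_hit_time + l2_miss_rate * mem_time  # effective L1 miss penalty
    return l1_hit_time + l1_miss_rate * l2_penalty

# e.g. L1: 1 cycle, 5% miss; L2: 10 cycles, 20% of L1 misses also miss L2;
# memory: 100 cycles.  AMAT = 1 + 0.05 * (10 + 0.20 * 100) = 2.5 cycles,
# versus 1 + 0.05 * 100 = 6 cycles with no L2 at all.
print(amat_two_level(1, 0.05, 10, 0.20, 100))
```

This is why L2 reduces the *penalty* of L1 misses even though it does nothing for the L1 miss *rate*.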

Example: Intel Processors

Itanium 2 McKinley
- L1: 16 KB I$ + 16 KB D$
- L2: 256 KB
- L3: 1.5 MB – 9 MB

Pentium 4 Extreme Edition
- L1: 12 KB I$ + 8 KB D$
- L2: 256 KB
- L3: 2 MB

Trend: Intel Core i7-3960K

Per die:
- billion transistors
- 15 MB shared Inst/Data Cache (LLC)

Per core:
- 32 KB L1 Inst Cache
- 32 KB L1 Data Cache
- 256 KB L2 Inst/Data Cache
- up to 2.5 MB of LLC

Reading Assignment

Large and Fast: Exploiting Memory Hierarchy
- Chapter 7, Section 7.3 (3rd edition)
- Chapter 5, Section 5.3 (4th edition)

END