15-213 Recitation 6 – 3/11/01 (Anusha)
Recitation 6 – 3/11/01

Outline
– Cache Organization
– Replacement Policies
– MESI Protocol (cache coherency for multiprocessor systems)

Anusha. Office Hours: Tuesday 11:30–1:00, Wean Cluster 52xx

Reminders
– Lab 4 due Tuesday night
– Exam 1 grade adjustments at end of recitation

Cache organization (review)

[Figure: array of sets 0 … S–1; each line shows a valid bit, a tag, and bytes 0 … B–1]

– S = 2^s sets
– E lines per set
– B = 2^b bytes per cache block
– t tag bits per line
– 1 valid bit per line

Cache is an array of sets. Each set contains one or more lines. Each line holds a block of data.

Addressing the cache (review)

Address A is in the cache if its tag matches the tag of one of the valid lines in the set selected by A's set index.

Address A (m bits, bit m–1 down to bit 0):  [ t tag bits | s set index bits | b block offset bits ]
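As a concrete sketch of the field layout above (the function name and example address are mine, not from the slides), the tag/set/offset split is just shifts and masks:

```python
# Split an address into (tag, set index, block offset) given the
# s and b widths from the slide; the tag is whatever bits remain.
def split_address(addr, s, b):
    offset = addr & ((1 << b) - 1)            # low b bits
    set_index = (addr >> b) & ((1 << s) - 1)  # next s bits
    tag = addr >> (b + s)                     # remaining t bits
    return tag, set_index, offset

# Example with s = 7, b = 6 (the 8 KB direct-mapped cache below)
print(split_address(0x1234, 7, 6))  # (0, 72, 52)
```

Reversing the split (tag << (s+b) | set << b | offset) reconstructs the original address, which is a quick sanity check on the field widths.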

Parameters of cache organization

Parameters:
– s = number of set index bits, b = number of block offset bits, t = number of tag bits
– m = address size, with t + s + b = m
– B = 2^b = line size
– E = associativity (# lines per set)
– S = 2^s = number of sets
– Cache size C = B × E × S

Determining cache parameters

Suppose we are told we have an 8 KB, direct-mapped cache with 64-byte lines, and the word size is 32 bits. A direct-mapped cache has an associativity of 1. What are the values of t, s, and b?
– B = 2^b = 64, so b = 6
– B × E × S = C = 8192 (8 KB), and we know E = 1
– S = 2^s = C / B = 128, so s = 7
– t = m – s – b = 32 – 7 – 6 = 19

Answer: t = 19, s = 7, b = 6

One more example

Suppose our cache is 16 KB, 4-way set associative with 32-byte lines. (These are the parameters of the L1 cache of the P3 Xeon processors used by the fish machines.)
– B = 2^b = 32, so b = 5
– B × E × S = C = 16384 (16 KB), and E = 4
– S = 2^s = C / (E × B) = 128, so s = 7
– t = m – s – b = 32 – 5 – 7 = 20

Answer: t = 20, s = 7, b = 5
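Both worked examples follow the same recipe, which can be captured in a small helper (a sketch; the function name is mine):

```python
# Derive (t, s, b) from cache size C, associativity E, line size B,
# and address width m, using the relations on the parameters slide.
def cache_params(C, E, B, m=32):
    b = B.bit_length() - 1   # B = 2^b
    S = C // (E * B)         # C = B * E * S
    s = S.bit_length() - 1   # S = 2^s
    t = m - s - b            # the rest of the address is the tag
    return t, s, b

print(cache_params(8192, 1, 64))   # (19, 7, 6) — 8 KB direct-mapped
print(cache_params(16384, 4, 32))  # (20, 7, 5) — 16 KB 4-way
```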

Example 1: Direct Mapped Cache

Assume a direct-mapped cache with 4 four-byte lines and 6-bit addresses (t=2, s=2, b=2).

[Slide table: a reference string worked through, columns Line | V | Tag | Byte 0 | Byte 1 | Byte 2 | Byte 3]

Direct Mapped Cache

Same cache, final state:

[Slide table: Line | V | Tag | Byte 0 | Byte 1 | Byte 2 | Byte 3]

Example 2: Set Associative Cache

Assume a four-way set associative cache with 4 sets and one-byte blocks (t=4, s=2, b=0).

[Slide table: a reference string worked through, columns Set | V | Tag | Line 0/2 | V | Tag | Line 1/3]

Set Associative Cache

Same cache, final state:

[Slide table: Set | V | Tag | Line 0/2 | V | Tag | Line 1/3]

Example 3: Fully Associative Cache

Assume a fully associative cache with 4 four-byte blocks and 6-bit addresses (t=4, s=0, b=2).

[Slide table: a reference string worked through, columns Set | V | Tag | Byte 0 | Byte 1 | Byte 2 | Byte 3]

Fully Associative Cache

Same cache, final state:

[Slide table: Set | V | Tag | Byte 0 | Byte 1 | Byte 2 | Byte 3]

Note: LRU eviction policy used

Replacement Policy

Replacement policy:
– Determines which cache line is evicted on a miss
– Matters for set-associative and fully associative caches
– Nonexistent for direct-mapped caches (each block maps to exactly one line)

Example

Assuming a 2-way associative cache, determine the number of misses for the following trace:

A B C A B C B A B D

A, B, C, D all map to the same set.

Ideal Case: OPTIMAL

Policy 0: OPTIMAL
– Replace the cache line that is accessed furthest in the future

Properties:
– Requires knowledge of the future
– Gives the best-case (minimum) number of misses

Ideal Case: OPTIMAL ABCABCBABDABCABCBABD Optimal # of Misses A, + A,B+ A,C+ A,C B,C+ B,C B,A+ B,A D,A+ 6

Policy 1: FIFO
– Replace the oldest cache line

Policy 1: FIFO

Trace:       A    B    C    A    B    C    B    A    B    D
OPT cache:   A,–  A,B  A,C  A,C  B,C  B,C  B,C  B,A  B,A  D,A
OPT miss?    *    *    *         *              *         *    → 6
FIFO cache:  A,–  A,B  C,B  C,A  B,A  B,C  B,C  A,C  A,B  D,B
FIFO miss?   *    *    *    *    *    *         *    *    *    → 9
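The FIFO row can be checked the same way (again a sketch of a single 2-way set; the function name is mine):

```python
from collections import deque

# FIFO replacement: evict the line that entered the set earliest.
# Note that a hit does NOT refresh a line's position in the queue;
# that is what distinguishes FIFO from LRU.
def fifo_misses(trace, ways=2):
    cache, misses = deque(), 0
    for ref in trace:
        if ref in cache:
            continue            # hit: insertion order unchanged
        misses += 1
        if len(cache) == ways:
            cache.popleft()     # evict the oldest insertion
        cache.append(ref)
    return misses

print(fifo_misses(list("ABCABCBABD")))  # 9
```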

Policy 2: LRU

Policy 2: Least-Recently Used
– Replace the least-recently used cache line

Properties:
– Approximates the OPTIMAL policy by using past behavior to predict future behavior: the least-recently used cache line is unlikely to be accessed again in the near future

Policy 2: LRU

Trace:       A    B    C    A    B    C    B    A    B    D
OPT cache:   A,–  A,B  A,C  A,C  B,C  B,C  B,C  B,A  B,A  D,A
OPT miss?    *    *    *         *              *         *    → 6
FIFO cache:  A,–  A,B  C,B  C,A  B,A  B,C  B,C  A,C  A,B  D,B
FIFO miss?   *    *    *    *    *    *         *    *    *    → 9
LRU cache:   A,–  A,B  C,B  C,A  B,A  B,C  B,C  B,A  B,A  B,D
LRU miss?    *    *    *    *    *    *         *         *    → 8
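And the LRU row (same caveats: a single-set sketch, function name mine). The only change from FIFO is that a hit moves the line to the most-recently-used position:

```python
# LRU replacement: keep lines ordered by recency of access; the
# front of the list is the least recently used and is evicted.
def lru_misses(trace, ways=2):
    cache, misses = [], 0
    for ref in trace:
        if ref in cache:
            cache.remove(ref)   # hit: refresh recency
        else:
            misses += 1
            if len(cache) == ways:
                cache.pop(0)    # evict least recently used
        cache.append(ref)       # ref is now most recently used
    return misses

print(lru_misses(list("ABCABCBABD")))  # 8
```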

Reality: Pseudo-LRU

Reality
– True LRU is hard to implement (tracking an exact recency order is expensive in hardware)
– Pseudo-LRU is implemented as an approximation of LRU

Pseudo-LRU
– Each cache line is equipped with a bit
– The bit is set when the cache line is accessed
– The bit is cleared periodically
– Evict a cache line whose bit is unset
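The bit-per-line scheme above can be sketched as follows (class name and details are my assumptions; real hardware uses variants such as tree-based pseudo-LRU):

```python
import random

# One-bit pseudo-LRU for a single set: each line has a "recently
# used" bit, set on access and cleared periodically; a miss evicts
# some line whose bit is clear (or an arbitrary line if all are set).
class PseudoLRUSet:
    def __init__(self, ways=2):
        self.lines = [None] * ways   # cached tags
        self.used = [False] * ways   # per-line reference bit

    def clear_bits(self):
        """Periodic clearing of all reference bits."""
        self.used = [False] * len(self.used)

    def access(self, tag):
        """Return True on a hit, False on a miss."""
        if tag in self.lines:
            self.used[self.lines.index(tag)] = True
            return True
        unset = [i for i, u in enumerate(self.used) if not u]
        victim = unset[0] if unset else random.randrange(len(self.lines))
        self.lines[victim] = tag
        self.used[victim] = True
        return False
```

Because the bits are cleared wholesale, recency information within a clearing interval is lost; that is the accuracy cost paid for the cheap implementation.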

Multiprocessor Systems

Multiprocessor systems are common, but they are not as easy to build as just "adding a processor." You might think of a multiprocessor system like this:

[Figure: Processor 1 and Processor 2 connected directly to a shared Memory]

The Problem…

Caches can become unsynchronized
– A big problem for any system: memory should be viewed consistently by every processor

[Figure: Processor 1 with Cache 1 and Processor 2 with Cache 2, both in front of a shared Memory]

Cache Coherency

Imagine that each processor's cache could see what the other is doing
– Both of them could stay up to date ("coherent")
– How they manage to do so is a "cache coherency protocol"

The most widely used protocol is MESI
– MESI = Modified, Exclusive, Shared, Invalid
– Each of these is a state for each cache line
– Invalid: the data is invalid and must be retrieved from memory
– Exclusive: this processor has exclusive access to the data
– Shared: other caches have copies of the data
– Modified: this cache holds a modified copy of the data (other caches do not have the updated copy)

MESI Protocol
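The per-line state transitions can be sketched as a simple function (a hedged sketch: the function name and event encoding are mine, and a real MESI implementation also specifies bus actions such as write-backs and invalidate broadcasts):

```python
# Next MESI state for one cache line, given a local or snooped
# (remote) access. On a local read miss from Invalid, 'shared'
# says whether another cache signalled it also holds the line.
def mesi_next(state, event, shared=False):
    if event == "local_read":
        if state == "I":
            return "S" if shared else "E"
        return state                     # M/E/S can satisfy reads
    if event == "local_write":
        return "M"                       # writing makes the line Modified
    if event == "remote_read":
        # M and E lines are downgraded to Shared (M writes back first)
        return "S" if state in ("M", "E") else state
    if event == "remote_write":
        return "I"                       # another writer invalidates our copy
    raise ValueError(event)

print(mesi_next("I", "local_read", shared=True))  # S
print(mesi_next("E", "local_write"))              # M
print(mesi_next("M", "remote_read"))              # S
```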