6.175 Final Project Part 0: Understanding Non-Blocking Caches and Cache Coherency Answers.



Notation
- Addresses are ordered triples: (tag, index, offset)
- Cache lines are addressed with ordered pairs: (tag, index)
- Cache slots are addressed by index
- Reading a cache line from memory: M[(tag, index)]
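As a concrete illustration of this decomposition, an address can be split into its (tag, index, offset) fields with shifts and masks. The field widths below are illustrative assumptions, not the project's actual cache parameters:

```python
# Split a flat address into the (tag, index, offset) triple used above.
# The widths here (4 words per line, 16 cache slots) are illustrative
# assumptions, not the 6.175 cache parameters.
OFFSET_BITS = 2   # 4 words per cache line
INDEX_BITS = 4    # 16 cache slots

def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return (tag, index, offset)

# With these widths, address 64 has tag 1, index 0, offset 0.
print(split_address(64))  # (1, 0, 0)
```

Two addresses map to the same cache slot exactly when their index fields match, which is what makes the "same index, different tag" conflicts below possible.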

Non-Blocking Cache
- Given: processor requests and memory responses
- Assignment: complete the following tables (not all cells should be filled)
- We will focus on loads first and stores second; later tables integrate loads and stores together

Multiple Requests in Flight – Part 1
(Columns: Processor | Memory Req | Memory Resp | Slots 0–3 with V/W bits.)
Each load misses in a different slot and immediately sends a memory request:
  1: Ld (0,0,0) → Req Ld (0,0)
  2: Ld (0,1,0) → Req Ld (0,1)
  3: Ld (0,2,0) → Req Ld (0,2)
  4: Ld (0,3,0) → Req Ld (0,3)

Multiple Requests in Flight – Part 2
Memory responses return out of order; each response fills its slot and answers the matching load:
  M[(0,0)] → Ld 1 – data
  M[(0,1)] → Ld 2 – data
  M[(0,3)] → Ld 4 – data
  M[(0,2)] → Ld 3 – data

Same Cache Line, Different Offset
  1: Ld (0,0,0) → Req Ld (0,0)
  2: Ld (0,0,1) → enqueued in LdQ (2 entries)
  M[(0,0)] → Ld 1 – data, Ld 2 – data
  3: Ld (0,0,2) → Ld hit, Ld 3 – data
  4: Ld (0,0,3) → Ld hit, Ld 4 – data

Same Index, Different Tag
  1: Ld (0,0,0) → Req Ld (0,0)
  2: Ld (1,0,0) → enqueued in LdQ (same index, different tag)
  M[(0,0)] → Ld 1 – data; search LdQ for next Req → Req Ld (1,0)
  M[(1,0)] → Ld 2 – data

Stores
  1: St x (0,0,0) → Req Ld (0,0) (StQ: 1 entry)
  2: St y (0,0,1) → enqueued (StQ: 2 entries)
  M[(0,0)] → St 1 – ACK (x written)
  3: St z (0,0,2) → St 2 – ACK, St 3 – ACK
Note: the cache can’t accept a Req while handling a memory Resp.

Store Bypassing
  1: Ld (0,0,0) → Req Ld (0,0)
  2: St y (0,0,0) → enqueued in StQ
  3: St z (0,0,0) → enqueued in StQ (2 entries)
  4: Ld (0,0,0) → Ld 4 – z (hit from StQ)
  M[(0,0)] → Ld 1 – ?; St 2 – ACK (y); St 3 – ACK (z)
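The "hit from StQ" in request 4 can be sketched as a youngest-first search of the store queue. This is a simplified Python model; the `StoreQueue` name and interface are illustrative, not the project's BSV modules:

```python
# Simplified model of store-to-load bypassing: a load first checks the
# store queue (youngest entry first) for a pending store to the same
# address. Names here are illustrative, not the 6.175 BSV interfaces.
class StoreQueue:
    def __init__(self):
        self.entries = []  # list of (addr, data), oldest first

    def enqueue(self, addr, data):
        self.entries.append((addr, data))

    def bypass(self, addr):
        # Scan youngest-to-oldest so the most recent store wins.
        for a, d in reversed(self.entries):
            if a == addr:
                return d
        return None  # no match in StQ; fall through to the cache

stq = StoreQueue()
stq.enqueue((0, 0, 0), "y")   # 2: St y (0,0,0)
stq.enqueue((0, 0, 0), "z")   # 3: St z (0,0,0)
print(stq.bypass((0, 0, 0)))  # z — load 4 sees the youngest store
```

Scanning youngest-first is what makes load 4 return z rather than y, even though both stores to (0,0,0) are still pending.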

Resending Requests
  1: St y (0,1,0) → Req Ld (0,1) (StQ: 1 entry)
  2: St z (0,0,0) → Req Ld (0,0) (StQ: 2 entries)
  M[(0,0)] response arrives
  3: Ld (1,0,0) → Req Ld (1,0)
  M[(0,1)] → St 1 – ACK
  M[(1,0)] → Ld 3 – data
  St 2 – ACK
Note: a Req is sent from the head of the StQ.

Cache Coherency
- Given: initial cache states for a single address, and a cache request for that address
- Assignment: write the rules each module needs to execute to perform the cache request
- You may have to keep track of which messages are still in the message network; unfortunately there is not enough space to include this in the table.
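The rules in the following tables move lines through the usual MSI states. A minimal model of the MSI compatibility invariant behind those rules (the `MSI` and `is_compatible` names are illustrative, not the project's code) is:

```python
# Minimal model of the MSI state lattice behind the coherence tables:
# I < S < M. A cache may hold a line in a state only if every other
# cache's state for that line is compatible with it. Names here are
# illustrative, not the 6.175 implementation.
from enum import IntEnum

class MSI(IntEnum):
    I = 0  # Invalid
    S = 1  # Shared (read-only)
    M = 2  # Modified (exclusive, read/write)

def is_compatible(a, b):
    # M is exclusive: it coexists only with I in every other cache.
    if a == MSI.M:
        return b == MSI.I
    if b == MSI.M:
        return a == MSI.I
    return True  # any mix of I and S is fine

print(is_compatible(MSI.S, MSI.S))  # True  — two readers may coexist
print(is_compatible(MSI.M, MSI.S))  # False — a writer excludes readers
```

This invariant is why, in the scenarios below, a cache in M must be downgraded before another cache can upgrade.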

No Contest: Cache 0 – Ld
(Columns: rule | state | waitp | waitc, for Cache 0, Cache 1, and the Parent.)
Rules executed: 2, 3

Other Cache is Writing: Cache 0 – Ld
States involved: I, M, S.
Rules executed: 1, 4, 5, 6, 2, 3

Lots of Downgrading: Cache 0 – St
Cache 0 holds M state for a different tag: it must first evict this line, and then upgrade to M for the desired tag.
Initial states: Cache 0 = M (other tag), Cache 1 = I.
Rules executed: 8, 1, 6, 4, 5, 2, 3

Bonus: Both Want to Write
Initial states: Cache 0 = S, Cache 1 = M.
Rule sequence (as given): 1, I, 4 (from 0), 5, 6, 2, 3, 4 (from 1), 4

The Rest of the Project – Part 1
- Building a non-blocking cache hierarchy and testing it with simulated use cases
- This includes designing modules for the Message FIFOs, the Message Network, the Cache Parent Processor, and the Non-blocking Caches
- Some of the included tests are identical to the executions shown in Part 0
- It is important to know what the modules are supposed to do when debugging!

Part 1: Non-Blocking Coherent Cache Summary
Request from processor:
- If Ld request:
  - If in StQ – return data
  - If in cache – return data
  - Otherwise: enqueue into LdQ; send downgrade response* and upgrade request if possible
- If St request:
  - If cache hit and StQ empty – write to cache
  - Otherwise: enqueue into StQ
*Downgrade responses are not always necessary.
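The decision tree above can be sketched as follows. This is a simplified software model, with plain dicts and lists standing in for the project's hardware structures (all names are illustrative, and coherence permissions are omitted):

```python
# Sketch of the processor-request handling summarized above.
# cache maps (tag, index) -> {addr: data}; stq is a list of
# (addr, data) pairs, oldest first; ldq is a list of addresses.
# This is an illustrative model, not the 6.175 BSV implementation.
def handle_processor_request(req, cache, stq, ldq, send_upgrade_req):
    kind, addr, data = req      # ("Ld", addr, None) or ("St", addr, data)
    line = addr[:2]             # the (tag, index) pair for this address
    if kind == "Ld":
        for a, d in reversed(stq):        # 1) youngest matching store wins
            if a == addr:
                return d
        if line in cache:                 # 2) cache hit
            return cache[line].get(addr)
        ldq.append(addr)                  # 3) miss: remember the load
        send_upgrade_req(line, "S")       #    and ask the parent for S
        return None
    else:  # "St"
        if line in cache and not stq:     # hit with empty StQ: write now
            cache[line][addr] = data
        else:                             # otherwise queue the store
            stq.append((addr, data))
```

For example, a load to a line absent from `cache` returns `None`, leaves the address in `ldq`, and records one upgrade-to-S request via the `send_upgrade_req` callback.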

Part 1: Non-Blocking Coherent Cache Summary
Message from parent:
- If upgrade response:
  - Update cache line
  - Search LdQ and return responses until no more hits (multiple cycles)
  - Write to cache for head of StQ until cache miss (multiple cycles)
  - Send upgrade-to-M request for head of StQ (if possible)
  - Send upgrade-to-S request for LdQ entry with index matching the response (if possible)
- If downgrade request:
  - Update cache line (if necessary)
  - Send response (if necessary)
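The upgrade-response path can be sketched like this. It is a software model that drains all LdQ hits in one call, whereas the hardware handles one entry per cycle; the names and data layout are illustrative assumptions:

```python
# Sketch of handling an upgrade response: fill the cache line, then
# serve every queued load that now hits. Illustrative only — the real
# hardware drains the LdQ one entry per cycle, not all at once.
def handle_upgrade_response(line, line_data, cache, ldq):
    cache[line] = dict(line_data)  # update the cache line
    served, remaining = [], []
    for addr in ldq:               # search LdQ for new hits
        if addr[:2] == line:
            served.append((addr, cache[line][addr]))
        else:
            remaining.append(addr) # still waiting on another line
    ldq[:] = remaining
    return served                  # load responses for the processor
```

Loads to other lines stay queued, which matches the Part 0 tables where a refill of (0,0) answers only the (0,0,*) loads.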

The Rest of the Project – Part 2
- Integrating the non-blocking cache with an out-of-order processor core to create a multicore SMIPS processor
- Requires adding support for the LL (load-linked) and SC (store-conditional) instructions and for memory fences

The Rest of the Project – Timeline
- Part 1: distributed in waves this weekend; finish before the checkpoint meetings
- Checkpoint meetings: Wednesday, December 3rd and Friday, December 5th during class time; you will sign up for slots soon
- Part 2: distributed around the time of the checkpoint meetings; due December 10th at 3 PM – strict deadline!
- Presentations: December 10th from 3 PM to 6 PM in 32-D463 (Star); includes pizza!