6.175 Final Project Part 0: Understanding Non-Blocking Caches and Cache Coherency Answers.



Notation
- Addresses are ordered triples: (tag, index, offset)
- Cache lines are addressed with ordered pairs: (tag, index)
- Cache slots are addressed by index
- Reading a cache line from memory: M[(tag, index)]
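As a concrete illustration of this decomposition, an address can be split into its (tag, index, offset) fields with shifts and masks. The field widths below are illustrative assumptions, not the project's actual cache parameters:

```python
# Split a flat address into the (tag, index, offset) triple used above.
# The widths here (4 words per line, 16 cache slots) are illustrative
# assumptions, not the 6.175 cache parameters.
OFFSET_BITS = 2   # 4 words per cache line
INDEX_BITS = 4    # 16 cache slots

def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return (tag, index, offset)

# With these widths, address 64 has tag 1, index 0, offset 0.
print(split_address(64))  # (1, 0, 0)
```

Two addresses map to the same cache slot exactly when their index fields match, which is what makes the "same index, different tag" conflicts below possible.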

Non-Blocking Cache
- Given: processor requests and memory responses
- Assignment: complete the following tables (not all cells should be filled)
- We will focus on loads first and stores second; later tables integrate loads and stores together

Multiple Requests in Flight – Part 1
(Columns: Processor | Memory Req | Memory Resp | Slots 0–3 with V/W bits.)
Each load misses in a different slot and immediately sends a memory request:
  1: Ld (0,0,0) → Req Ld (0,0)
  2: Ld (0,1,0) → Req Ld (0,1)
  3: Ld (0,2,0) → Req Ld (0,2)
  4: Ld (0,3,0) → Req Ld (0,3)

Multiple Requests in Flight – Part 2
Memory responses return out of order; each response fills its slot and answers the matching load:
  M[(0,0)] → Ld 1 – data
  M[(0,1)] → Ld 2 – data
  M[(0,3)] → Ld 4 – data
  M[(0,2)] → Ld 3 – data

Same Cache Line, Different Offset
  1: Ld (0,0,0) → Req Ld (0,0)
  2: Ld (0,0,1) → enqueued in LdQ (2 entries)
  M[(0,0)] → Ld 1 – data, Ld 2 – data
  3: Ld (0,0,2) → Ld hit, Ld 3 – data
  4: Ld (0,0,3) → Ld hit, Ld 4 – data

Same Index, Different Tag
  1: Ld (0,0,0) → Req Ld (0,0)
  2: Ld (1,0,0) → enqueued in LdQ (same index, different tag)
  M[(0,0)] → Ld 1 – data; search LdQ for next Req → Req Ld (1,0)
  M[(1,0)] → Ld 2 – data

Stores
  1: St x (0,0,0) → Req Ld (0,0) (StQ: 1 entry)
  2: St y (0,0,1) → enqueued (StQ: 2 entries)
  M[(0,0)] → St 1 – ACK (x written)
  3: St z (0,0,2) → St 2 – ACK, St 3 – ACK
Note: the cache can’t accept a Req while handling a memory Resp.

Store Bypassing
  1: Ld (0,0,0) → Req Ld (0,0)
  2: St y (0,0,0) → enqueued in StQ
  3: St z (0,0,0) → enqueued in StQ (2 entries)
  4: Ld (0,0,0) → Ld 4 – z (hit from StQ)
  M[(0,0)] → Ld 1 – ?; St 2 – ACK (y); St 3 – ACK (z)
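The "hit from StQ" in request 4 can be sketched as a youngest-first search of the store queue. This is a simplified Python model; the `StoreQueue` name and interface are illustrative, not the project's BSV modules:

```python
# Simplified model of store-to-load bypassing: a load first checks the
# store queue (youngest entry first) for a pending store to the same
# address. Names here are illustrative, not the 6.175 BSV interfaces.
class StoreQueue:
    def __init__(self):
        self.entries = []  # list of (addr, data), oldest first

    def enqueue(self, addr, data):
        self.entries.append((addr, data))

    def bypass(self, addr):
        # Scan youngest-to-oldest so the most recent store wins.
        for a, d in reversed(self.entries):
            if a == addr:
                return d
        return None  # no match in StQ; fall through to the cache

stq = StoreQueue()
stq.enqueue((0, 0, 0), "y")   # 2: St y (0,0,0)
stq.enqueue((0, 0, 0), "z")   # 3: St z (0,0,0)
print(stq.bypass((0, 0, 0)))  # z — load 4 sees the youngest store
```

Scanning youngest-first is what makes load 4 return z rather than y, even though both stores to (0,0,0) are still pending.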

Resending Requests
  1: St y (0,1,0) → Req Ld (0,1) (StQ: 1 entry)
  2: St z (0,0,0) → Req Ld (0,0) (StQ: 2 entries)
  M[(0,0)] response arrives
  3: Ld (1,0,0) → Req Ld (1,0)
  M[(0,1)] → St 1 – ACK
  M[(1,0)] → Ld 3 – data
  St 2 – ACK
Note: a Req is sent from the head of the StQ.

Cache Coherency
- Given: initial cache states for a single address, and a cache request for that address
- Assignment: write the rules each module needs to execute to perform the cache request
- You may have to keep track of which messages are still in the message network; unfortunately there is not enough space to include this in the table.
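The rules in the following tables move lines through the usual MSI states. A minimal model of the MSI compatibility invariant behind those rules (the `MSI` and `is_compatible` names are illustrative, not the project's code) is:

```python
# Minimal model of the MSI state lattice behind the coherence tables:
# I < S < M. A cache may hold a line in a state only if every other
# cache's state for that line is compatible with it. Names here are
# illustrative, not the 6.175 implementation.
from enum import IntEnum

class MSI(IntEnum):
    I = 0  # Invalid
    S = 1  # Shared (read-only)
    M = 2  # Modified (exclusive, read/write)

def is_compatible(a, b):
    # M is exclusive: it coexists only with I in every other cache.
    if a == MSI.M:
        return b == MSI.I
    if b == MSI.M:
        return a == MSI.I
    return True  # any mix of I and S is fine

print(is_compatible(MSI.S, MSI.S))  # True  — two readers may coexist
print(is_compatible(MSI.M, MSI.S))  # False — a writer excludes readers
```

This invariant is why, in the scenarios below, a cache in M must be downgraded before another cache can upgrade.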

No Contest: Cache 0 – Ld
(Columns: rule | state | waitp | waitc, for Cache 0, Cache 1, and the Parent.)
Rules executed: 2, 3

Other Cache is Writing: Cache 0 – Ld
States involved: I, M, S.
Rules executed: 1, 4, 5, 6, 2, 3

Lots of Downgrading: Cache 0 – St
Cache 0 holds M state for a different tag: it must first evict this line, and then upgrade to M for the desired tag.
Initial states: Cache 0 = M (other tag), Cache 1 = I.
Rules executed: 8, 1, 6, 4, 5, 2, 3

Bonus: Both Want to Write
Initial states: Cache 0 = S, Cache 1 = M.
Rule sequence (as given): 1, I, 4 (from 0), 5, 6, 2, 3, 4 (from 1), 4

The Rest of the Project – Part 1
- Building a non-blocking cache hierarchy and testing it with simulated use cases
- This includes designing modules for the Message FIFOs, the Message Network, the Cache Parent Processor, and the Non-blocking Caches
- Some of the included tests are identical to the executions shown in Part 0
- It is important to know what the modules are supposed to do when debugging!

Part 1: Non-Blocking Coherent Cache Summary
Request from processor:
- If Ld request:
  - If in StQ – return data
  - If in cache – return data
  - Otherwise: enqueue into LdQ; send downgrade response* and upgrade request if possible
- If St request:
  - If cache hit and StQ empty – write to cache
  - Otherwise: enqueue into StQ
*Downgrade responses are not always necessary.
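The decision tree above can be sketched as follows. This is a simplified software model, with plain dicts and lists standing in for the project's hardware structures (all names are illustrative, and coherence permissions are omitted):

```python
# Sketch of the processor-request handling summarized above.
# cache maps (tag, index) -> {addr: data}; stq is a list of
# (addr, data) pairs, oldest first; ldq is a list of addresses.
# This is an illustrative model, not the 6.175 BSV implementation.
def handle_processor_request(req, cache, stq, ldq, send_upgrade_req):
    kind, addr, data = req      # ("Ld", addr, None) or ("St", addr, data)
    line = addr[:2]             # the (tag, index) pair for this address
    if kind == "Ld":
        for a, d in reversed(stq):        # 1) youngest matching store wins
            if a == addr:
                return d
        if line in cache:                 # 2) cache hit
            return cache[line].get(addr)
        ldq.append(addr)                  # 3) miss: remember the load
        send_upgrade_req(line, "S")       #    and ask the parent for S
        return None
    else:  # "St"
        if line in cache and not stq:     # hit with empty StQ: write now
            cache[line][addr] = data
        else:                             # otherwise queue the store
            stq.append((addr, data))
```

For example, a load to a line absent from `cache` returns `None`, leaves the address in `ldq`, and records one upgrade-to-S request via the `send_upgrade_req` callback.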

Part 1: Non-Blocking Coherent Cache Summary
Message from parent:
- If upgrade response:
  - Update cache line
  - Search LdQ and return responses until no more hits (multiple cycles)
  - Write to cache for head of StQ until cache miss (multiple cycles)
  - Send upgrade-to-M request for head of StQ (if possible)
  - Send upgrade-to-S request for LdQ entry with index matching the response (if possible)
- If downgrade request:
  - Update cache line (if necessary)
  - Send response (if necessary)
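The upgrade-response path can be sketched like this. It is a software model that drains all LdQ hits in one call, whereas the hardware handles one entry per cycle; the names and data layout are illustrative assumptions:

```python
# Sketch of handling an upgrade response: fill the cache line, then
# serve every queued load that now hits. Illustrative only — the real
# hardware drains the LdQ one entry per cycle, not all at once.
def handle_upgrade_response(line, line_data, cache, ldq):
    cache[line] = dict(line_data)  # update the cache line
    served, remaining = [], []
    for addr in ldq:               # search LdQ for new hits
        if addr[:2] == line:
            served.append((addr, cache[line][addr]))
        else:
            remaining.append(addr) # still waiting on another line
    ldq[:] = remaining
    return served                  # load responses for the processor
```

Loads to other lines stay queued, which matches the Part 0 tables where a refill of (0,0) answers only the (0,0,*) loads.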

The Rest of the Project – Part 2
- Integrating the non-blocking cache with an out-of-order processor core to create a multicore SMIPS processor
- Requires adding support for the LL (load-linked) and SC (store-conditional) instructions and for memory fences

The Rest of the Project – Timeline
- Part 1: distributed in waves this weekend; finish before the checkpoint meetings
- Checkpoint meetings: Wednesday, December 3rd and Friday, December 5th during class time; you will sign up for slots soon
- Part 2: distributed around the time of the checkpoint meetings; due December 10th at 3 PM – strict deadline!
- Presentations: December 10th from 3 PM to 6 PM in 32-D463 (Star); includes pizza!