Constructive Computer Architecture Tutorial 7 Final Project Overview

Slides:



Advertisements
Similar presentations
Implementation and Verification of a Cache Coherence protocol using Spin Steven Farago.
Advertisements

TRIPS Primary Memory System Simha Sethumadhavan 1.
Lecture Objectives: 1)Define pipelining 2)Calculate the speedup achieved by pipelining for a given number of instructions. 3)Define how pipelining improves.
1 Lecture 17: Basic Pipelining Today’s topics:  5-stage pipeline  Hazards and instruction scheduling Mid-term exam stats:  Highest: 90, Mean: 58.
Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology October 13, 2009http://csg.csail.mit.edu/koreaL12-1.
March, 2007http://csg.csail.mit.edu/arvindIPlookup-1 IP Lookup Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
September 24, L08-1 IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab.
Realistic Memories and Caches Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology March 21, 2012L13-1
Copyright © Bluespec Inc Bluespec SystemVerilog™ Design Example A DMA Controller with a Socket Interface.
Constructive Computer Architecture Tutorial 8 Final Project Part 2: Coherence Sizhuo Zhang TA Nov 25, 2015T08-1http://csg.csail.mit.edu/6.175.
Elastic Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 28, 2011L08-1http://csg.csail.mit.edu/6.375.
Lecture 17 Final Review Prof. Mike Schulte Computer Architecture ECE 201.
6.175 Project Presentation Tony (Sen) Chang Antonio Rivera.
Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 8,
Non-blocking Caches Arvind (with Asif Khan) Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology May 14, 2012L25-1
Realistic Memories and Caches – Part III Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology April 4, 2012L15-1.
Constructive Computer Architecture Tutorial 6: Five Details of SMIPS Implementations Andy Wright 6.S195 TA October 7, 2013http://csg.csail.mit.edu/6.s195T05-1.
1 Lecture 3: Pipelining Basics Today: chapter 1 wrap-up, basic pipelining implementation (Sections C.1 - C.4) Reminders:  Sign up for the class mailing.
Constructive Computer Architecture Store Buffers and Non-blocking Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute.
October 20, 2009L14-1http://csg.csail.mit.edu/korea Concurrency and Modularity Issues in Processor pipelines Arvind Computer Science & Artificial Intelligence.
Elastic Pipelines: Concurrency Issues
Cache Coherence Constructive Computer Architecture Arvind
6.175 Final Project Part 0: Understanding Non-Blocking Caches and Cache Coherency Answers.
Caches-2 Constructive Computer Architecture Arvind
Constructive Computer Architecture Tutorial 6 Cache & Exception
ECE 353 Lab 3 Pipeline Simulator
Bluespec-6: Modeling Processors
Lecture 16: Basic Pipelining
The University of Adelaide, School of Computer Science
Lecture 17: Pipelining Today’s topics: 5-stage pipeline Hazards
Assembly Language for Intel-Based Computers, 5th Edition
Constructive Computer Architecture Tutorial 6: Discussion for lab6
12.4 Memory Organization in Multiprocessor Systems
Project Part 2: Coherence
Lecture 6 Memory Hierarchy
Pipeline Implementation (4.6)
Caches-2 Constructive Computer Architecture Arvind
Notation Addresses are ordered triples:
Morgan Kaufmann Publishers The Processor
Cache Coherence Constructive Computer Architecture Arvind
Constructive Computer Architecture: Lab Comments & RISCV
Lecture 16: Basic Pipelining
Superscalar Processors & VLIW Processors
Lecture 8: ILP and Speculation Contd. Chapter 2, Sections 2. 6, 2
Cache Coherence Constructive Computer Architecture Arvind
Lab 4 Overview: 6-stage SMIPS Pipeline
Lecture 5: Pipelining Basics
Sequential Circuits - 2 Constructive Computer Architecture Arvind
Cache Coherence Constructive Computer Architecture Arvind
6.375 Tutorial 5 RISC-V 6-Stage with Caches Ming Liu
James McCabe & Juan Rodriguez
Caches-2 Constructive Computer Architecture Arvind
Modular Refinement - 2 Arvind
Caches and store buffers
Lecture 17: Pipelining Today’s topics: 5-stage pipeline Hazards.
Control Hazards Constructive Computer Architecture: Arvind
Geoff Gunow and Sara Sinback (geogunow, sinback)
CS 3410, Spring 2014 Computer Science Cornell University
Lucas Lancellotti Lucas Santana
Patrick Lowe and Jyotishka Biswas
The University of Adelaide, School of Computer Science
Bluespec SystemVerilog™
Morgan Kaufmann Publishers The Processor
Lecture 17 Multiprocessors and Thread-Level Parallelism
Tutorial 7: SMIPS Labs and Epochs Constructive Computer Architecture
Cache Coherence Constructive Computer Architecture Arvind
Lecture 18: Coherence and Synchronization
The University of Adelaide, School of Computer Science
Caches-2 Constructive Computer Architecture Arvind
Presentation transcript:

Constructive Computer Architecture Tutorial 7 Final Project Overview Sizhuo Zhang 6.175 TA Nov 20, 2015 http://csg.csail.mit.edu/6.175

Part 1: Store Queue Based on work of Lab 7 Resolve rule conflicts in Lab 7 Add store queue to blocking cache Allow load hit under store miss Nov 20, 2015 http://csg.csail.mit.edu/6.175

Parent Protocol Processor Part 2: Cache Coherence Final target system Message Router Parent Protocol Processor L1 D Cache Core 0 Core 1 Main Memory L1 I FIFOs Nov 20, 2015 http://csg.csail.mit.edu/6.175

Cache Coherence: Loads and Stores Implement each unit Message FIFO Message router L1 D cache Parent protocol processor (PPP) Test against simple unit testbench Nov 20, 2015 http://csg.csail.mit.edu/6.175

Cache Coherence: Loads and Stores Tandem Verification Reference model for cache-coherent memory hierarchy: monolithic memory Debug interface passed to each core and D$ issue: called when req is sent to D$ Commit: when req finishes processing, i.e. read/write D$ data array; check correctness of resp and cache line value interface RefDMem; method Action issue(MemReq req); method Action commit(MemReq req, Maybe#(CacheLine) line, Maybe#(MemResp) resp); endinterface Nov 20, 2015 http://csg.csail.mit.edu/6.175

Cache Coherence: Loads and Stores Deficiency of tandem verification Cannot check deadlock Testing whole memory hierarchy Feed random req to D$ Detect bugs with tandem verification Add cycle counter to detect deadlock Nov 20, 2015 http://csg.csail.mit.edu/6.175

Cache Coherence: Loads and Stores Integrating to the processor Three-cycle core: provided Six-stage pipeline: from Lab 7 Running multi-core programs Similar to fork int main() { int coreid = getCoreId(); if(coreid == 0) { return core0(); } else { return core1(); } } Nov 20, 2015 http://csg.csail.mit.edu/6.175

Cache Coherence: Atomic Memory Instructions Add load-reserve and store-conditional to data cache Only D$ and processor pipeline need to be changed Run more programs Nov 20, 2015 http://csg.csail.mit.edu/6.175

Cache Coherence: Store Queue Add store queue to cache Similar to part 1 Atomic instruction has special behavior Programming model is not SC Introduce fence to flush store queue Allow load hit under store miss Nov 20, 2015 http://csg.csail.mit.edu/6.175

Schedule Part 1: already assigned Part 2: assigned by next Monday Deadline December 9th Presentation in class (Dec 9th) What problems/bugs you encounter How you resolve them Nov 20, 2015 http://csg.csail.mit.edu/6.175

Project Part 1: Store Queue Nov 20, 2015 http://csg.csail.mit.edu/6.175

Problem with Lab 7 Memory stage rule conflicts with rules in D$ Memory stage is more urgent Hurt performance: store miss processing is stalled // processor memory stage rule doMemory; dCache.req(...); endrule // D$ rule startMiss(status==StartMiss); endrule rule sendFillReq(status==SendFillReq); rule waitFillResp(status==WaitFillResp); method Action req(MemReq r)if(status==Ready); endmethod Nov 20, 2015 http://csg.csail.mit.edu/6.175

Resolving Conflicts Add 1-element bypass FIFO reqQ Req from processor first gets into reqQ Separate rule to deq reqQ and process No performance loss – bypass FIFO Nov 20, 2015 http://csg.csail.mit.edu/6.175

Adding Store Queue Process new req: store queue or reqQ reqQ.first == St: enq to store queue Also process store from store queue If not processing store queue, may deadlock Store queue full reqQ.first == Ld: process Ld Store in store queue will not issue Structure hazard No need for the “lockL1” EHR in lecture Nov 20, 2015 http://csg.csail.mit.edu/6.175

Load Hit Under Store Miss Performance improvement of store queue is limited Make the cache a little bit non-blocking Store miss: waiting for mem resp Not accessing cache Process a load at reqQ.first If hit: resp to processor If miss: keep it in reqQ Load hit is disallowed under load miss Load resp cannot go out-of-order Nov 20, 2015 http://csg.csail.mit.edu/6.175