6.175 Project Presentation Tony (Sen) Chang Antonio Rivera.


Progress
- Completed all exercises and discussion questions
- All tests pass; benchmarks are in the required range

Division of Work (by exercise)
- Antonio
  - Part 1: Exercise 1
  - Part 2: Exercises 3, 4, 5, 8
  - Presentation
- Tony
  - Part 1: Exercises 2, 3
  - Part 2: Exercises 1, 2, 6, 7, 9, 10
  - Discussion Questions
  - Presentation

Division of Work (by concept)
- Antonio
  - L1 D-Cache
  - Parent Protocol Processor
  - Atomic Instructions
  - RefDMem Calls
- Tony
  - Store Queue (both parts)
  - Message FIFO
  - Message Routing
  - LHUSM (both parts)
  - Integration into 3-Cycle and 6-Stage Processors

The War on IPC
- Requirement: [figure not preserved]
- Reality: after implementing ReqQ, IPC is about 0.4
- The war on IPC:
  - This needs to be 2 [referenced code not preserved]
  - Allows continuous cache reads
  - Puts IPC at about 0.6
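
The "needs to be 2" point can be illustrated with a small sketch. It assumes, and this is only an assumption since the slide's code is not preserved, that the value in question is the depth of a FIFO sitting between request and response, and that enqueue and dequeue both decide on the start-of-cycle state. The function name `throughput` and the cycle count are hypothetical.

```python
def throughput(capacity, cycles=1000):
    """Items dequeued per cycle for a FIFO whose enq/deq both look at
    the occupancy at the start of the cycle (assumed model, not the lab's code)."""
    occupancy = 0
    delivered = 0
    for _ in range(cycles):
        can_enq = occupancy < capacity   # producer sees start-of-cycle state
        can_deq = occupancy > 0          # consumer sees start-of-cycle state
        if can_enq:
            occupancy += 1
        if can_deq:
            occupancy -= 1
            delivered += 1
    return delivered / cycles


print(throughput(1))  # depth 1: enq and deq alternate
print(throughput(2))  # depth 2: one item per cycle after warm-up
```

Under this model a depth of 1 forces enqueue and dequeue to alternate, while a depth of 2 lets both happen every cycle, which is one way to read "allows continuous cache reads".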

The War on IPC
- Scoreboard needs to be of size 6
  - Reduces stalls
  - Puts IPC of cache_conf. to about 0.8
- But stq.S is at about 0.75 (supposed to be above 0.9)
- Train BTB on decode redirect
  - Puts IPC of stq to about 0.8
- Still didn’t solve all the problems

The One Problem What’s wrong with this code?

The One Problem
- Fifo.first() will block no matter what!
- Think of while(list[0] != 0 || list.count == 0)
- Correct code: [code not preserved]
- Happened twice
- Check for not empty first
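
The list analogy on this slide can be tried directly. The sketch below is a Python stand-in, not the BSV code from the project; the function names are hypothetical. In Python the wrong ordering raises an exception on an empty list, whereas in BSV the implicit guard on `first` keeps the rule from firing at all, so the analogy is only approximate.

```python
def should_wait_buggy(items):
    # Touches items[0] before checking for emptiness, so the
    # emptiness test can never rescue the empty case.
    return items[0] != 0 or len(items) == 0


def should_wait_fixed(items):
    # Checks for empty first; items[0] is only evaluated when it exists.
    return len(items) == 0 or items[0] != 0


try:
    should_wait_buggy([])
except IndexError:
    print("buggy version fails on an empty list")

print(should_wait_fixed([]))   # empty list is handled
print(should_wait_fixed([0]))  # head is 0
```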

Results on IPC Improvement
- After fixing the FIFO problem, stq.S has about 0.95 IPC
- IPC improvements overall: [table not preserved]

Bugs: Antonio
- Most problems came from not recognizing implicit guards
- For PPP, had trouble with the scheduling of rules and with deadlock
- Had to make waitc use EHRs (double-write problem)
- With Lr and Sc, originally didn’t write data to cache on Sc fail
- Also had to set MSI state to M
- This fixed the deadlock problem
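
For readers unfamiliar with EHRs, the following is a minimal Python model of how an ephemeral history register orders several writes within one cycle. It is a sketch of the general EHR idea, not the project's BSV code; the class name `Ehr` and its methods are hypothetical.

```python
class Ehr:
    """Toy model of an EHR: port i reads see same-cycle writes from ports < i."""

    def __init__(self, init):
        self.value = init   # value committed at the end of the last cycle
        self.pending = {}   # port -> value written during the current cycle

    def read(self, port):
        # Return the most recent same-cycle write from a lower port, if any.
        for p in range(port - 1, -1, -1):
            if p in self.pending:
                return self.pending[p]
        return self.value

    def write(self, port, value):
        self.pending[port] = value

    def end_cycle(self):
        # The highest-numbered port that wrote wins.
        if self.pending:
            self.value = self.pending[max(self.pending)]
        self.pending = {}


flag = Ehr(0)
flag.write(0, 1)     # first writer in the cycle
print(flag.read(1))  # port 1 already observes the port-0 write
flag.write(1, 2)     # second writer in the same cycle
flag.end_cycle()
print(flag.read(0))  # the port-1 write is what survives
```

Two writers in the same cycle are resolved by port order instead of conflicting, which is the property a double-write problem needs.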

(continued)
- To fix bugs, relied heavily on $display statements
- refDMem caught a shortcut I forgot to remove in PPP (assumed only 2 caches)
- Relied on unit tests for L1 D-Cache and PPP
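
The kind of two-cache shortcut mentioned above can be sketched as follows. This is an illustrative guess at the shape of the problem, assuming the standard MSI compatibility rule (M is compatible only with I, S with S or I); the function names are hypothetical and the code is not taken from the project.

```python
def is_compatible(held, wanted):
    # MSI: I is compatible with anything; S only with S or I; M only with I.
    return held == "I" or wanted == "I" or (held == "S" and wanted == "S")


def can_grant(child_states, requester, wanted):
    # General form: check every cache other than the requester.
    return all(
        is_compatible(state, wanted)
        for i, state in enumerate(child_states)
        if i != requester
    )


def can_grant_two_cache_shortcut(child_states, requester, wanted):
    # Shortcut that is only valid when there are exactly two caches.
    other = 1 - requester
    return is_compatible(child_states[other], wanted)


states = ["I", "I", "M"]  # third cache holds the line in M
print(can_grant(states, 0, "M"))                     # general check sees cache 2
print(can_grant_two_cache_shortcut(states, 0, "M"))  # shortcut ignores cache 2
```

With three caches the shortcut never inspects the third one, so it can approve an upgrade that the general check rejects.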

Improvements to the Course
- Some of the notation on the slides was confusing or unclear at times
  - Ex: the slides used the variable i to represent iterating over all other caches (L22-18)
  - This was only explicitly written on an earlier slide
  - I originally used the Req’s ID in place of i
- Better BSV documentation / IDE
- Windows support?