Modeling CPUs using Different MoCs: A Case Study. Trevor C. Meyerowitz. Advisor: Alberto Sangiovanni-Vincentelli. 290N Final Presentation, May 15, 2002.


Slide 2: Outline
- Introduction
  - Motivation
  - The Simple CPU to be modeled
  - The Domains Investigated
- Modeling a Non-Pipelined Processor
- Modeling a Pipelined Processor
- Demo
- Conclusions

Slide 3: Motivation
- Processor designs are becoming much larger and more complicated
  - Many instructions in flight at a single time
  - Strange orderings, speculation
  - This can be very hard to verify
  - We are developing a methodology to help alleviate these problems
- Using different Models of Computation can potentially simplify the design task
- Ptolemy II allows us to compare a variety of these MoCs in a unified framework

Slide 4: The Simple CPU
- Processor statistics
  - Small instruction set: ADD, SUB, ADDI, SUBI, and BNE
  - Only integer operations
  - 128 registers, 128-entry instruction memory
- This is enough to be interesting
  - Data dependencies
  - Control flow
- (A sketch of one possible instruction representation follows below.)
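For concreteness, here is a minimal Java sketch (not taken from the original model) of how one instruction of this small ISA could be represented; the class and field names are assumptions made for illustration only.

enum Opcode { ADD, SUB, ADDI, SUBI, BNE }

// Hypothetical instruction record for the 5-instruction, 128-register ISA above.
class Instruction {
    final Opcode op;
    final int rd;    // destination register index, 0..127
    final int rs;    // first source register index
    final int rt;    // second source register index, or immediate/offset for ADDI, SUBI, BNE

    Instruction(Opcode op, int rd, int rs, int rt) {
        this.op = op;
        this.rd = rd;
        this.rs = rs;
        this.rt = rt;
    }
}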

Slide 5: The Domains Investigated
- Process Networks
  - Untimed model
  - Kahn-MacQueen semantics: infinite queues, blocking reads
  - Fully deterministic, schedule independent
- Synchronous Reactive
  - Untimed model
  - Instantaneous communication and computation
  - Iterates until a fixed point is found
  - Signals must be monotonic
- (A small Java sketch of the blocking-read discipline follows below.)
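To make the Process Networks bullets concrete, the following self-contained Java sketch illustrates the Kahn-MacQueen blocking-read discipline using a plain BlockingQueue in place of Ptolemy II's PN receivers; the names and structure are illustrative assumptions, not the actual Ptolemy II API.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BlockingReadDemo {
    public static void main(String[] args) throws InterruptedException {
        // Conceptually unbounded FIFO channel between two processes.
        BlockingQueue<Integer> channel = new LinkedBlockingQueue<>();

        Thread consumer = new Thread(() -> {
            try {
                // take() blocks until a token arrives; a PN actor cannot test a
                // channel for emptiness and branch on it, which is what keeps
                // the network deterministic and schedule independent.
                int token = channel.take();
                System.out.println("consumed " + token);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        Thread.sleep(100);   // consumer is now blocked on its read
        channel.put(42);     // writes never block on an unbounded queue
        consumer.join();
    }
}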

Slide 6: The Non-Pipelined Processor
- Code and netlist are reused for both domains (i.e., these are domain-polymorphic actors)
- Represented in Ptolemy II as: Fetch, Regfile, Execute, and a Delay
- Fetch occurs only after the previous instruction has completed

Slide 7: Non-Pipelined Processor Pseudocode (Fetch + Regfile)

public class Fetch ... {
    ...
    public void fire() {
        pc = input_pc.get(0);              // read the next program counter
        <inst, rs, rt> = readIMEM(pc);     // fetch the instruction at pc
        output_inst.send(0, inst);         // instruction to Execute
        output_regs.send(rs, rt);          // source register indices to Regfile
        ...
    }
    ...
}

public class Reg ... {
    ...
    public void fire() {
        if (read_mode) {                   // read phase: supply operands
            inst = input_get_op_codes();
            <rs_v, rt_v> = read_regs();
            output_regs.send(0, inst);
            output_regs.send(rs_v, rt_v);
        } else {                           // write phase: commit the result
            rd_v = input_get_write_vals();
            write_values();
        }
        read_mode = !read_mode;            // alternate between the two phases
    }
    ...
}

Slide 8: Non-Pipelined Processor Pseudocode (Execute)

public class Exec ... {
    ...
    public void fire() {
        if (!write_mode) {                 // execute phase
            reg_vals = input_reg_vals();
            inst_type = read_inst();
            results = exec_inst(inst_type, reg_vals);
        } else {                           // write-back phase
            write_values(rd, results);
            write_next_pc(results);
        }
        write_mode = !write_mode;          // alternate between the two phases
    }
    ...
}

Slide 9: Non-Pipelined Processor: Differences between Domains
- SR required that we put the register read and the register write in different iterations, and likewise split executing an instruction from writing its results
- Process Networks cannot query port status
- SR requires use of prefire and postfire conditions
- We shared code between the two domains; SR probably has more flexibility

Slide 10: Pipelined Processor
- Only required recoding of the fetch behavior
  - Fetch every "iteration"
  - Only stall after branches (no branch prediction)
- No forwarding logic is required!?
  - This is because two register reads cannot occur without a register write happening between them
  - In PN this follows from the determinism requirement
  - It also holds in SR, because of its state between iterations
  - We probably could structure the SR model to require forwarding logic (a lower level of abstraction!)

Slide 11: Pipelined Processor – Fetch Pseudocode

public class Fetch ... {
    ...
    public void fire() {
        if (initial_firing || prev_inst_is_branch) {
            pc = input_pc.get(0);          // blocking read: stalls until the branch is resolved
        }
        <inst, rs, rt> = readIMEM(pc);
        output_inst.send(0, inst);
        output_regs.send(rs, rt);
        ...
        pc = pc + 1;
    }
    ...
}

The blocking read of input_pc causes Fetch to stall until the branch is finished; if there is no branch in flight, Fetch fires again immediately.

Slide 12: Pipelining and Forwarding (t=0)

Program:
  Inst. id   Assembly code     Logical meaning
  Inst_1     ADD R1, R2, R3    R1 = R2 + R3
  Inst_2     ADD R3, R1, R1    R3 = R1 + R1

Register file state: R1 = 2, R2 = 4, R3 = 5

Pipeline stages: Fetch -> Reg File -> Exec
  Reg File: Inst_2: R3 = R1(?) + R1(?)
  Exec:     Inst_1: R1 = R2(4) + R3(5)

Slide 13: Pipelining and Forwarding (t=1)

Program:
  Inst. id   Assembly code     Logical meaning
  Inst_1     ADD R1, R2, R3    R1 = R2 + R3
  Inst_2     ADD R3, R1, R1    R3 = R1 + R1

Register file state: R1 = 2, R2 = 4, R3 = 5

Pipeline stages: Fetch -> Reg File -> Exec
  Reg File: Inst_2: R3 = R1(2) + R1(2)
  Exec:     Inst_1: R1(9) = R2(4) + R3(5)

This is an error! Inst_2 should read R1 as 9. We can solve this by adding forwarding logic or by stalling the pipeline. The PN and SR models don't have this problem because they enforce the order: read Inst_1, write Inst_1, read Inst_2. (A small numeric sketch of this hazard follows below.)
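The following self-contained Java sketch (not part of the original slides) replays this example numerically: with naive pipeline overlap, Inst_2 reads the stale R1 = 2, while the read/write interleaving that the PN and SR models enforce yields the intended result.

public class ForwardingHazardDemo {
    public static void main(String[] args) {
        // Register file state from the slide: R1 = 2, R2 = 4, R3 = 5 (index 0 unused).
        int[] naive   = {0, 2, 4, 5};
        int[] ordered = {0, 2, 4, 5};

        // Naive overlap: Inst_2 reads its operands while Inst_1 is still in Exec.
        int inst1 = naive[2] + naive[3];   // Inst_1: R1 = R2 + R3 = 9 (not yet written back)
        int inst2 = naive[1] + naive[1];   // Inst_2 reads stale R1 = 2, so 2 + 2 = 4 (wrong)
        naive[1] = inst1;
        naive[3] = inst2;
        System.out.println("naive overlap:         R3 = " + naive[3]);    // prints 4

        // Ordering enforced by the PN/SR models: read Inst_1, write Inst_1, read Inst_2.
        ordered[1] = ordered[2] + ordered[3];          // Inst_1 writes R1 = 9
        ordered[3] = ordered[1] + ordered[1];          // Inst_2 reads the updated R1, so 9 + 9 = 18
        System.out.println("serialized read/write: R3 = " + ordered[3]);  // prints 18
    }
}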

Slide 14: Pipelined Processor with Branch Prediction
- Still in order, but branches are predicted instead of stalling
- Requires recoding of Fetch and the Register File
  - Fetch: performs branch prediction and handles mispredicts
  - Register File: keeps a queue of instructions, stalls on dependencies, and only writes resolved instructions to the regfile
- This represents one refinement path
  - Biased towards Process Networks
- (A rough sketch of the predicting fetch loop follows below.)
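As a rough illustration of the fetch-side refinement described above, here is a hedged, stand-alone Java sketch of a predict-not-taken fetch loop that re-steers on a mispredict; the names and the exact squash mechanism are assumptions and do not reproduce the original actor code.

class PredictingFetch {
    private int pc = 0;

    // One fetch iteration.  resolvedTarget is non-null only when a branch has just
    // resolved as taken, i.e. the static not-taken prediction was wrong.
    int fire(Integer resolvedTarget) {
        if (resolvedTarget != null) {
            pc = resolvedTarget;   // mispredict: re-steer fetch to the actual branch target
        }
        int fetched = pc;          // fetch speculatively; downstream logic squashes it on a mispredict
        pc = pc + 1;               // static prediction: fall through (branch not taken)
        return fetched;
    }
}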

Slide 15: Program Code: an example instruction trace, tabulated as Inst, RD, RS, RT (Val), consisting of the sequence ADD, ADD, BNE, ADD, ADD, ADD, ADD, SUB (the operand columns were lost in the transcript).

Slide 16: Outline
- Introduction
- Modeling a Non-Pipelined Processor
- Modeling a Pipelined Processor
- Demo
- Conclusions
  - Other Architectural Features
  - Observations
  - Future Work

Slide 17: Other Architectural Features
- Out-of-order execution
  - Requires "breaking" the PN model
- Superscalar execution
  - Multiple fetches at once; might be problematic to do in PN
- Memory systems
  - Initially simple, more complicated when refinements are added

Slide 18: Observations
- Process Networks are relatively easy to use and are quite predictable
- Process Networks are great for initial abstract models
- Synchronous Reactive is simpler to work with than DE, but more complicated to design in than PN
- PN doesn't deal well with ordering refinements; SR handles them better
- We envision a methodology where you start with a PN model and then move to an SR model

Slide 19: Future Work
- Look at implementing other architectural features
- Examine relaxing PN's requirements
- Look at domain-specific actors
- Examine composing different MoCs
- Introduce timing
