/ Computer Architecture and Design

Slides:



Advertisements
Similar presentations
Hardware-Based Speculation. Exploiting More ILP Branch prediction reduces stalls but may not be sufficient to generate the desired amount of ILP One way.
Advertisements

Instruction-level Parallelism Compiler Perspectives on Code Movement dependencies are a property of code, whether or not it is a HW hazard depends on.
Lec18.1 Step by step for Dynamic Scheduling by reorder buffer Copyright by John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
1 COMP 206: Computer Architecture and Implementation Montek Singh Wed, Oct 19, 2005 Topic: Instruction-Level Parallelism (Multiple-Issue, Speculation)
A scheme to overcome data hazards
Dynamic ILP: Scoreboard Professor Alvin R. Lebeck Computer Science 220 / ECE 252 Fall 2008.
Computer Organization and Architecture (AT70.01) Comp. Sc. and Inf. Mgmt. Asian Institute of Technology Instructor: Dr. Sumanta Guha Slide Sources: Based.
COMP25212 Advanced Pipelining Out of Order Processors.
Computer Architecture Lec 8 – Instruction Level Parallelism.
Spring 2003CSE P5481 Reorder Buffer Implementation (Pentium Pro) Hardware data structures retirement register file (RRF) (~ IBM 360/91 physical registers)
CS136, Advanced Architecture Speculation. CS136 2 Outline Speculation Speculative Tomasulo Example Memory Aliases Exceptions VLIW Increasing instruction.
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Oct. 14, 2002 Topic: Instruction-Level Parallelism (Multiple-Issue, Speculation)
/ Computer Architecture and Design Instructor: Dr. Michael Geiger Summer 2014 Lecture 6: Speculation.
CPE 731 Advanced Computer Architecture ILP: Part IV – Speculative Execution Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University.
DAP Spr.‘98 ©UCB 1 Lecture 6: ILP Techniques Contd. Laxmi N. Bhuyan CS 162 Spring 2003.
CPSC614 Lec 5.1 Instruction Level Parallelism and Dynamic Execution #4: Based on lectures by Prof. David A. Patterson E. J. Kim.
1 Zvika Guz Slides modified from Prof. Dave Patterson, Prof. John Kubiatowicz, and Prof. Nancy Warter-Perez Out Of Order Execution.
Review of CS 203A Laxmi Narayan Bhuyan Lecture2.
Nov. 9, Lecture 6: Dynamic Scheduling with Scoreboarding and Tomasulo Algorithm (Section 2.4)
1 Overcoming Control Hazards with Dynamic Scheduling & Speculation.
1 Chapter 2: ILP and Its Exploitation Review simple static pipeline ILP Overview Dynamic branch prediction Dynamic scheduling, out-of-order execution Hardware-based.
CSCE 614 Fall Hardware-Based Speculation As more instruction-level parallelism is exploited, maintaining control dependences becomes an increasing.
Professor Nigel Topham Director, Institute for Computing Systems Architecture School of Informatics Edinburgh University Informatics 3 Computer Architecture.
2/24; 3/1,3/11 (quiz was 2/22, QuizAns 3/8) CSE502-S11, Lec ILP 1 Tomasulo Organization FP adders Add1 Add2 Add3 FP multipliers Mult1 Mult2 From.
1 Lecture 7: Speculative Execution and Recovery Branch prediction and speculative execution, precise interrupt, reorder buffer.
04/03/2016 slide 1 Dynamic instruction scheduling Key idea: allow subsequent independent instructions to proceed DIVDF0,F2,F4; takes long time ADDDF10,F0,F8;
CS 5513 Computer Architecture Lecture 6 – Instruction Level Parallelism continued.
CS203 – Advanced Computer Architecture ILP and Speculation.
Ch2. Instruction-Level Parallelism & Its Exploitation 2. Dynamic Scheduling ECE562/468 Advanced Computer Architecture Prof. Honggang Wang ECE Department.
Instruction-Level Parallelism and Its Dynamic Exploitation
IBM System 360. Common architecture for a set of machines
/ Computer Architecture and Design
COMP 740: Computer Architecture and Implementation
Out of Order Processors
Dynamic Scheduling and Speculation
Step by step for Tomasulo Scheme
Tomasulo Loop Example Loop: LD F0 0 R1 MULTD F4 F0 F2 SD F4 0 R1
CS203 – Advanced Computer Architecture
CS5100 Advanced Computer Architecture Hardware-Based Speculation
/ Computer Architecture and Design
Lecture 12 Reorder Buffers
CPSC 614 Computer Architecture Lec 5 – Instruction Level Parallelism
Advantages of Dynamic Scheduling
David Patterson Electrical Engineering and Computer Sciences
High-level view Out-of-order pipeline
Tomasulo With Reorder buffer:
11/14/2018 CPE 631 Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation Aleksandar Milenković, Electrical and Computer.
CMSC 611: Advanced Computer Architecture
A Dynamic Algorithm: Tomasulo’s
Out of Order Processors
Lecture 8: ILP and Speculation Contd. Chapter 2, Sections 2. 6, 2
CS252 Graduate Computer Architecture Lecture 7 Dynamic Scheduling 2: Precise Interrupts February 9th, 2010 John Kubiatowicz Electrical Engineering.
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
CS 704 Advanced Computer Architecture
Lecture 8: Dynamic ILP Topics: out-of-order processors
Adapted from the slides of Prof
Lecture 7: Dynamic Scheduling with Tomasulo Algorithm (Section 2.4)
John Kubiatowicz Electrical Engineering and Computer Sciences
September 20, 2000 Prof. John Kubiatowicz
Larry Wittie Computer Science, StonyBrook University and ~lw
Tomasulo Organization
CPSC 614 Computer Architecture Lec 5 – Instruction Level Parallelism
Adapted from the slides of Prof
/ Computer Architecture and Design
Chapter 3: ILP and Its Exploitation
September 20, 2000 Prof. John Kubiatowicz
Lecture 7 Dynamic Scheduling
Overcoming Control Hazards with Dynamic Scheduling & Speculation
Conceptual execution on a processor which exploits ILP
Presentation transcript:

16.482 / 16.561 Computer Architecture and Design Instructor: Dr. Michael Geiger Summer 2015 Lecture 6: Speculation

Computer Architecture Lecture 6 Lecture outline Announcements/reminders HW 5 to be posted, due 6/12 Today’s lecture Speculation 6/18/2018 Computer Architecture Lecture 6

Speculation to greater ILP 3 components of HW-based speculation: Dynamic branch prediction Need BTB to get target in 1 cycle Ability to speculate past branches Dynamic scheduling In Tomasulo’s algorithm, separate instruction completion from commit Once instruction is non-speculative, it can update registers/memory Reorder buffer tracks program order Head of ROB can commit when ready ROB supplies data between complete and commit 6/18/2018 Computer Architecture Lecture 6

Computer Architecture Lecture 6 Reorder Buffer Entry Each entry in the ROB contains four fields: Instruction type a branch (has no destination result), a store (has a memory address destination), or a register operation (ALU operation or load, which has register destinations) Destination Register number (for loads and ALU operations) or memory address (for stores) where the instruction result should be written Value Value of instruction result until the instruction commits Ready Indicates that instruction has completed execution, and the value is ready 6/18/2018 Computer Architecture Lecture 6

Speculative Tomasulo’s Algorithm Instruction fetch--get instruction from memory; place in Op Queue Issue—get instruction from FP Op Queue If reservation station and reorder buffer slot free, issue instr & send operands & reorder buffer no. for destination (this stage sometimes called “dispatch”) Execution—operate on operands (EX) When both operands ready then execute; if not ready, watch CDB for result; when both in reservation station, execute; checks RAW (sometimes called “issue”) Memory access--if needed (MEM) NOTE: Stores update memory at commit, not MEM Write result—finish execution (WB) Write on Common Data Bus to all awaiting FUs & reorder buffer; mark reservation station available. Commit—update register with reorder result When instr. at head of reorder buffer & result present, update register with result (or store to memory) and remove instr from reorder buffer. Mispredicted branch flushes reorder buffer (sometimes called “graduation”) 6/18/2018 Computer Architecture Lecture 6

Tomasulo’s With Reorder buffer: Done? FP Op Queue ROB7 ROB6 ROB5 ROB4 ROB3 ROB2 ROB1 Newest Reorder Buffer Oldest F0 LD F0,10(R2) N Resolve RAW memory conflict? (address in memory buffers) Integer unit executes in parallel Registers To Memory Dest Dest from Memory Dest Reservation Stations 1 10+R2 FP adders FP multipliers 6/18/2018 Computer Architecture Lecture 6

Reorder buffer example Given the following code: Loop: L.D F0, 0(R1) MUL.D F4, F0, F2 S.D F4, 0(R1) DADDIU R1, R1, #-8 BNE R1, R2, Loop Walk through two iterations of the loop Assume 2 cycles for add, load 1 cycle for address calculation 6 cycles for multiply Forwarding via CDB 6/18/2018 Computer Architecture Lecture 6

Reorder buffer example: key points Execution stages Fetch & issue: always in order Execution & completion: may be out of order Commit: always in order Hardware Reservation stations Occupied from IS to WB Reorder buffer Occupied from IS to C Used to Maintain program order for in-order commit Supply register values between WB and C Register result status Rename registers based on ROB entries 6/18/2018 Computer Architecture Lecture 6

Memory hazards, exceptions Reorder buffer helps limit memory hazards With additional logic for disambiguation (determine if addresses match) WAW / WAR automatically removed RAW maintained by Stalling loads if store with same address is in flight Ensuring that effective addresses are computed in order Precise exceptions logical extension of ROB If instruction causes exception, flag in ROB Handle exception when instruction commits 6/18/2018 Computer Architecture Lecture 6

Computer Architecture Lecture 6 Final notes Next time Multiple issue Multithreading Reminders HW 5 to be posted, due 6/12 6/18/2018 Computer Architecture Lecture 6