CS 211: Computer Architecture
Lecture 5: Instruction Level Parallelism and Its Dynamic Exploitation
Instructor: M. Lancaster
Corresponding to Hennessy and Patterson, Fifth Edition, Sections 3.6 and 3.8

September 2012 Lecture 5 Spring

Hardware Based Speculation
For highly parallel machines, maintaining control dependencies is difficult.
A processor executing multiple instructions per clock may need to execute a branch instruction every clock.
The processor will speculate (guess) on the outcome of branches and execute the program as if the guesses are correct.
  – Fetch, issue and execute instructions as if the predictions are correct; dynamic scheduling alone only fetches and issues such instructions
  – Must handle incorrect speculations

Hardware Based Speculation
Combination of 3 key activities
  – Dynamic branch prediction to choose which instructions to execute
  – Speculation to allow the execution of instructions before the control dependencies are resolved
      Must be able to undo the effects of an incorrectly speculated sequence
  – Dynamic scheduling to deal with scheduling of different combinations of basic blocks
Hardware based speculation follows the predicted flow of data values to choose when to execute instructions

Hardware Based Speculation Approach
A particular approach is an extension of Tomasulo's algorithm.
  – Extend the hardware to support speculation
  – Separate the bypassing of results among instructions from the actual completion of an instruction, allowing an instruction to execute and to bypass its results to other instructions without allowing the instruction to perform any updates that cannot be undone
  – When an instruction is no longer speculative, it is committed. (An extra step in the instruction execution sequence, "instruction commit", allows it to update the register file or memory.)

Hardware Based Speculation Approach
Allow instructions to execute out of order, but force them to commit in order, so that no irrevocable action (such as updating state or taking an exception) happens before commit
Add a commit phase (to our 5 stage pipeline as an example)
  – Requires changes to the sequence and an additional set of hardware buffers that hold the results of instructions that have finished but not committed
  – The hardware buffer is called a ReOrder Buffer (ROB)

Reorder Buffer (ROB)
Provides additional registers in the same way as the reservation stations
Holds the result of an instruction between the time the operation associated with the instruction completes and the time the instruction commits
  – It therefore is a source of operands for instructions
  – With speculation, however, the register file is not updated until the instruction commits

Reorder Buffer (ROB)
Entries contain four fields
  – Instruction Type
      Indicates whether the instruction is a branch, a store or a register operation
  – Destination
      Supplies the register number or the memory address where the results should be written
  – Value
      Holds the value of the instruction result until the instruction commits
  – Ready
      Indicates that the instruction has completed execution and the value is ready
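The four fields map naturally onto a small record type. The following is a minimal sketch, not the layout of any real processor; the names `InstrType` and `ROBEntry` and the Python field types are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class InstrType(Enum):
    """Hypothetical encoding of the Instruction Type field."""
    BRANCH = "branch"
    STORE = "store"
    REGISTER_OP = "register operation"


@dataclass
class ROBEntry:
    """One reorder buffer entry with the four fields from the slide."""
    instr_type: InstrType                 # branch, store, or register operation
    destination: Union[str, int, None]    # register name, or memory address for a store
    value: Optional[float] = None         # result, held here until the instruction commits
    ready: bool = False                   # True once execution has completed


# A register operation targeting F4 is allocated, then later finishes executing.
entry = ROBEntry(InstrType.REGISTER_OP, destination="F4")
entry.value, entry.ready = 12.5, True
print(entry)
```

A real ROB is a fixed-size circular buffer of such entries; the entry number doubles as the tag carried on the CDB.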

Reorder Buffer (ROB)
In use with Tomasulo
  – The ROB completely replaces the store buffers
  – Stores occur in two steps, with the second step done by instruction commit
  – Results are now tagged with the ROB entry number rather than the reservation station number (in this implementation)

Reorder Buffer and Reservation Stations

Instruction Execution With ROBs
Issue
  – Get an instruction from the instruction queue. Issue it if there is an empty reservation station and an empty slot in the ROB; send the operands to the reservation station if they are available in either the registers or the ROB.
  – Update the control entries to indicate the buffers are in use; the number of the ROB entry allocated for the result is also sent to the reservation station, so that the number can be used to tag the result when it is placed on the CDB.
  – If either all reservation stations or the ROB is full, instruction issue is stalled until both have available entries.
  – This stage is also called dispatch in a dynamically scheduled processor
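The issue rules above can be sketched in a few lines. This is a simplified model under stated assumptions: the capacities `ROB_SIZE` and `RS_SIZE`, the dictionary layouts, and the helper name `issue` are all hypothetical, and a single pool of reservation stations stands in for the per-unit stations of a real design.

```python
from collections import deque
from itertools import count

ROB_SIZE = 4          # hypothetical capacities, chosen only for the demo
RS_SIZE = 2
_tags = count()       # source of fresh ROB entry numbers


def issue(instr, rob, stations, reg_file, reg_status):
    """Try to issue (dest, src1, src2); return its ROB tag, or None on a stall."""
    if len(rob) >= ROB_SIZE or len(stations) >= RS_SIZE:
        return None                       # stall until both have a free entry
    dest, *sources = instr
    operands = []
    for src in sources:
        producer = reg_status.get(src)    # ROB tag of the pending writer, if any
        if producer is None:
            operands.append(("value", reg_file[src]))      # read the register file
        else:
            entry = next(e for e in rob if e["tag"] == producer)
            if entry["ready"]:
                operands.append(("value", entry["value"]))  # read the ROB
            else:
                operands.append(("wait", producer))         # watch the CDB for this tag
    tag = next(_tags)
    rob.append({"tag": tag, "dest": dest, "value": None, "ready": False})
    stations.append({"tag": tag, "operands": operands})
    reg_status[dest] = tag                # later readers of dest wait on this tag
    return tag


rob, stations = deque(), []
reg_file = {"F0": 1.0, "F2": 2.0, "F4": 0.0, "F6": 3.0}
reg_status = {}
t0 = issue(("F4", "F0", "F2"), rob, stations, reg_file, reg_status)  # operands from registers
t1 = issue(("F6", "F4", "F2"), rob, stations, reg_file, reg_status)  # F4 still pending in the ROB
t2 = issue(("F0", "F6", "F2"), rob, stations, reg_file, reg_status)  # no free station: stall
print(t0, t1, t2)   # prints: 0 1 None
```

The second instruction records `("wait", 0)` for F4, which is exactly the tag the first instruction's result will carry on the CDB.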

Instruction Execution With ROBs
Execute
  – If one or more operands are not yet available, monitor the CDB while waiting for the register to be computed. (Checks for RAW hazards)
  – When both operands are available at a reservation station, execute the operation
  – Stores need only have the base register available at this step, since execution for a store is the effective address calculation

Instruction Execution With ROBs
Write Result
  – When the result is available, write it on the CDB (with the ROB tag) and from the CDB to the ROB as well as to any reservation stations waiting for this result (they are watching for the tag also)
  – Mark the reservation station as available
  – If the value to be stored is available, it is written to the value field of the ROB entry for the store
  – If the value to be stored is not available, monitor the CDB until that value is broadcast, at which time the value field of the ROB entry of the store is updated
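The broadcast on the CDB can be sketched as follows. This is an illustrative model only: the function name `write_result` and the dictionary layouts are assumptions, the ROB is keyed directly by tag, and the special handling of store values is left out.

```python
def write_result(tag, value, rob, stations):
    """Broadcast (tag, value) on the CDB to the ROB and to waiting stations."""
    rob[tag]["value"] = value
    rob[tag]["ready"] = True
    # The producing reservation station is marked available (removed here).
    stations[:] = [s for s in stations if s["tag"] != tag]
    # Every station watching for this tag captures the value.
    for station in stations:
        station["operands"] = [
            ("value", value) if op == ("wait", tag) else op
            for op in station["operands"]
        ]


rob = {0: {"dest": "F4", "value": None, "ready": False},
       1: {"dest": "F6", "value": None, "ready": False}}
stations = [
    {"tag": 0, "operands": [("value", 1.0), ("value", 2.0)]},
    {"tag": 1, "operands": [("wait", 0), ("value", 2.0)]},
]
write_result(0, 2.0, rob, stations)   # ROB entry 0 finishes, e.g. 1.0 * 2.0
print(rob[0], stations)
```

After the call, ROB entry 0 is ready but the register file is untouched; the architectural update waits for commit.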

Instruction Execution With ROBs
Commit (3 cases)
  – Committing instruction is neither a branch with an incorrect prediction nor a store (normal commit)
      The instruction reaches the head of the ROB and its result is present in the buffer; at this point the processor updates the register with the result and removes the instruction from the ROB
  – Committing instruction is a branch with an incorrect prediction
      Speculation was wrong; the ROB is flushed and execution is restarted at the correct successor of the branch
  – Committing instruction is a store
      Same as a normal commit, except that memory is updated rather than a register
Study pages 187 through 192 to get a better understanding of speculative execution. Section 3.8 also elaborates.
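The three commit cases can be sketched as a single function that looks only at the head of the ROB. The entry layout, the `mispredicted` flag, and the returned strings are assumptions made for this example; redirecting fetch after a flush is not modeled.

```python
from collections import deque


def commit(rob, reg_file, memory):
    """Commit the ROB head if it is ready; return a label for what happened."""
    if not rob or not rob[0]["ready"]:
        return "stall"                    # in-order commit: wait for the head
    head = rob.popleft()
    if head["type"] == "branch":
        if head["mispredicted"]:
            rob.clear()                   # discard all younger, speculative work
            return "flush"
        return "branch"                   # correct prediction: nothing to undo
    if head["type"] == "store":
        memory[head["dest"]] = head["value"]    # memory is updated only now
        return "store"
    reg_file[head["dest"]] = head["value"]      # normal commit updates a register
    return "register"


rob = deque([
    {"type": "alu", "dest": "F4", "value": 2.0, "ready": True},
    {"type": "store", "dest": 0x100, "value": 2.0, "ready": True},
    {"type": "branch", "mispredicted": True, "ready": True},
    {"type": "alu", "dest": "F6", "value": 9.9, "ready": True},  # wrong-path instruction
])
reg_file = {"F4": 0.0, "F6": 3.0}
memory = {}
results = [commit(rob, reg_file, memory) for _ in range(4)]
print(results)   # prints: ['register', 'store', 'flush', 'stall']
```

The wrong-path write to F6 had finished executing, yet F6 still holds 3.0 afterwards, because the entry was flushed before it could commit.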

ROB Notes
ROBs facilitate managing exception behavior
  – If an instruction causes an exception, we wait until it reaches the head of the reorder buffer, at which point we know whether it was on the path actually taken, before handling the exception
  – Looking at the examples on pages 179 and 188, we see that this is more precise than Tomasulo without speculation
ROBs facilitate easy flush of instructions
  – When a mispredicted branch reaches the head (top) of the ROB, the remaining instructions in the ROB can be cleared. No memory or register results will have been overwritten.

Design Considerations
Register Renaming versus Reorder Buffering
  – Use a larger physical set of registers combined with register renaming, replacing the ROB and reservation stations
How much to speculate
  – Special events (cache misses)
  – Exceptions
  – Additional hardware
Speculating through Multiple Branches
  – Probabilities of incorrect speculation add
  – Slightly complicates hardware
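A short calculation illustrates why speculating through several branches is risky. The sketch assumes each prediction is independent with the same accuracy, which is a simplification, and the 0.9 accuracy is a made-up figure; for small misprediction rates the chance of at least one wrong guess is close to the sum of the individual rates.

```python
def all_correct_probability(accuracy, branches):
    """Probability that every one of `branches` unresolved predictions is right,
    assuming independent predictions with identical accuracy."""
    return accuracy ** branches


for n in (1, 2, 4, 8):
    p_ok = all_correct_probability(0.9, n)
    print(f"{n} branches: all correct {p_ok:.3f}, at least one wrong {1 - p_ok:.3f}")
```

With a hypothetical 90% accuracy, work that sits behind four unresolved branches is useful only about two times in three.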

Multiple Issue
Goal is to allow multiple instructions to issue in a clock cycle
CPI cannot be reduced below 1 if we issue only one instruction every clock cycle
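The bound follows from simple arithmetic: if at most w instructions issue per cycle, then n instructions need at least n / w cycles, so CPI is at least 1 / w. A tiny sketch, with the function name `ideal_cpi` chosen here for illustration:

```python
def ideal_cpi(issue_width):
    """Lower bound on CPI when at most `issue_width` instructions issue per cycle."""
    return 1 / issue_width


for width in (1, 2, 4):
    print(width, ideal_cpi(width))
```

Real programs fall short of this bound because of hazards, cache misses, and mispredicted branches.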

Integrated Instruction Fetch Units
Implement a separate autonomous unit that feeds instructions to the rest of the pipeline.
  – Integrated branch prediction: the branch predictor becomes part of the instruction fetch unit and is constantly predicting branches, so as to drive the fetch pipeline
  – Instruction prefetch: to deliver multiple instructions per clock, the instruction fetch unit will likely need to fetch ahead. The unit manages prefetching of instructions
  – Instruction memory access and buffering: when fetching multiple instructions per cycle a variety of complexities are encountered, including accessing multiple cache lines (structural hazards)

Multiple Issue Processors
Superscalar Processors
  – Multiple units
  – Static (early processors and embedded processors) and dynamic instruction scheduling
Very Long Instruction Word (VLIW) Processors
  – Issue a fixed number of instructions formatted either as one very large instruction or as a fixed instruction packet with the parallelism among instructions explicitly indicated by the instruction; the latter style is known as EPIC (Explicitly Parallel Instruction Computers)
  – Inherently statically scheduled by the compiler

Primary Approaches for Multiple Issue Processors

Common Name                Issue Structure  Hazard Detection  Scheduling                Distinguishing Characteristic            Examples
Superscalar (Static)       Dynamic          Hardware          Static                    In-order execution                       Sun UltraSparc II/III
Superscalar (Dynamic)      Dynamic          Hardware          Dynamic                   Some out-of-order execution              IBM Power2
Superscalar (Speculative)  Dynamic          Hardware          Dynamic with speculation  Out-of-order execution with speculation  Pentium III/4, MIPS R10K, Alpha 21264, HP PA 8500, IBM RS6411
VLIW/LIW                   Static           Software          Static                    No hazards between packets               Trimedia, i860
EPIC                       Mostly static    Mostly software   Mostly static             Explicit dependences marked by compiler  Itanium

Statically Scheduled Superscalar Processors
Hardware may issue from 0 (stalled) to 8 instructions in a clock cycle
Instructions issue in order, and all pipeline hazards are checked for at issue time among
  – Instructions being issued on a given clock cycle
  – All those in execution
If some instruction is dependent or will cause a structural hazard, only the instructions preceding it in sequence will be issued.
This is in contrast to VLIW processors, where the compiler generates a package of instructions that can be simultaneously issued and the hardware makes no dynamic decisions on multiple issue
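The in-order issue rule can be sketched as a scan that stops at the first hazard. This is a simplified model: the tuple layout `(unit, dest, sources)`, the unit names, and the assumption that a result cannot be forwarded to another instruction issuing in the same cycle are all choices made for this example.

```python
def issue_in_order(packet, busy_regs, free_units):
    """Return the prefix of `packet` that can issue this cycle.

    Each instruction is (unit, dest, sources). `busy_regs` holds destinations of
    instructions still in execution; `free_units` maps unit name to free count.
    """
    issued = []
    pending = set(busy_regs)        # registers whose new value is not yet available
    units = dict(free_units)        # local copy, decremented as units are claimed
    for unit, dest, sources in packet:
        data_hazard = dest in pending or any(s in pending for s in sources)
        structural_hazard = units.get(unit, 0) == 0
        if data_hazard or structural_hazard:
            break                   # in-order issue: nothing after this may issue
        units[unit] -= 1
        pending.add(dest)
        issued.append((unit, dest, sources))
    return issued


packet = [
    ("alu", "R1", ("R2", "R3")),
    ("mem", "R4", ("R5",)),
    ("alu", "R6", ("R1",)),         # needs R1, produced earlier in this same packet
    ("alu", "R7", ("R8",)),         # independent, but blocked behind the hazard
]
issued = issue_in_order(packet, busy_regs=set(), free_units={"alu": 2, "mem": 1})
print(len(issued))   # prints: 2
```

The fourth instruction has no hazard of its own, yet it waits; that lost slot is what dynamic scheduling tries to recover.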

Branch Target Buffers
Branch target buffers are used to keep speculative processor branches going
Refer to page 205, Figure 3.23, and the example problem.
Penalties arise on the first time through the code and on a misprediction.
This example problem has a particular set of statistics for the prediction accuracy.
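The two penalty sources named above can be combined into an expected cost per branch. The sketch below is a simplified model, not the textbook's worked example: it assumes one uniform penalty, charges a buffer miss only when the branch is taken, and uses made-up statistics.

```python
def expected_branch_penalty(hit_rate, accuracy, taken_fraction, penalty_cycles):
    """Average penalty cycles per branch under a simple branch target buffer model.

    Penalized cases: the branch is in the buffer but mispredicted, or it is
    not in the buffer (for example, first time through the code) and is taken.
    """
    mispredicted_hit = hit_rate * (1 - accuracy)
    taken_miss = (1 - hit_rate) * taken_fraction
    return (mispredicted_hit + taken_miss) * penalty_cycles


# Hypothetical statistics, chosen only to exercise the formula.
penalty = expected_branch_penalty(
    hit_rate=0.9, accuracy=0.9, taken_fraction=0.6, penalty_cycles=2
)
print(f"{penalty:.2f} cycles per branch")
```

Changing the accuracy or hit rate shows how sensitive the average penalty is to the statistics assumed in an example problem.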

Issues with Instruction Issue
Suppose we are given a four-issue static superscalar processor.
  – The pipeline receives from 1 to 4 instructions from the instruction fetch unit (the issue packet), all of which could potentially issue in one clock cycle
  – Instructions are examined in program order (at least conceptually), and instructions causing a structural or data hazard are not issued
      This examination must be done in parallel, although the logical order of the instructions is preserved
  – The issue checks are complex, and performing them in one cycle could constrain the minimum clock cycle length
      The constraint comes from the number of gates/components in the examination logic
As a result, in many statically scheduled and all dynamically scheduled superscalars, the issue stage is split and pipelined so that it can issue instructions on every clock cycle

Issues with Instruction Issue - Continued
Now the processor must detect any hazards between the two packets of instructions while they are still in the pipeline. One approach is to
  – Use the first stage to decide how many instructions from the packet can issue simultaneously
  – Use the second stage to examine the selected instructions in comparison with those already issued
In this case, branch penalties are high: there are now several pipelines to reload on incorrect branch choices
Branch prediction takes on more importance

Issues with Instruction Issue - Continued
To increase the processor's issue rate, further pipelining of the issue stage becomes necessary, and further breakdowns are more difficult
Instruction issue is one limitation on the clock rate of superscalar processors.