Chapter 3 Instruction-Level Parallelism and Its Dynamic Exploitation – Concepts
吳俊興, Department of Computer Science and Information Engineering, National University of Kaohsiung
October 2004
EEF011 Computer Architecture
Chapter Overview
3.1 Instruction Level Parallelism: Concepts and Challenges
3.2 Overcoming Data Hazards with Dynamic Scheduling
3.3 Dynamic Scheduling: Examples & the Algorithm
3.4 Reducing Branch Costs with Dynamic Hardware Prediction
3.5 High Performance Instruction Delivery
3.6 Taking Advantage of More ILP with Multiple Issue
3.7 Hardware-based Speculation
3.8 Studies of the Limitations of ILP
3.9 Limitations on ILP for Realizable Processors
3.10 The P6 Microarchitecture
3.1 Instruction-Level Parallelism (ILP)
ILP: potential overlap of execution among instructions
– Chapter 3: dynamic techniques, which depend largely on the hardware
– Chapter 4: static techniques, which rely more on software (compilers)
ILP rests on the observation that many instructions in code do not depend on each other, so it is possible to execute those instructions in parallel
– by building compilers that analyze the code
– by building hardware that is even smarter than that code
Pipelining and Stalls
Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls
– Ideal pipeline CPI: a measure of the maximum performance attainable by the implementation
– Structural hazards: the HW cannot support this combination of instructions
– Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
– Control hazards: caused by the delay between fetching instructions and deciding about changes in control flow (branches and jumps)
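A quick worked example with made-up numbers (an illustration, not from the slides): if the ideal pipeline CPI is 1.0 and a program averages 0.1 structural, 0.2 data hazard, and 0.15 control stalls per instruction, then
Pipeline CPI = 1.0 + 0.1 + 0.2 + 0.15 = 1.45
i.e., stalls add about 45% to the ideal execution time.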
Ideas To Reduce Stalls
[Table of stall-reduction techniques contrasting Chapter 3 (dynamic, hardware) with Chapter 4 (static, software); the table body was not preserved in this transcript]
ILP Terminology
ILP within a basic block is quite small
– Basic Block (BB): a straight-line code sequence with no branches in except at the entry and no branches out except at the exit (only one entry and one exit)
– average dynamic branch frequency of 15% to 25% => only 4 to 7 instructions execute between a pair of branches
– plus, instructions in a BB are likely to depend on each other
To obtain substantial performance enhancements, we must exploit ILP across multiple basic blocks
1. Simplest: loop-level parallelism, exploiting parallelism among iterations of a loop
– Loop-Level Parallelism: parallelism that exists within a loop; such parallelism can cross loop iterations, as in
for (i = 0; i <= 1000; i = i + 1)
    x[i] = x[i] + y[i];
– Loop Unrolling: either the compiler or the hardware exploits the parallelism inherent in the loop (see the sketch after this slide)
2. Vectors are another way
– a vector instruction operates on a sequence of data items
– if not vector, then exploit the parallelism either dynamically via branch prediction (hardware) or statically via loop unrolling (compiler)
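A minimal sketch of loop unrolling in C (my illustration, not from the original slides): the loop above unrolled by a factor of four, so four independent element updates are visible to the scheduler at once.

void add_arrays(double x[], double y[])
{
    int i;
    /* unrolled by 4; the 1001 iterations leave one element over,
       which the clean-up loop below handles */
    for (i = 0; i + 3 <= 1000; i += 4) {
        x[i]   = x[i]   + y[i];     /* these four statements are     */
        x[i+1] = x[i+1] + y[i+1];   /* independent of one another,   */
        x[i+2] = x[i+2] + y[i+2];   /* so hardware or compiler can   */
        x[i+3] = x[i+3] + y[i+3];   /* overlap their execution       */
    }
    for (; i <= 1000; i = i + 1)    /* clean-up for leftover iterations */
        x[i] = x[i] + y[i];
}

Unrolling also amortizes the loop-overhead branch across four body copies, which is exactly the cross-iteration parallelism the slide describes.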
Data Dependence and Hazards
If two instructions are parallel, they can execute simultaneously in a pipeline without causing any stalls
If two instructions are dependent, they must be executed in order, though they may often be partially overlapped
Three types of dependences: data dependences (true data dependences), name dependences, and control dependences
An instruction j is data dependent on instruction i if either of the following holds:
– instruction i produces a result that may be used by instruction j (j tries to read an operand before i writes it), or
– instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i (a chain of dependences, illustrated in C below)
Example:
I: add r1,r2,r3
J: sub r4,r1,r3
J is data dependent on I; this is caused by a “True Dependence” (compiler term). If a true dependence causes a hazard in the pipeline, it is called a Read After Write (RAW) hazard
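A tiny C rendering of the chain case (my illustration): j is data dependent on k, and k on i, so j transitively depends on i and the three statements must execute in order.

int chain(int a, int b, int c)
{
    int t = a + b;    /* i: produces t                                  */
    int u = t * c;    /* k: reads t -> RAW on t, so k depends on i      */
    return u - b;     /* j: reads u -> RAW on u, so j depends on k and,
                         through the chain, on i                        */
}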
Data Dependence and Hazards
Dependences are a property of programs
The presence of a dependence indicates the potential for a hazard, but the actual hazard and the length of any stall are properties of the pipeline
Importance of the data dependences:
1) they indicate the possibility of a hazard
2) they determine the order in which results must be calculated
3) they set an upper bound on how much parallelism can possibly be exploited
Chapter 3 looks mainly at HW schemes to avoid hazards
Name Dependence #1: Anti-dependence
Name dependence: when 2 instructions
– use the same register or memory location, called a name,
– but there is no flow of data between the instructions associated with that name
There are 2 versions of name dependence
Example:
I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7
Instr J writes an operand before instr I reads it; compiler writers call this an “anti-dependence”. It results from the reuse of the name “r1”
If an anti-dependence causes a hazard in the pipeline, it is called a Write After Read (WAR) hazard
Name Dependence #2: Output Dependence
Instr J writes an operand before instr I writes it; compiler writers call this an “output dependence”. It also results from the reuse of the name “r1”
I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
If an output dependence causes a hazard in the pipeline, it is called a Write After Write (WAW) hazard
ILP and Data Hazards
HW/SW must preserve program order: the order in which instructions would execute if they were executed sequentially, one at a time, as determined by the original source program
HW/SW goal: exploit parallelism by preserving program order only where it affects the outcome of the program
Instructions involved in a name dependence can execute simultaneously, or be reordered, if the name used in the instructions is changed so that the instructions do not conflict
– register renaming resolves name dependences for registers (see the sketch after this slide)
– it can be done either statically by the compiler or dynamically by the HW
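A minimal C sketch of renaming (my illustration; the variable names are hypothetical): reusing the name t creates WAR and WAW name dependences, and giving the second value its own name removes them, leaving only the true dependences.

void rename_demo(int a, int b, int d, int e, int out[2])
{
    /* Before renaming (conceptually):
         t = a + b;         i1: writes t
         out[0] = t * 2;    j1: reads t          (true dep on i1)
         t = d - e;         i2: rewrites t       (WAR with j1, WAW with i1)
         out[1] = t + 1;    j2: reads the new t  (true dep on i2)
       After renaming the second value to its own name t2: */
    int t1 = a + b;
    out[0] = t1 * 2;
    int t2 = d - e;       /* no longer conflicts with the name t1      */
    out[1] = t2 + 1;      /* the t1 chain and the t2 chain are now
                             independent and could execute in parallel */
}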
Control Dependencies
Every instruction is control dependent on some set of branches, and, in general, these control dependencies must be preserved to preserve program order
if p1 {
    S1;
};
if p2 {
    S2;
}
S1 is control dependent on p1, and S2 is control dependent on p2 but not on p1
Control Dependence Ignored
Control dependence need not always be preserved
– we are willing to execute instructions that should not have been executed, thereby violating the control dependences, if we can do so without affecting the correctness of the program
Instead, the 2 properties critical to program correctness are exception behavior and data flow
Exception Behavior
Preserving exception behavior => any changes in instruction execution order must not change how exceptions are raised in the program (=> no new exceptions)
Example:
DADDU R2,R3,R4
BEQZ  R2,L1
LW    R1,0(R2)
L1:
Problem with moving LW before BEQZ? If the branch is taken (R2 = 0), the hoisted LW could raise a memory protection exception that the original program would never raise
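The same pitfall in C terms (my analogy, not from the slides): hoisting a load above the test that guards it can introduce a fault the original program could never produce.

#include <stddef.h>

int guarded_load(int *p)
{
    if (p == NULL)     /* plays the role of BEQZ: it guards the load */
        return 0;
    return *p;         /* plays the role of LW                       */
}
/* "Moving LW before BEQZ" corresponds to reading *p before the NULL
   test; for p == NULL that dereference may fault, i.e., a new
   exception appears that the original ordering could never raise. */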
Data Flow
Data flow: the actual flow of data values among instructions that produce results and those that consume them
– branches make the flow dynamic, since they determine which instruction is the supplier of a datum
Example:
DADDU R1,R2,R3
BEQZ  R4,L
DSUBU R1,R5,R6
L: …
OR    R7,R1,R8
Does OR depend on DADDU or on DSUBU? It depends on the branch outcome: R1 comes from DADDU if the branch is taken and from DSUBU if it is not, so execution must preserve this data flow
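In C terms (my analogy, not from the slides), which definition of r1 reaches the OR depends on the branch outcome:

int dataflow_demo(int r2, int r3, int r4, int r5, int r6, int r8)
{
    int r1 = r2 + r3;     /* DADDU: first definition of r1               */
    if (r4 != 0)          /* BEQZ R4,L skips the subtract when r4 == 0   */
        r1 = r5 - r6;     /* DSUBU: second definition of r1              */
    return r1 | r8;       /* OR: which r1 it reads depends on the branch */
}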
3.2 Dynamic Scheduling
Advantages of dynamic scheduling:
– it handles cases where dependences are unknown at compile time (e.g., because they may involve a memory reference)
– the hardware rearranges the instruction execution to reduce the stalls while maintaining data flow and exception behavior
– it simplifies the compiler
– it allows code compiled for one pipeline to run efficiently on a different pipeline
Hardware speculation (Section 3.7), a technique with significant performance advantages, builds on dynamic scheduling
Classic 5-stage Pipeline for MIPS

                      Clock number
Instruction number    1    2    3    4    5    6    7    8    9
Instruction i         IF   ID   EX   MEM  WB
Instruction i+1            IF   ID   EX   MEM  WB
Instruction i+2                 IF   ID   EX   MEM  WB
Instruction i+3                      IF   ID   EX   MEM  WB
Instruction i+4                           IF   ID   EX   MEM  WB
HW Schemes: Instruction Parallelism
Why do this in hardware at run time?
– it works when real dependences cannot be known at compile time
– it keeps the compiler simpler
– code for one machine runs well on another
Key idea: allow instructions behind a stall to proceed
Key idea: execute instructions in parallel; there are multiple execution units, so use them
The idea:
DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F12,F8,F14
Even though ADDD stalls waiting for F0 from DIVD, SUBD has no dependences and can run
– this enables out-of-order execution => out-of-order completion