CS 6461: Computer Architecture – Instruction Level Parallelism

CS 6461: Computer Architecture – Instruction Level Parallelism
Instructor: M. Lancaster
Corresponding to Hennessy and Patterson, Fifth Edition, Section 3.1

Instruction Level Parallelism
- Almost all processors since 1985 have used pipelining to overlap the execution of instructions and improve performance. This potential overlap among instructions is called instruction-level parallelism (ILP).
- Pipelining was first introduced in the IBM Stretch (Model 7030) around 1959.
- The CDC 6600 later combined pipelining with multiple functional units.
- The Intel i486 was the first pipelined implementation of the IA-32 architecture.

Instruction Level Parallelism
- Instruction-level parallel processing is the concurrent processing of multiple instructions.
- It is difficult to achieve within a single basic block: typical MIPS programs have a dynamic branch frequency between 15% and 25%. That is, between three and six instructions execute between a pair of branches, and data hazards usually exist within these instructions because they are likely to be dependent on one another.
- Given such small basic blocks, ILP must be exploited across multiple blocks.
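A quick back-of-the-envelope check of those figures (my arithmetic, not from the slides): if a fraction $b$ of dynamically executed instructions are branches, the average distance between branches is $1/b$ instructions, so

\[
b = 0.25 \;\Rightarrow\; \tfrac{1}{b} = 4, \qquad b = 0.15 \;\Rightarrow\; \tfrac{1}{b} \approx 6.7,
\]

leaving roughly three to six non-branch instructions between consecutive branches.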

Instruction Level Parallelism
- The current trend is toward very deep pipelines, with depths increasing from fewer than 10 stages to more than 20.
- With more stages, each stage can be smaller and simpler, contributing less gate delay per stage, so very high clock rates are possible.

Loop Level Parallelism
Exploitation of parallelism among iterations of a loop. Consider a loop adding two 1000-element arrays:

    for (i = 1; i <= 1000; i = i + 1)
        x[i] = x[i] + y[i];

Within a single iteration of the generated code there may be little opportunity to overlap instructions, but each iteration of the loop can overlap with any other iteration.
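To make the cross-iteration parallelism concrete, here is a sketch of two consecutive iterations in MIPS-style assembly (my illustration, not from the slides; it assumes R1 points at the current element of x, R3 at the corresponding element of y, and eight-byte elements, with the pointers walking downward as in the loop shown later). The two iterations touch disjoint data, so their instructions can be overlapped freely:

    L.D    F0,0(R1)     ; load x[i]
    L.D    F2,0(R3)     ; load y[i]
    ADD.D  F4,F0,F2     ; x[i] + y[i]
    S.D    F4,0(R1)     ; store x[i]
    L.D    F6,-8(R1)    ; load x[i+1] -- independent of iteration i
    L.D    F8,-8(R3)    ; load y[i+1]
    ADD.D  F10,F6,F8    ; x[i+1] + y[i+1]
    S.D    F10,-8(R1)   ; store x[i+1]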

Concepts and Challenges – Approaches to Exploiting ILP
Two major approaches:
- Dynamic – approaches that depend on the hardware to locate the parallelism at run time.
- Static – fixed solutions generated by the compiler, and thus bound at compile time.
These approaches are not totally disjoint; some techniques require both. In either case, the limitations are imposed by data and control hazards.

Features Limiting Exploitation of Parallelism
- Program features: instruction sequences
- Processor features: pipeline stages and their functions
- Interrelationships: how do program properties limit performance, and under what circumstances?

Approaches to Exploiting ILP – Dynamic Approach
- Hardware-intensive approach
- Dominates the desktop and server markets:
  - Pentium III, Pentium 4, Athlon
  - MIPS R10000/12000
  - Sun UltraSPARC III
  - PowerPC 603, G3, G4
  - Alpha 21264

Approaches to Exploiting ILP – Static Approach
- Compiler-intensive approach
- Embedded market and IA-64

Instruction Level Parallelism – Terminology and Ideas
Cycles per instruction:

    Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls

- Ideal pipeline CPI is a measure of the maximum performance (the minimum CPI) attainable in a given architecture; stalls and their impacts must be minimized.
- During the 1980s, CPI = 1 was a target objective for single-chip microprocessors; the 1990s objective was to reduce CPI below 1.
- Scalar processors are pipelined processors designed to fetch and issue at most one instruction every machine cycle.
- Superscalar processors are designed to fetch and issue multiple instructions every machine cycle.
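A hedged worked example of the CPI equation (the stall counts are invented for illustration): suppose the ideal pipeline CPI is 1.0 and the program averages 0.05 structural, 0.20 data hazard, and 0.15 control stall cycles per instruction. Then

\[
\text{Pipeline CPI} = 1.0 + 0.05 + 0.20 + 0.15 = 1.40,
\]

so stalls alone add 40% to the execution time relative to the ideal pipeline.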

Approaches to Exploiting ILP That We Will Explore

Technique – what it reduces:
- Forwarding and bypassing – potential data hazards and stalls
- Delayed branches and simple branch scheduling – control hazard stalls
- Basic dynamic scheduling (scoreboarding) – data hazard stalls from true dependences
- Dynamic scheduling with renaming – data hazard stalls, and stalls from antidependences and output dependences
- Branch prediction – control stalls
- Issuing multiple instructions per cycle – ideal CPI
- Hardware speculation – data hazard and control hazard stalls
- Dynamic memory disambiguation – data hazard stalls with memory
- Loop unrolling – control hazard stalls
- Basic compiler pipeline scheduling – data hazard stalls
- Compiler dependence analysis, software pipelining, trace scheduling – ideal CPI, data hazard stalls
- Hardware support for compiler speculation – ideal CPI, data hazard stalls, control stalls

Approaches to Exploiting ILP – Review of Terminology
- Instruction issue: the process of letting an instruction move from the instruction decode (ID) phase into the instruction execution (EX) phase.
- Interlock (pipeline interlock, instruction interlock): the resolution of pipeline hazards in hardware. Pipeline interlock hardware must detect all pipeline hazards and ensure that all dependences are satisfied.

Data Dependencies and Hazards
How much parallelism exists in a program, and how can it be exploited?
- If two instructions are parallel, they can execute simultaneously in a pipeline without causing any stalls (assuming no structural hazards exist); there are no dependences between parallel instructions.
- If two instructions are not parallel and must be executed in order, they may often still be partially overlapped.

Instruction Level Parallelism – Pipeline Hazards
- Hazards make it necessary to stall the pipeline: some instructions in the pipeline are allowed to proceed while others are delayed.
- In the example pipeline used here, when an instruction is stalled, all instructions further back in the pipeline are also stalled.
- No new instructions are fetched during the stall; instructions issued earlier in the pipeline must continue.

Data Dependencies and Hazards
Data dependences – an instruction j is data dependent on instruction i if either of the following holds:
- Instruction i produces a result that may be used by instruction j, or
- Instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i – that is, one instruction is dependent on another if there exists a chain of dependences of the first type between the two instructions.

Data Dependencies and Hazards
Data dependences – code example:

    LOOP: L.D    F0,0(R1)    ; F0 = array element
          ADD.D  F4,F0,F2    ; add scalar in F2
          S.D    F4,0(R1)    ; store result
          DADDUI R1,R1,#-8   ; decrement pointer by 8
          BNE    R1,R2,LOOP  ; branch if not done

The dependences among the first three instructions (L.D → ADD.D → S.D) involve floating-point data; those between the last two instructions (DADDUI → BNE) involve integer data.

Data Dependencies and Hazards
- The dependences mark the points where the order of instructions must be preserved.
- If two instructions are dependent, they cannot execute simultaneously or be completely overlapped.

Data Dependencies and Hazards
- Dependences are properties of programs.
- Whether a given dependence results in an actual hazard being detected, and whether that hazard actually causes a stall, are properties of the pipeline organization.

Data Dependencies and Hazards
Hazard created – code example:

    DADDUI R1,R1,#-8   ; decrement pointer by 8
    BNE    R1,R2,LOOP

- A hazard is created when the branch test is moved from the EX stage to the ID stage: BNE then needs R1 earlier than DADDUI can produce it.
- If the test remained in EX, this dependence would not cause a stall (the branch delay would still be two cycles, however).

Data Dependencies and Hazards
[Figure: pipeline timing diagrams. With the branch test in EX, the branch destination and outcome are known at the end of the third cycle of execution; with the test in ID, they are known at the end of the second cycle.]

Data Dependencies and Hazards
The presence of a dependence indicates the potential for a hazard, but the actual hazard and the length of any stall are properties of the pipeline. A data dependence:
- Indicates the possibility of a stall
- Determines the order in which results must be calculated
- Sets an upper bound on how much parallelism can possibly be exploited
We will focus on overcoming these limitations.

Overcoming Dependences – Two Ways
- Maintain the dependence but avoid the hazard, by scheduling the code (dynamically in hardware or statically by the compiler); see the sketch below.
- Eliminate the dependence by transforming the code.
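A minimal sketch of the first approach (my example, with registers assumed; the sequence echoes the earlier loop): the L.D → ADD.D dependence is kept, but the independent DADDUI is scheduled into the load delay so the hazard no longer forces a stall:

    ; before scheduling: ADD.D stalls waiting for F0
    L.D    F0,0(R1)
    ADD.D  F4,F0,F2
    DADDUI R1,R1,#-8

    ; after scheduling: DADDUI fills the load delay
    L.D    F0,0(R1)
    DADDUI R1,R1,#-8    ; independent of the load result; the load has already used R1
    ADD.D  F4,F0,F2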

Difficulty in Detecting Dependences
- A data value may flow between instructions either through registers or through memory locations, so detection is not always straightforward.
- For instructions referring to memory, the register dependences are easy to detect; the memory dependences are harder.
- Two references that look different may address the same location: with R4 = 20 and R6 = 100, the operands 100(R4) and 20(R6) both refer to address 120.
- Conversely, two references that look identical (say, 20(R4) in each) may address different locations if R4 is incremented by an instruction between the two references.
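A small illustration of both cases (register values assumed, instructions mine): the first two references look unrelated but touch the same address, while the last two look identical but do not:

    ; assume R4 = 20 and R6 = 100
    S.D    F0,100(R4)   ; effective address 100 + 20 = 120
    L.D    F2,20(R6)    ; effective address 20 + 100 = 120 -> dependence through memory
    L.D    F4,20(R4)    ; effective address 40
    DADDUI R4,R4,#8     ; R4 becomes 28
    L.D    F6,20(R4)    ; effective address 48 -> same syntax, different location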

Name Dependences – Two Categories
Two instructions use the same register or memory location, called a name, but there is no actual flow of data between the instructions associated with that name. In the cases below, i precedes j:
1. An antidependence between instructions i and j occurs when instruction j writes a register or memory location that instruction i reads. The original ordering must be preserved.
2. An output dependence occurs when instruction i and instruction j write the same register or memory location; again, the order must be preserved.

Name Dependences – Two Categories (examples)
1. An antidependence (j writes R2, which i reads):

    i: DADD R1,R2,#-8
    j: DADD R2,R5,#0

2. An output dependence (i and j both write R1):

    i: DADD R1,R2,#-8
    j: DADD R1,R4,#10

Instruction Level Parallelism – Name Dependences
- These are not true data dependences, so the instructions could execute simultaneously or be reordered if the name (register or memory location) used by the instructions is changed so that they no longer conflict.
- Register renaming is easier than renaming memory locations. Renaming R2 to R5 in instruction j removes the antidependence:

    Before:
      i: DADD R1,R2,#-8
      j: DADD R2,R4,#10

    After renaming R2 to R5 in j:
      i: DADD R1,R2,#-8
      j: DADD R5,R4,#10

Instruction Level Parallelism – Data Hazards
- A hazard is created whenever there is a dependence between instructions and they are close enough that the overlap caused by pipelining, or other reordering of instructions, would change the order of access to the operand involved in the dependence.
- We must preserve program order: the order in which the instructions would execute if run sequentially, one at a time, as on a non-pipelined machine.
- However, program order need only be maintained where it affects the outcome of the program.

Data Hazards – Three Types
For two instructions i and j, with i occurring before j in program order, the possible hazards are:
RAW (read after write) – j tries to read a source before i writes it, so j incorrectly gets the old value.
- The most common type; program order must be preserved.
- In a simple, common static pipeline, a load instruction followed by an integer ALU instruction that directly uses the load result will lead to a RAW hazard.
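For instance (a standard illustration, with registers assumed), the load below produces its result too late for the immediately following ALU instruction to read it, so the pipeline must detect the RAW hazard and stall or forward:

    LD   R5,0(R1)    ; R5 available only at the end of MEM
    DADD R6,R5,R7    ; reads R5 in EX -> RAW hazard on R5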

Data Hazards – Three Types
Second type: WAW (write after write) – j tries to write an operand before it is written by i. The writes end up in the wrong order, leaving the value written by i (rather than the value written by j) in the destination.
- Corresponds to an output dependence.
- Present in pipelines that write in more than one pipe stage, or that allow an instruction to proceed even when a previous instruction is stalled.
- In the classic five-stage pipeline, all writes occur in the WB stage, so this class of hazard is avoided; if reordering of instructions is allowed, it becomes possible.
- Example scenario: an integer or other short instruction writes a register after a long-latency floating-point instruction that targets the same register, as sketched below.
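A hedged illustration (instruction latencies assumed): with a multicycle floating-point divide, the long-running DIV.D below would write F2 after the later, faster load unless the hardware enforces the write order:

    DIV.D  F2,F0,F4    ; many EX cycles; writes F2 late
    L.D    F2,0(R1)    ; completes quickly -> WAW hazard on F2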

Data Hazards – Three Types
Third type: WAR (write after read) – j tries to write an operand before it is read by i, so i incorrectly gets the new value.
- Corresponds to an antidependence.
- Cannot occur in most static pipelines: reads happen early (in ID) and writes happen late (in WB).
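An assumed illustration of how WAR could arise with out-of-order completion: while the add waits on the long divide, the sub might write F8 before the add has read the old value:

    DIV.D  F0,F2,F4    ; long-latency divide
    ADD.D  F6,F0,F8    ; i: stalls on F0, still needs the old F8
    SUB.D  F8,F10,F14  ; j: writes F8 -> WAR hazard on F8 if it completes first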

Instruction Level Parallelism – Control Dependencies
A control dependence determines the ordering of an instruction i with respect to a branch instruction, so that instruction i is executed in correct program order and only when it should be. Example:

    if p1 {
        S1;
    };
    if p2 {
        S2;
    }

Instruction Level Parallelism – Control Dependencies
In the example above, S1 is control dependent on p1, and S2 is control dependent on p2 but not on p1.

Instruction Level Parallelism – Control Dependencies
Two constraints are imposed by control dependences:
- An instruction that is control dependent on a branch cannot be moved before the branch, where its execution would no longer be controlled by the branch. For example, we cannot take a statement from the then portion of an if statement and move it before the if statement.
- An instruction that is not control dependent on a branch cannot be moved after the branch, where its execution would be controlled by the branch. For example, we cannot take a statement from before the if and move it into the then portion.

Instruction Level Parallelism – Control Dependencies
Two properties of our simple pipeline preserve control dependences:
- Instructions execute in program order.
- Detection of control or branch hazards ensures that an instruction that is control dependent on a branch is not executed until the branch direction is known.
However, we may execute instructions that should not have been executed (violating the control dependences) if we can do so without affecting the correctness of the program.

Control Dependences Are Really…
…not the fundamental issue. What really must be preserved are:
- Exception behavior
- Data flow

Preserving Exception Behavior
- Preserving exception behavior means that any change in the ordering of instruction execution must not change how exceptions are raised in the program.
- We often relax this rule to: reordering of instruction execution must not cause any new exceptions.

    DADDU R2,R3,R4
    BEQZ  R2,L1
    LW    R1,0(R2)   ; could cause an illegal memory access
    L1: …

- In the code above, if we do not maintain the data dependence involving R2, we can change the result of the program.
- If we ignore the control dependence and move the load instruction before the branch, the load may cause a memory protection exception.
- No visible data dependence prevents this interchange; it is prevented only by the control dependence.

Preserving Exception Behavior
To allow reordering of these instructions (which, as noted, preserves the data dependence), we would need to be able to simply ignore the exception when the branch is taken.

Instruction Level Parallelism – Preserving Data Flow
- Preserving data flow means preserving the actual flow of data values between instructions that produce results and those that consume them.
- Branches make the data flow dynamic, since they allow the source of data for a given instruction to come from many points.

Instruction Level Parallelism – Preserving Data Flow
Example:

    DADDU R1,R2,R3
    BEQZ  R4,L
    DSUBU R1,R5,R6
    L: …
    OR    R7,R1,R8   ; value of R1 depends on whether the branch is taken

- The OR must see R1 from DADDU if the branch is taken and from DSUBU if it is not, so DSUBU cannot be moved above the branch.
- By preserving the control dependence of the DSUBU on the branch, we prevent an illegal change to the data flow.

Instruction Level Parallelism – Preserving Data Flow
Sometimes violating a control dependence affects neither the exception behavior nor the data flow:

    DADDU R1,R2,R3
    BEQZ  R1,skip
    DSUBU R4,R5,R6
    DADDU R5,R4,R9
    skip: OR R7,R1,R8   ; suppose R4 is not used after this point

- If R4 is unused after this point, changing the value of R4 just before the branch would not affect the data flow.
- If R4 were dead and DSUBU could not generate an exception, we could move the DSUBU instruction before the branch.
- This is called speculation, since the compiler is in effect betting on the branch outcome (here, that the branch is often not taken, so the hoisted DSUBU does useful work).

Control Dependence Again
- Control dependence in the simple pipeline is preserved by implementing control hazard detection that can cause control stalls.
- Control stalls can be eliminated by a variety of hardware techniques.
- Delayed branches can reduce stalls arising from control hazards, but they require that the compiler preserve the data flow.