Pipelining – Out-of-order execution and exceptions


Pipelining – Out-of-order execution and exceptions
CS/COE 1541 (term 2174)
Jarrett Billingsley

Class Announcements
I suck. Higher-priority tasks, like making exams and lecture materials, have pre-empted the grading process. Hopefully the quiz/homework answers give you an idea of what your grade would look like; let me know if you have any questions. I have nothing else to do this coming weekend except grading. Oh, and my birthday, I guess.
Please stay safe. Keep a vigilant eye on those in power, especially those who ignore checks and balances. If you are a noncitizen, please research your rights. Things may be getting scary. Value your safety over your degree.

From static to dynamic

Very Long Instruction Word (VLIW)
In an extreme case of static multiple-issue, VLIW architectures pack multiple smaller instructions into large "super-instructions." This is done by the compiler! The CPU then blindly fetches and executes these blocks of instructions without needing to check for dependencies.
This allows superscalar performance (multiple instructions per cycle) without as much hardware overhead. Despite this, it hasn't taken off as well as its proponents hoped:
Momentum is probably one reason. It always is!
Static scheduling has shortcomings.
SIMD instructions and extremely parallel processors (GPUs) have greatly increased the computing throughput of traditional designs.

The crux of the issue
The essential problem multiple-issue architectures (of any kind) try to address is that there exists instruction-level parallelism (ILP).

lw  ...
add ...
sub ...
mul ...
sw  ...

In this code, there are two dependency chains: sequences of instructions which depend on previous instructions, but not on the other chain.

lw  ...
add ...
sub ...
sw  ...

mul ...

VLIW would encode these pairs of instructions as super-instructions. But there's another way...
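To make the idea of a dependency chain concrete, here is a small Python sketch (mine, not from the lecture) that groups a straight-line instruction sequence into chains by tracking which earlier instruction last wrote each register. The operands are invented, since the slide elides them.

# Hypothetical helper, not from the course: split straight-line code into RAW
# dependency chains by remembering which instruction last wrote each register.
def find_chains(instrs):
    """instrs: list of (opcode, dest, srcs); returns lists of instruction indices."""
    last_writer = {}   # register name -> index of the instruction that last wrote it
    chain_of = {}      # instruction index -> chain id
    chains = []
    for i, (op, dest, srcs) in enumerate(instrs):
        producing = {chain_of[last_writer[r]] for r in srcs if r in last_writer}
        if producing:
            c = min(producing)      # join an existing chain (a real scheduler would merge)
        else:
            c = len(chains)         # no producers in flight -> start a new chain
            chains.append([])
        chain_of[i] = c
        chains[c].append(i)
        if dest:
            last_writer[dest] = i
    return chains

program = [("lw",  "t0", ["s0"]),
           ("add", "t1", ["t0", "t2"]),
           ("sub", "t3", ["t1", "t4"]),
           ("sw",  None, ["t3", "s1"]),
           ("mul", "t5", ["t6", "t7"])]   # touches none of the registers above
for chain in find_chains(program):
    print([program[i][0] for i in chain])   # ['lw', 'add', 'sub', 'sw'] and ['mul']

A dynamic scheduler effectively does this bookkeeping in hardware, over whatever instructions are currently in flight.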

Dynamic scheduling and out-of-order execution
Make the CPU do the scheduling!
How, even? The CPU has an instruction window that it looks at: a sequence of instructions in "correct" (program) order. The CPU can then find dependencies between instructions before deciding when to execute them, rather than during execution like with single-issue or static multiple-issue pipelines. This can actually simplify forwarding, which was becoming a big problem with static multiple-issue!
For dynamic scheduling to work well, you need a large instruction window to detect long dependency chains. What things might make it difficult to have a large window? BRANCHES! Branch prediction becomes even more important!
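To see why looking at a whole window pays off, here is a toy Python model (my own simplification, not anything from the book): each cycle it issues every windowed instruction whose producers finished in an earlier cycle, assuming unit latency, unlimited functional units, perfect renaming, and no branches.

# Toy out-of-order issue model; unit latency and unlimited functional units are
# simplifying assumptions, so this shows the dataflow limit, not a real CPU.
def ooo_cycles(instrs):
    """instrs: list of (dest, srcs); returns cycles to finish with idealized OOO issue."""
    producers, deps_of = {}, []
    for dest, srcs in instrs:
        deps_of.append([producers[r] for r in srcs if r in producers])  # RAW producers
        producers[dest] = len(deps_of) - 1
    finish, remaining, cycle = {}, set(range(len(instrs))), 0
    while remaining:
        cycle += 1
        ready = [i for i in remaining
                 if all(finish.get(p, cycle) < cycle for p in deps_of[i])]
        for i in ready:                 # everything whose inputs are already done
            finish[i] = cycle
            remaining.remove(i)
    return cycle

prog = [("t0", ["s0"]), ("t1", ["t0"]), ("t2", ["t1"]),   # a chain of three
        ("t5", ["s2"]), ("t6", ["t5"])]                   # an independent chain of two
print(ooo_cycles(prog), "cycles, versus 5 if issued strictly one per cycle in order")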

Detecting dependencies
In order to get the best ILP, we have to detect dependencies in a sophisticated way. One way is with list scheduling. The first step is to build a graph of data dependencies: nodes are instructions, arrows are dependencies, and the numbers are how many cycles each instruction takes.
[dependency-graph figure, courtesy of Dr. Melhem, with instruction latencies of 2, 3, and 1 cycles]
Here we have two dependency chains. What is the longest path through these chains, and therefore the minimum number of cycles to execute them? 7 cycles total.
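Since the figure itself isn't reproduced here, the sketch below computes the same quantity, the longest path through a dependency graph, on a made-up set of instructions and latencies (these are not the values from Dr. Melhem's figure).

# Critical-path calculation behind list scheduling. The graph and latencies are
# invented examples, not the figure from the slide.
from functools import cache

latency = {"lw1": 2, "add1": 1, "sub1": 1, "mul1": 3}
deps = {"add1": ["lw1"], "sub1": ["add1"]}     # chain 1: lw1 -> add1 -> sub1; mul1 is on its own

@cache
def finish_time(node):
    """Earliest cycle this node can complete, even with unlimited functional units."""
    start = max((finish_time(d) for d in deps.get(node, ())), default=0)
    return start + latency[node]

print(max(finish_time(n) for n in latency))   # 4 cycles: lw1 -> add1 -> sub1 is the critical path

When several instructions are ready at once, list scheduling typically gives priority to the ones on the critical path.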

Red herrings
Sometimes limitations on the number of registers create false ordering dependencies, one of which is the antidependency. These are not "real" data dependencies, and they must be detected and eliminated for the best dynamic scheduling results.

add  t0, t0, t4
addi s1, t0, 64
lw   t0, 0(s0)
beq  t0, t9, blah

Two dependency chains... but t0 holds different values in each. This is a Write-After-Read (WAR) name dependency. By renaming the registers used, we can now execute these two chains in parallel:

add  t0, t0, t4
addi s1, t0, 64
lw   t0', 0(s0)      # rename!
beq  t0', t9, blah

There is another kind of dependency in this code, too...
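In hardware this is done with a table that maps architectural registers to physical ones. The Python sketch below is a toy version of that idea (the instruction tuples are invented): every write gets a fresh name, which is exactly the t0 → t0' trick above.

# Toy register renamer: each architectural write gets a fresh "physical" name,
# which eliminates WAR and WAW hazards. This is an illustration, not real rename hardware.
def rename(instrs):
    mapping, version, out = {}, {}, []
    for op, dest, srcs in instrs:
        srcs = [mapping.get(r, r) for r in srcs]        # reads see the latest version
        if dest:
            version[dest] = version.get(dest, 0) + 1
            mapping[dest] = f"{dest}.{version[dest]}"   # fresh name for this write
            dest = mapping[dest]
        out.append((op, dest, srcs))
    return out

code = [("add",  "t0", ["t0", "t4"]),
        ("addi", "s1", ["t0"]),
        ("lw",   "t0", ["s0"]),          # WAR with the addi above, WAW with the add
        ("beq",  None, ["t0", "t9"])]
for ins in rename(code):
    print(ins)   # the lw now writes t0.2, so the two chains no longer share a register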

RAW WAR? WAW!
The "data dependencies" we've talked about before are more properly called read-after-write (RAW) or flow dependencies. We just saw write-after-read (WAR) dependencies, or antidependencies. And there's a third kind: write-after-write (WAW), or output dependencies.

add  t0, t0, t4
addi s1, t0, 64
lw   t0, 0(s0)
beq  t0, t9, blah

We solved RAW dependencies with forwarding. WAR and WAW can be solved with register renaming. This can happen at compile time (if there are enough ISA registers) or dynamically!
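A compact way to keep the three kinds straight is to compare the earlier instruction's destination and sources against the later one's. This little checker is my own illustration; the last example mirrors the add/lw pair from the snippet above.

# Classify the hazards between an earlier and a later instruction,
# each described as (destination register, list of source registers).
def hazards(earlier, later):
    d1, reads1 = earlier
    d2, reads2 = later
    found = set()
    if d1 and d1 in reads2: found.add("RAW")   # later reads what earlier wrote
    if d2 and d2 in reads1: found.add("WAR")   # later overwrites what earlier still reads
    if d1 and d1 == d2:     found.add("WAW")   # both write the same register
    return found

print(hazards(("t0", ["s0"]), ("t1", ["t0", "t2"])))        # {'RAW'} - a true data dependency
print(hazards(("s1", ["t0"]), ("t0", ["s0"])))              # {'WAR'} - the addi vs. the lw above
print(hazards(("t0", ["t0", "t4"]), ("t0", ["s0"])))        # WAR and WAW - the add vs. the lw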

Structural hazards return
Of course, when trying to schedule multiple instructions to run at once, you have to make decisions based on how many functional units are available.
[diagram: an instruction scheduler feeding a load/store unit, Int ALU 1, Int ALU 2, and a float ALU]
If we have a program that consists entirely of float operations, what is our maximum IPC? Just 1!
The compiler can help somewhat, but a large instruction window to work with is very important. It allows us to find good mixes of instructions to keep the CPU busy.
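To put a number on that claim, here is a back-of-the-envelope calculation assuming the functional-unit counts pictured above (one load/store unit, two integer ALUs, one float ALU); the instruction mixes are made up.

# Throughput bound from functional units alone: if a fraction `frac` of instructions
# needs unit class k, then IPC * frac <= units[k], so IPC <= units[k] / frac.
units = {"load/store": 1, "int": 2, "float": 1}

def max_ipc(mix):
    """mix: dict of unit class -> fraction of instructions that need it."""
    return min(units[k] / frac for k, frac in mix.items() if frac > 0)

print(max_ipc({"float": 1.0}))                                   # 1.0 - all-float code
print(max_ipc({"int": 0.5, "load/store": 0.25, "float": 0.25}))  # 4.0 - a balanced mix uses everything

Real IPC will be lower still, since issue width, window size, and dependencies all cap it further.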

Exceptions/Interrupts

Hey! Listen!
An exception (or interrupt) is an event which causes the CPU to stop the normal flow of execution and go somewhere else. There are many possible causes of exceptions:
Software exceptions are usually used to call OS routines.
Internal exceptions are caused by problems with the program – arithmetic overflow, misaligned memory accesses, division by zero, etc.
External exceptions (more often called interrupts) are used by other computer hardware to tell the CPU that something has happened – maybe data is ready to read, or a device needs new data, or the user hit a key, or...
In all cases, the same things have to happen.

Handling exceptions
An exception is really a special kind of call. What happens:
1. Information about the exception (what caused it, the PC where it happened, etc.) is stored somewhere.
2. The CPU stops doing whatever it was doing.
3. Control transfers to a predetermined location, known as an exception handler. Usually this is inside the OS.
4. The exception handler inspects the exception information and decides what to do (ignore it, perform a system call, kill the program, give the hardware what it needs, etc.).
5. The exception handler returns, and normal operation resumes.
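As a software-level sketch of that sequence: the EPC/Cause register names and the 0x80000180 vector follow MIPS conventions, but the handler table and the "skip the faulting instruction" policy are invented simplifications, not real OS code.

# Toy model of exception dispatch: record the cause and PC, enter a fixed handler,
# let it decide, then resume. A sketch of the steps above, not an actual implementation.
EXC_VECTOR = 0x80000180      # MIPS's general exception vector address

def overflow_handler(state):
    print(f"overflow at {state['EPC']:#x} - choosing to ignore it")

def default_handler(state):
    print(f"unhandled {state['Cause']} at {state['EPC']:#x} - killing the program")

HANDLERS = {"overflow": overflow_handler}

def raise_exception(state, cause, faulting_pc):
    state["Cause"], state["EPC"] = cause, faulting_pc   # 1. record what happened and where
    state["pc"] = EXC_VECTOR                            # 2./3. stop and enter the handler
    HANDLERS.get(cause, default_handler)(state)         # 4. handler inspects and decides
    state["pc"] = state["EPC"] + 4                      # 5. resume (here: skip the bad op)

cpu = {"pc": 0x400104}
raise_exception(cpu, "overflow", 0x400100)
print(hex(cpu["pc"]))   # 0x400104 - back to the instruction after the one that trapped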

Easy enough...
But pipelining and OOO execution throw huge wrenches into it.
[five-stage pipeline diagram: I-Mem, Ins. Decoder, Register File, ALU, D-Mem]
If an overflow occurs here (in the ALU)... what instructions should we flush? And what does the register file look like?

Precise vs. imprecise
Figuring out which instructions need to be flushed and which need to be completed before running the handler is a tricky task. So tricky that some architectures used to give up on it: the exception handler would be given a rough, imprecise estimate of where the exception occurred. This is, obviously, not great.
All modern architectures use precise exceptions: the handler is guaranteed that all previous instructions and their effects have completed, and that the PC points exactly where the exception occurred.