Pipelining – Out-of-order execution and exceptions

CS/COE 1541 (term 2174) Jarrett Billingsley

Class Announcements
I suck. Higher-priority tasks, like making exams and lecture materials, have pre-empted the grading process. Hopefully the quiz/homework answers give you an idea of what your grade would be like, or of what questions you might have. I have nothing else to do this coming weekend except grading. Oh, and my birthday, I guess.

Please stay safe. Keep a vigilant eye on those in power, especially those who ignore checks and balances. If you are a noncitizen, please research your rights. Things may be getting scary. Value your safety over your degree.

1/30/2017 CS/COE 1541 term 2174

From static to dynamic

Very Long Instruction Word (VLIW)
In an extreme case of static multiple-issue, VLIW architectures pack multiple smaller instructions into large "super-instructions." This is done by the compiler! The CPU then blindly fetches and executes these blocks of instructions without needing to check for dependencies. This allows superscalar performance (multiple instructions per cycle) without as much hardware overhead.

Despite this, it hasn't taken off as well as its proponents hoped:
- Momentum is probably one reason. It always is!
- Static scheduling has shortcomings.
- SIMD instructions and extremely parallel processors (GPUs) have greatly increased the computing throughput of traditional designs.

The crux of the issue
The essential fact that multiple-issue architectures (of any kind) try to exploit is that programs contain instruction-level parallelism (ILP).

    lw ...   add ...   sub ...   mul ...   sw ...

In this code, there are two dependency chains: sequences of instructions which depend on previous instructions in the same chain, but not on other chains.

    lw ...   add ...   sub ...   sw ...   mul ...

VLIW would encode these pairs of instructions as super-instructions. But there's another way...

Dynamic scheduling and out-of-order execution
Make the CPU do the scheduling! (How, even?)

The CPU has an instruction window that it looks at: a sequence of instructions in "correct" (program) order. The CPU can then find dependencies between instructions before deciding when to execute them, rather than during execution as in single-issue or static multiple-issue pipelines. This can actually simplify forwarding, which was becoming a big problem with static multiple-issue!

For dynamic scheduling to work well, you need a large instruction window to detect long dependency chains. What might make it difficult to have a large window? BRANCHES! Branch prediction becomes even more important!

Detecting dependencies
In order to get the best ILP, we have to detect dependencies in a sophisticated way. One way is with list scheduling. The first step is to build a graph of data dependencies: nodes are instructions, arrows are dependencies, and the numbers are how many cycles each instruction takes.

[Figure: dependency graph with two dependency chains and per-instruction cycle counts; illustration courtesy of Dr. Melhem.]

Here we have two dependency chains. What is the longest path through these chains, and therefore the minimum number of cycles to execute them? In the figure, the total is 7 cycles.
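As a rough sketch of the "longest path" question, the Python below computes the critical path through a small dependency graph. The graph, the instruction names, and the latencies are made up for illustration; they are not the ones from the slide's figure, and the function name `critical_path_length` is hypothetical.

```python
from functools import lru_cache

# Hypothetical example: instruction -> (latency in cycles, instructions it depends on).
# These names and numbers are assumptions, not the graph from the lecture figure.
GRAPH = {
    "lw":  (2, []),
    "add": (1, ["lw"]),
    "sw":  (1, ["add"]),
    "mul": (3, []),
    "sub": (1, ["mul"]),
}


def critical_path_length(graph):
    """Return the minimum cycles needed to run the graph, ignoring resource limits."""

    @lru_cache(maxsize=None)
    def finish_time(node):
        latency, deps = graph[node]
        # An instruction can start only once all of its producers have finished.
        start = max((finish_time(d) for d in deps), default=0)
        return start + latency

    # The slowest chain determines the lower bound on total execution time.
    return max(finish_time(node) for node in graph)
```

For the made-up graph above, both chains (lw, add, sw and mul, sub) take 4 cycles, so `critical_path_length(GRAPH)` returns 4. A real list scheduler would go on to use these path lengths as priorities when picking which ready instruction to issue next.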

Red herrings
Sometimes limitations on the number of registers create false ordering dependencies, one kind of which is the antidependency. These are not "real" data dependencies, and must be detected and eliminated for the best dynamic scheduling results.

    add  t0, t0, t4
    addi s1, t0, 64
    lw   t0, 0(s0)
    beq  t0, t9, blah

There are two dependency chains here... but t0 holds different values in each. The lw writes t0 after the addi reads it: this is a Write-After-Read (WAR) name dependency. Rename!

    add  t0, t0, t4
    addi s1, t0, 64
    lw   t0', 0(s0)
    beq  t0', t9, blah

By renaming the registers used, we can now execute these two chains in parallel. There is another kind of dependency in this code, too...
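The renaming idea can be sketched in a few lines of Python. This is only an illustration under some assumptions: instructions are modeled as (opcode, destination, sources) tuples, immediates are dropped, and every register write gets a fresh name of the form `reg_N` (a made-up naming scheme, more aggressive than the single `t0'` rename on the slide).

```python
from collections import defaultdict

# Each instruction is (opcode, destination or None, list of source registers).
# This models the four-instruction example above; immediates are omitted
# because they do not take part in register dependencies.
PROGRAM = [
    ("add",  "t0", ["t0", "t4"]),
    ("addi", "s1", ["t0"]),
    ("lw",   "t0", ["s0"]),
    ("beq",  None, ["t0", "t9"]),
]


def rename(program):
    """Give every register write a fresh name so only true (RAW) dependencies remain."""
    current = {}               # architectural register -> its latest renamed version
    writes = defaultdict(int)  # architectural register -> number of writes seen so far
    renamed = []
    for op, dest, srcs in program:
        # Sources read whichever version is live at this point in program order.
        new_srcs = [current.get(reg, reg) for reg in srcs]
        new_dest = None
        if dest is not None:
            writes[dest] += 1
            new_dest = f"{dest}_{writes[dest]}"  # hypothetical naming scheme
            current[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed
```

After `rename(PROGRAM)`, the add/addi pair uses `t0_1` and the lw/beq pair uses `t0_2`, so the two chains no longer share a register name and could run in parallel.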

RAW WAR? WAW!
The "data dependencies" we've talked about before are more properly called read-after-write (RAW) or flow dependencies. We just saw write-after-read (WAR) dependencies (or antidependencies).

    add  t0, t0, t4
    addi s1, t0, 64
    lw   t0, 0(s0)
    beq  t0, t9, blah

And there's a third kind: write-after-write (WAW), or output dependencies. Here, add and lw both write t0, so their writes must not be reordered.

We solved RAW dependencies with forwarding. WAR and WAW can be solved with register renaming. This can happen at compile time (if there are enough ISA registers) or dynamically!
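To make the three kinds concrete, here is a small Python sketch that classifies the dependency between two instructions. It assumes the same hypothetical (opcode, destination, sources) encoding of the example code, and it only compares one pair at a time, so it ignores any writes that happen in between.

```python
# (opcode, destination or None, source registers) for the example above.
ADD  = ("add",  "t0", ["t0", "t4"])
ADDI = ("addi", "s1", ["t0"])
LW   = ("lw",   "t0", ["s0"])
BEQ  = ("beq",  None, ["t0", "t9"])


def classify(earlier, later):
    """Return the kinds of dependency that `later` has on `earlier`.

    This is a pairwise check only: it does not notice when a third
    instruction between the two overwrites the register in question.
    """
    _, e_dest, e_srcs = earlier
    _, l_dest, l_srcs = later
    kinds = set()
    if e_dest is not None and e_dest in l_srcs:
        kinds.add("RAW")  # later reads what earlier wrote (true dependency)
    if l_dest is not None and l_dest in e_srcs:
        kinds.add("WAR")  # later overwrites what earlier still needs to read
    if e_dest is not None and e_dest == l_dest:
        kinds.add("WAW")  # both write the same register
    return kinds
```

For instance, `classify(ADD, ADDI)` gives `{"RAW"}`, `classify(ADDI, LW)` gives `{"WAR"}`, and `classify(ADD, LW)` gives both `"WAR"` and `"WAW"`, since add reads and writes t0 before lw overwrites it.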

Structural hazards return
Of course, when trying to schedule multiple instructions to run at once, you have to make decisions based on how many functional units are available.

[Figure: an instruction scheduler feeding four functional units: Load/Store, Int ALU 1, Int ALU 2, Float ALU.]

If we have a program that consists entirely of float operations, what is our maximum IPC? Just 1! The compiler can help somewhat, but having a large instruction window to work with is very important. It allows us to find good mixes of instructions to keep the CPU busy.
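The effect of functional-unit limits and window size can be sketched with a toy issue model in Python. This is a simplification under stated assumptions: all instructions are independent and take one cycle, the unit counts mirror the four units in the figure, and the names `UNITS` and `cycles_needed` are hypothetical.

```python
# One load/store unit, two integer ALUs, one float ALU (as in the figure).
UNITS = {"mem": 1, "int": 2, "float": 1}


def cycles_needed(kinds, units=UNITS, window=8):
    """Cycles to issue a list of independent, single-cycle instructions.

    `kinds` lists each instruction's unit class in program order.
    Only the first `window` pending instructions are visible each cycle.
    """
    pending = list(kinds)
    cycles = 0
    while pending:
        free = dict(units)  # every unit is available again each cycle
        issued = []
        for i, kind in enumerate(pending[:window]):
            if free[kind] > 0:
                free[kind] -= 1
                issued.append(i)
        # Remove issued instructions, highest index first so indices stay valid.
        for i in reversed(issued):
            del pending[i]
        cycles += 1
    return cycles
```

With eight float instructions, `cycles_needed(["float"] * 8)` is 8, an IPC of 1. Four floats followed by four integer operations take 6 cycles with `window=2` but only 4 cycles with `window=8`, because the larger window lets the scheduler see the integer work sitting behind the floats.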

Exceptions/Interrupts

Hey! Listen!
An exception (or interrupt) is an event which causes the CPU to stop the normal flow of execution and go somewhere else. There are many possible causes of exceptions:
- Software exceptions are usually used to call OS routines.
- Internal exceptions are caused by problems with the program: arithmetic overflow, misaligned memory accesses, division by zero, etc.
- External exceptions (more often called interrupts) are used by other computer hardware to tell the CPU that something has happened: maybe data is ready to read, or something needs new data, or the user hit a key, or...

In all cases, the same things have to happen.

Handling exceptions
An exception is really a special kind of call. What happens:
1. Information about the exception (what caused it, the PC where it happened, etc.) is stored somewhere.
2. The CPU stops doing whatever it was doing.
3. Control transfers to a predetermined location, known as an exception handler. Usually this is inside the OS.
4. The exception handler inspects the exception information and decides what to do (ignore it, perform a system call, kill the program, give the hardware what it needs, etc.).
5. The exception handler returns, and normal operation resumes.
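The steps above can be mimicked with a toy interpreter in Python. Everything here is hypothetical and only meant to show the flow of control: the `ToyCPU` class, its `cause` and `epc` fields, the tuple instruction format, and the `os_handler` policy are all made up for this sketch.

```python
def os_handler(cpu):
    """Hypothetical OS handler: inspect the stored cause and decide what to do."""
    if cpu.cause == "syscall":
        cpu.log.append(f"syscall serviced at pc={cpu.epc}")
        return "resume"
    cpu.log.append(f"killed: {cpu.cause} at pc={cpu.epc}")
    return "kill"


class ToyCPU:
    def __init__(self, handler):
        self.pc = 0
        self.cause = None  # what caused the most recent exception
        self.epc = None    # PC where that exception happened
        self.handler = handler
        self.log = []

    def take_exception(self, cause):
        # Store information about the exception, then transfer control.
        self.cause, self.epc = cause, self.pc
        return self.handler(self)

    def run(self, program):
        """Run to completion; return False if the handler killed the program."""
        while self.pc < len(program):
            op, *args = program[self.pc]
            cause = None
            if op == "syscall":
                cause = "syscall"
            elif op == "div" and args[1] == 0:
                cause = "divide by zero"
            if cause is not None and self.take_exception(cause) == "kill":
                return False
            self.pc += 1  # handler returned (or no exception): resume normally
        return True
```

Running `ToyCPU(os_handler).run([("nop",), ("syscall",), ("div", 4, 2)])` finishes normally after servicing the system call, while a program starting with `("div", 1, 0)` is killed with `epc` left at 0.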

Easy enough...
But pipelining and OOO execution throw huge wrenches into it.

[Figure: pipeline datapath with I-Mem, Ins. Decoder, Register File, ALU, and D-Mem.]

If an overflow occurs here... what instructions should we flush? And what does the register file look like?

Precise vs. imprecise
Figuring out which instructions need to be flushed and which need to be completed before running the handler is a tricky task. So tricky that some architectures used to give up on it: the exception handler would be given a rough, imprecise estimate of where the exception occurred. This is, obviously, not great.

All modern architectures use precise exceptions: the handler is guaranteed that all instructions before the faulting one, and their effects, have completed, and that the PC is exactly where the exception occurred.
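The guarantee can be written down as a small check in Python. This is only a sketch of the definition, not of how hardware enforces it: instructions are identified by their position in program order, and the helper names `precise_plan` and `is_precise` are hypothetical. It also assumes the faulting instruction itself neither completes nor counts as flushed.

```python
def precise_plan(in_flight, faulting):
    """Split in-flight instructions (in program order) around the faulting index."""
    must_complete = in_flight[:faulting]   # older: must finish before the handler runs
    must_flush = in_flight[faulting + 1:]  # younger: their effects must not be visible
    return must_complete, must_flush


def is_precise(completed, faulting):
    """True if exactly the instructions before index `faulting` have completed.

    `completed` is a collection of program-order indices whose effects are visible.
    """
    return set(completed) == set(range(faulting))
```

For five in-flight instructions with a fault at index 2, `precise_plan(["lw", "add", "sub", "mul", "sw"], 2)` says lw and add must complete and mul and sw must be flushed. A state where indices {0, 1, 3} have completed is imprecise, because a younger instruction has already changed state.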
