Pipeline Issues This pipeline stuff makes my head hurt! Maybe it’s that dumb hat.

Slides:



Advertisements
Similar presentations
Morgan Kaufmann Publishers The Processor
Advertisements

1 Pipelining Part 2 CS Data Hazards Data hazards occur when the pipeline changes the order of read/write accesses to operands that differs from.
CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
COMP25212 Further Pipeline Issues. Cray 1 COMP25212 Designed in 1976 Cost $8,800,000 8MB Main Memory Max performance 160 MFLOPS Weight 5.5 Tons Power.
Mehmet Can Vuran, Instructor University of Nebraska-Lincoln Acknowledgement: Overheads adapted from those provided by the authors of the textbook.
Pipeline Hazards Pipeline hazards These are situations that inhibit that the next instruction can be processed in the next stage of the pipeline. This.
Instruction-Level Parallelism (ILP)
Instructor: Senior Lecturer SOE Dan Garcia CS 61C: Great Ideas in Computer Architecture Pipelining Hazards 1.
Mary Jane Irwin ( ) [Adapted from Computer Organization and Design,
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
Chapter 12 Pipelining Strategies Performance Hazards.
1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.
L18 – Pipeline Issues 1 Comp 411 – Spring /03/08 CPU Pipelining Issues Finishing up Chapter 6 This pipe stuff makes my head hurt! What have you.
L17 – Pipeline Issues 1 Comp 411 – Fall /1308 CPU Pipelining Issues Finishing up Chapter 6 This pipe stuff makes my head hurt! What have you been.
Pipeline Exceptions & ControlCSCE430/830 Pipeline: Exceptions & Control CSCE430/830 Computer Architecture Lecturer: Prof. Hong Jiang Courtesy of Yifeng.
DLX Instruction Format
ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.
1 Manchester Mark I, This was the second (the first was a small- scale prototype) machine built at Cambridge. A production version of this computer.
Pipelining. Overview Pipelining is widely used in modern processors. Pipelining improves system performance in terms of throughput. Pipelined organization.
What are Exception and Interrupts? MIPS terminology Exception: any unexpected change in the internal control flow – Invoking an operating system service.
Pipeline Hazard CT101 – Computing Systems. Content Introduction to pipeline hazard Structural Hazard Data Hazard Control Hazard.
1 Pipelining Reconsider the data path we just did Each instruction takes from 3 to 5 clock cycles However, there are parts of hardware that are idle many.
1 Appendix A Pipeline implementation Pipeline hazards, detection and forwarding Multiple-cycle operations MIPS R4000 CDA5155 Spring, 2007, Peir / University.
6.004 – Fall /29/0L15 – Building a Beta 1 Building the Beta.
Pipelining the Beta bet·ta ('be-t&) n. Any of various species
CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.
L16 – Pipelining 1 Comp 411 – Spring /20/2011 Pipelining Between 411 problems sets, I haven’t had a minute to do laundry Now that’s what I call dirty.
CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.
CMPE 421 Parallel Computer Architecture
1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.
CS 1104 Help Session IV Five Issues in Pipelining Colin Tan, S
Chapter 6 Pipelined CPU Design. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Pipelined operation – laundry analogy Text Fig. 6.1.
Spring 2003CSE P5481 Precise Interrupts Precise interrupts preserve the model that instructions execute in program-generated order, one at a time If an.
How Computers Work Lecture 12 Page 1 How Computers Work Lecture 12 Introduction to Pipelining.
CSIE30300 Computer Architecture Unit 04: Basic MIPS Pipelining Hsin-Chou Chi [Adapted from material by and
1  1998 Morgan Kaufmann Publishers Chapter Six. 2  1998 Morgan Kaufmann Publishers Pipelining Improve perfomance by increasing instruction throughput.
Instructor: Senior Lecturer SOE Dan Garcia CS 61C: Great Ideas in Computer Architecture Pipelining Hazards 1.
Computer Organization and Design Pipelining Montek Singh Mon, Dec 2, 2013 Lecture 16.
Computer Organization and Design Pipelining Montek Singh Dec 2, 2015 Lecture 16 (SELF STUDY – not covered on the final exam)
11 Pipelining Kosarev Nikolay MIPT Oct, Pipelining Implementation technique whereby multiple instructions are overlapped in execution Each pipeline.
CSE431 L06 Basic MIPS Pipelining.1Irwin, PSU, 2005 MIPS Pipeline Datapath Modifications  What do we need to add/modify in our MIPS datapath? l State registers.
LECTURE 10 Pipelining: Advanced ILP. EXCEPTIONS An exception, or interrupt, is an event other than regular transfers of control (branches, jumps, calls,
L17 – Pipeline Issues 1 Comp 411 – Fall /23/09 CPU Pipelining Issues Read Chapter This pipe stuff makes my head hurt! What have you been.
ECE/CS 552: Pipeline Hazards © Prof. Mikko Lipasti Lecture notes based in part on slides created by Mark Hill, David Wood, Guri Sohi, John Shen and Jim.
Pipelining: Implementation CPSC 252 Computer Organization Ellen Walker, Hiram College.
Chapter Six.
Advanced Architectures
Exceptions Another form of control hazard Could be caused by…
Computer Organization CS224
ARM Organization and Implementation
\course\cpeg323-08F\Topic6b-323
Pipelining: Advanced ILP
Lecture 6: Advanced Pipelines
Pipelining review.
How Computers Work Lecture 13 Details of the Pipelined Beta
\course\cpeg323-05F\Topic6b-323
Pipeline control unit (highly abstracted)
Chapter Six.
Chapter Six.
Control unit extension for data hazards
Instruction Execution Cycle
Pipeline control unit (highly abstracted)
Pipeline Control unit (highly abstracted)
Control Hazards Branches (conditional, unconditional, call-return)
Control unit extension for data hazards
RTL for the SRC pipeline registers
Control unit extension for data hazards
Interrupts and exceptions
Guest Lecturer: Justin Hsia
Pipelining Hazards.
Presentation transcript:

Pipeline Issues This pipeline stuff makes my head hurt! Maybe it’s that dumb hat

Recalling Data Hazards PROBLEM: Subsequent instructions can reference the contents of a register well before the pipeline stage where the register is written. SOLUTION #2: Add special hardware to maintain the sequential execution semantics of the ISA. SOLUTION #1: Deal with it in SOFTWARE; expose the pipeline for all to see.

Bypass Paths But there are some problems that BYPASSING CAN’T FIX! Add special cheat paths, called BYPASSES, that route the results of the ALU and WB stages to the RF stage, thus substituting the register’s old contents with a value that will be written to that register at some point in the future. Detection of these cases has to be incorporated into the decoding logic of the RF stage, which basically looks at the instructions in the ALU and WB stage to see if their destination register matches a source register reference.

Load Hazards The hazard between the XOR and the LD can be addressed by our established bypass paths… Consider LOADS: Can we fix all these problems using bypass paths?

Load Hazard (easy) The XOR operand r4 can simply be bypassed from the output of the memory in the WB stage to the RF stage… by our normal bypass path.

Structural Data Hazard In a 4-stage pipeline, for a LD instruction fetched during clock i, the data from memory isn’t returned from memory until late into cycle i+3. Bypassing can fix the XOR but not ADD! How do we fix this one? The XOR hazard is pretty easy, but…

Load Hazard (hard) The r4 operand to the ADD instruction hasn’t yet been fetched from memory. It exists NOWHERE on our data paths – we can’t solve this problem by bypassing!

Load Delay If the compiler knows about a machine’s load delay, it can often rearrange code sequences to eliminate such hazards. Many compilers provide machine-specific instruction scheduling. Bypassing can’t fix the problem with ADD since the data simply isn’t available! We have to add some pipeline interlock hardware to stall ADD’s execution.

Memory Timing & Pipelining But, but, what about FASTER processors? FACT: Processors have become very fast relative to memories! And this gap continues to grow… Do we just increase the clock period to accommodate this bottleneck component? ALTERNATIVE: Longer pipelines. 1. Add “MEMORY WAIT” stages between START of read operation & return of data. 2. Build pipelined memories, so that multiple (say, N) memory transactions can be in progress at once. These steps add load delay slots; hence 3. Stall pipeline on unbypassable load delays. A 4-Stage pipeline requires READ access in less than one clock. A 5-Stage pipeline would allow nearly two clocks...

5-stage Pipeline Omits some detail NO bypass or interlock logic Address available right after instruction enters Memory pipe stage Data needed right before rising clock edge at end of Write Back pipe stage almost 2 clock cycles

5-stage pipeline Omits some detail NO bypass or interlock logic We wanted a simple, clean pipeline but… added IR IF mux to annul branch-slot instructions added A/B bypass muxes to get data before it’s written to regfile added LE/muxes to freeze IF/RF stage so we can wait for LD to reach WB stage

RF-stage Bypass Details 1, 2, 3, 4… Hum, it looks like we have a few extra inputs on that mux Register File We’ve been a little sloppy about this detail Note: can use “distributed” mux built from tristate drivers.

Bypass Implementation To reduce the amount of bypass logic, the WDSEL mux has been split: choice between ALU and PC+4 is made in ALU stage, choice between ALU/PC and MEMDATA is made is WB stage. Part 2 of the WDSEL mux Part 1 of the WDSEL mux Bypass From WB Bypass From MEM Bypass From ALU

Bypass Logic Beta Bypass logic (need two copies for A/B data): * If instruction is a ST (doesn’t write into regfile), set RC for ALU/MEM/WB to R31 5-bit compare

Linkage Register Write Timing BR Decision Time ADD writes BR writes Can we make XOR’s regfile access work by bypassing?

BR/JMP PC bypass For BR/JMP, Rc value is taken from PC, not ALU. So we have to add bypass paths for PC ALU and PC MEM. PC WB is already taken care of if we bypass WB stage from output of WDSEL mux.

Unused Opcode Traps IDEA: TRAP illegal instructions to a special routine in the Operating System, which can Interpret them in software; or Print humane error report. IMPLEMENTATION: On Bad Opcode (discovered in RF Stage): Select IllOp adr as next PC Annul instruction in IF stage Substitute BNE(r31,0,XP) for bad instruction - will (eventually) store PC+4 into XP... Need bypass paths to make XP usable immediately by code at IllOp!

Illegal Opcode Traps Bad opcode decoded in RF stage: PC ← address of IllOp Handler Annul instruction in IF Force BNE(R31,0,XP) in RF stage → will save PC+4 in XP when it reaches WB stage

Taking Exception… In general, we’d like to annul ALL instructions following one that causes a trap or fault: FREEZE state at time of exception, for inspection by handler code. ILLEGAL INSTRUCTIONS are recognizable in RF stage of pipe; are ALL faults & traps? CONSIDER: ARITHMETIC EXCEPTIONS: divide by zero, etc. Caught by ALU subsystem, during processing of data in ALU stage MEMORY FAULTS: Program reference to illegal memory location... Caught by MEMORY subsystem, during processing of address input in MEM stage

Annulment Logic Fault in ??? stage: PC ← address of fault handler Force BNE(R31,0,XP) in ??? stage → will save PC+4 in XP when it reaches WB stage Annul all following instructions (those earlier in the pipeline): called “flushing the pipe”

Asynchronous I/O Interrupts Suppose key struck, interrupt requested (via IRQ) during the fetch of ADD. Then let’s Select XAdr (handler) as next PC Leave ADD in pipeline; NO annulment! Code handler to return to SUB instruction. Can this work??? Let’s find out… This should be easy. Take, for example, | The interrupted code: Interrupt Taken HERE | The interrupt handler:

Asynchronous Interrupt Timing Interrupt Taken HERE PROBLEM: When does old PC+4 get written to XP??? | interrupt handler Interrupt Occurs PC = Xadr??

Making Interrupts Work Alternative: When taking interrupt, ANNUL instruction in IF stage... BUT instead of changing it to a NOP, change it to BNE(r31,0,XP) This will cause PC+4 of annulled instruction to be written to XP! CODE HANDLER to return to Reg[XP]-4 (since the annulled instruction is never executed) Interrupt taken

“Smart” Interrupt Handler | The interrupted code: Interrupt taken HERE, ADD instruction annulled | The interrupt handler: | Adjust XP so we | return to annulled | instruction (ADD)

5-stage Pipeline: Final Version Can annul instruction at each stage Can force instruction to BNE(R31,0,XP) in each stage → will save PC+4 in XP when it reaches WB Stage Can stall IF and RF stages while waiting for LD result to reach WB Can bypass results from ALU, MEM and WB back to RF

Pipeline Review Simple unpipelined Beta: 1 cycle/instruction long cycle time: mem+regs+alu+mem 2-Stage pipeline: increased throughput (<2x) introduced branch delay slots * Choice of executing or annulling inst. after branch 5-stage pipeline: increased throughput (3x???) branch delay slots delayed register writeback (3 cycles) * Add bypass paths (10) to access correct value memory data available only in WB stage * Introduce NOPs at IRALU, stall IF and RF stages until LD result ready handle RF/ALU/MEM stage exceptions * Save PC+4 in XP (fake a BR) * annul following insts. (those earlier in pipeline) implement interrupts * Throw away IF inst., save PC+4 in XP, fix return extra HW due to pipelining * Registers to hold values between stages * Data bypass muxes in RF stage * Inst. muxes for “rewriting” code to annul or save PC

RISC = Simplicity??? “The P.T. Barnum World’s Tallest Dwarf Competition” World’s Most Complex RISC? VLIWs, Super-Scalars RISCs Pipelines, Bypasses, Annulment, …,... Addressing features, eg index registers Generalization of registers and operand coding Complex instructions, addressing modes Primitive Machines: direct implementations