EENG449b/Savvides Lec 4.1 1/22/04 January 22, 2004 Prof. Andreas Savvides Spring EENG 449bG/CPSC 439bG Computer Systems Lecture 3 Pipelining Part II
Announcements
Project groups and group meetings
Project topics
– A 1-page project proposal is due next Friday, Jan 30 (send it to me)
Project proposal should include:
– 1 paragraph project overview. This describes what your project will do.
– 1 paragraph describing the specific tasks that you need to do
  » E.g., read papers, install tools, learn some special programming language or hardware
– 1 paragraph on what resources you need for your project
  » E.g., are you using any special hardware?
  » Do you have access to lab/hardware/software?
Instruction Formats Review
Implementing a MIPS Pipeline
We are developing a subset of the MIPS pipeline supporting:
– Load/store word
– Branch equal zero
– Integer ALU operations
Remember, MIPS has register-register ALU instructions (e.g., Add R1,R2,R3)
Attention: In the homework you will have to redesign the pipeline to support register-memory ALU instructions (e.g., Add R1,R2,(R3))!
MIPS Datapath Review
MIPS Basic Pipeline
– Data needs to be written in the registers at the end of each cycle
– The value written back depends on the instruction type: load (LMD) or ALU operation (ALUOut)
Events at every pipe stage
Hazards Review
From the previous lecture we know the situations that would cause incorrect execution:
– Structural hazards
– Data hazards
– Control hazards
Three Generic Data Hazards
Read After Write (RAW): Instr J tries to read an operand before Instr I writes it
    I: add r1,r2,r3
    J: sub r4,r1,r3
Caused by a “data dependence” (in compiler nomenclature). This hazard results from an actual need for communication.
Three Generic Data Hazards
Write After Read (WAR): Instr J writes an operand before Instr I reads it
    I: sub r4,r1,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.
Can’t happen in the MIPS 5-stage pipeline because:
– All instructions take 5 stages, and
– Reads are always in stage 2, and
– Writes are always in stage 5
Three Generic Data Hazards
Write After Write (WAW): Instr J writes an operand before Instr I writes it
    I: sub r1,r4,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
Called an “output dependence” by compiler writers. This also results from reuse of the name “r1”.
Can’t happen in the MIPS 5-stage pipeline because:
– All instructions take 5 stages, and
– Writes are always in stage 5
We will see WAR and WAW hazards in later, more complicated pipelines.
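The three hazard definitions can be checked mechanically on the slide examples. The following is a minimal Python sketch, not part of the original slides; the `Instr` tuple and `classify_hazards` function are hypothetical names introduced for illustration, and the check only compares register names (it does not model pipeline timing).

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    """Hypothetical three-operand instruction: op dest, src1, src2."""
    op: str
    dest: str
    srcs: Tuple[str, ...]


def classify_hazards(i: Instr, j: Instr) -> Set[str]:
    """Return the potential hazards between earlier instr i and later instr j."""
    hazards = set()
    if i.dest in j.srcs:      # J reads what I writes
        hazards.add("RAW")
    if j.dest in i.srcs:      # J writes what I reads
        hazards.add("WAR")
    if i.dest == j.dest:      # J writes what I writes
        hazards.add("WAW")
    return hazards


# The I/J pairs from the three slides above.
raw = classify_hazards(Instr("add", "r1", ("r2", "r3")), Instr("sub", "r4", ("r1", "r3")))
war = classify_hazards(Instr("sub", "r4", ("r1", "r3")), Instr("add", "r1", ("r2", "r3")))
waw = classify_hazards(Instr("sub", "r1", ("r4", "r3")), Instr("add", "r1", ("r2", "r3")))
```

Only the RAW case is a true dependence; the WAR and WAW cases come from reuse of the name r1, which is why the fixed read and write stages of the 5-stage pipeline rule them out.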
MIPS Basic Pipeline
Pipeline stages: IF ID EX MEM WB
Figure annotations: “Instruction issued”; “Data hazards can be detected here”
Hardware Hazard Detection (Figure A.20)
Logic to Detect Load Interlocks (Figure A.21)
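The figure itself is not reproduced here, so the following Python sketch shows one common way to express a load interlock check: stall when the instruction in EX is a load whose destination register is a source of the instruction in ID. The function and parameter names are hypothetical, and treating r0 as never causing a stall is an assumption based on r0 being hardwired to zero in MIPS.

```python
from typing import Tuple


def needs_load_interlock(ex_is_load: bool, ex_dest: str,
                         id_sources: Tuple[str, ...]) -> bool:
    """Return True if the instruction in ID must stall one cycle.

    ex_is_load: the instruction currently in EX is a load (assumed input)
    ex_dest:    destination register of the instruction in EX
    id_sources: source registers of the instruction in ID
    """
    if not ex_is_load:
        return False          # ALU results can be forwarded, no stall needed
    if ex_dest == "r0":
        return False          # assumption: writes to r0 have no effect
    return ex_dest in id_sources


# lw r1,0(r2) followed by sub r4,r1,r5: the loaded value is not ready in time.
stall = needs_load_interlock(True, "r1", ("r1", "r5"))
# add r1,r2,r3 followed by sub r4,r1,r5: forwarding covers this case.
no_stall = needs_load_interlock(False, "r1", ("r1", "r5"))
```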
Forwarding of Results to the ALU
Figure labels: Mem output, ALU output
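As a rough illustration of the forwarding choice, the Python sketch below picks the source for one ALU operand. It is a simplification under stated assumptions: the names are hypothetical, `ex_mem_dest` is assumed to belong to an ALU instruction (a load in that position is the interlock case), and `None` means the instruction in that pipeline register writes no register.

```python
from typing import Optional


def forward_source(src_reg: str, ex_mem_dest: Optional[str],
                   mem_wb_dest: Optional[str]) -> str:
    """Pick where an ALU operand should come from.

    The most recent producer wins, so EX/MEM is checked before MEM/WB.
    """
    if src_reg != "r0" and src_reg == ex_mem_dest:
        return "EX/MEM"    # ALU output of the immediately preceding instruction
    if src_reg != "r0" and src_reg == mem_wb_dest:
        return "MEM/WB"    # ALU or memory output from two instructions back
    return "REGFILE"       # no hazard: read the register file value


newest = forward_source("r1", "r1", "r1")
older = forward_source("r1", None, "r1")
none_needed = forward_source("r2", "r1", None)
```

Checking EX/MEM first matters when both in-flight instructions write the same register: the operand must see the later write.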
Control Hazards Revisited
A branch causes a 3-cycle stall in the 5-stage pipeline:
Branch Instruction    IF    ID    EX    MEM   WB
Branch Successor+1          IF    stall stall IF    ID    EX    MEM   WB
Branch Successor+2                                  IF    ID    EX    MEM   WB
Branch Successor+3                                        IF    ID    EX    MEM   WB
Higher overhead than data hazards… Can HW changes improve that?
YES! Try to make an early decision on whether a branch is taken or not.
Improved Pipeline – Dealing with Branches
– Additional adder in ID stage
– Write the PC faster
– Can detect the branch hazard 2 cycles earlier
Improved Pipeline – Dealing with Branches
– Additional adder in ID stage
– Write the PC faster
– Note the change of order in the text! Figure A.11 says a branch hazard stalls for 1 cycle, but that is after the optimization shown in Figure A.24.
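To see what resolving the branch 2 cycles earlier buys, the following Python sketch counts total cycles for a short instruction sequence. It is a simplified model, not taken from the slides: `total_cycles` is a hypothetical helper that assumes an otherwise ideal pipeline (one instruction per cycle) and a fixed stall penalty per branch.

```python
def total_cycles(num_instructions: int, num_branches: int,
                 branch_penalty: int, depth: int = 5) -> int:
    """Cycles to complete a sequence on an ideal pipeline with branch stalls."""
    fill = depth - 1  # cycles before the first instruction reaches the last stage
    return fill + num_instructions + num_branches * branch_penalty


# One branch followed by three successors, as in the timing diagram.
before = total_cycles(4, 1, branch_penalty=3)  # branch resolved late: 11 cycles
after = total_cycles(4, 1, branch_penalty=1)   # branch resolved in ID: 9 cycles
```

The 11-cycle result matches the timing diagram for the 3-cycle stall, where the third successor finishes WB in cycle 11.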
Reducing Branch Penalties
1. Freeze the pipeline until the outcome of a branch instruction is known
2. Treat every branch as not taken
   – You have to be careful about how to restore the state of the pipeline to the correct place
3. Treat every branch as taken
   – May make sense for machines where the branch target address is known before the branch outcome
4. Delayed branch
   – Execute some instructions until the outcome is known (branch-delay slots)
Branch-Delay Slots
On a machine that needs n cycles before a branch outcome is known:
    branch instruction
    sequential successor 1
    sequential successor 2
    ……
    sequential successor n
The compiler needs to decide on valid and useful successors.
Most processors with delayed branches have 1 delay slot.
Limitations of branch delay:
– Restrictions on branch delay instructions
– Ability to predict branch outcome at compile time
  » Most hardware supports nullifying branches, which gives the compiler more flexibility: it can schedule an instruction and later cancel its effects without violating program correctness
Delayed Branch
Where to get instructions to fill the branch delay slot?
– Before the branch instruction
– From the target address: only valuable when the branch is taken
– From fall through: only valuable when the branch is not taken
– Canceling branches allow more slots to be filled
Compiler effectiveness for a single branch delay slot:
– Fills about 60% of branch delay slots
– About 80% of instructions executed in branch delay slots are useful in computation
– About 50% (60% x 80%) of slots are usefully filled
Delayed branch downside: with 7-8 stage pipelines and multiple instructions issued per clock (superscalar), the branch delay grows beyond what a single delay slot can cover
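The "about 50%" figure is just the product of the two rates. The short Python sketch below spells out the arithmetic; the function name is hypothetical, and the interpretation that every slot not usefully filled costs one wasted cycle is an assumption for a machine with a single delay slot.

```python
def usefully_filled_fraction(fill_rate: float, useful_rate: float) -> float:
    """Fraction of delay slots that hold an instruction doing useful work."""
    return fill_rate * useful_rate


useful = usefully_filled_fraction(0.60, 0.80)   # about 0.48, rounded to 50% on the slide
wasted_cycles_per_branch = 1.0 - useful         # assumption: one slot, one cycle each
```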
Scheduling Branch Delay
Figure annotations: independent instruction; cannot be used; preferred when the branch is taken with high probability
Performance of Branch Schemes
Assuming an ideal CPI of 1:
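The formula on this slide did not survive extraction. A commonly used textbook form, with an ideal CPI of 1, is speedup = pipeline depth / (1 + branch frequency x branch penalty). The Python sketch below evaluates it; the function name and the 20% branch frequency are illustrative assumptions, not values from the slides.

```python
def pipeline_speedup(depth: int, branch_frequency: float,
                     branch_penalty: float) -> float:
    """Speedup over an unpipelined machine, assuming an ideal CPI of 1
    and that branch stalls are the only source of stall cycles."""
    return depth / (1.0 + branch_frequency * branch_penalty)


# Hypothetical workload: 20% of instructions are branches.
slow = pipeline_speedup(5, 0.20, 3)   # 3-cycle branch penalty, about 3.1
fast = pipeline_speedup(5, 0.20, 1)   # 1-cycle branch penalty, about 4.2
```

With no branch penalty the expression reduces to the pipeline depth, which is the ideal speedup.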
Challenges in Pipeline Implementation
Exceptions: situations that can disrupt the in-order execution of instructions (interrupt, fault, exception)
– I/O device request
– Invoking an OS service from a user program
– Breakpoint
– Integer arithmetic overflow or FP arithmetic anomaly
– Page fault (not in main memory)
– Misaligned memory access
– etc.
Exception Requirements
– Synchronous vs. asynchronous
– User requested vs. coerced
– User maskable vs. user non-maskable
– Within vs. between instructions
– Resume vs. terminate
Major challenges:
– Exceptions happening within instructions
– Exceptions that require the instruction to be restarted, as in the case of a page fault
MIPS Exceptions
Pipeline stage    Problem exceptions
IF                Page fault on instruction fetch; misaligned memory access; memory protection violation
ID                Undefined or illegal opcode
EX                Arithmetic exception
MEM               Page fault on data fetch; misaligned memory access; memory protection violation
WB                None
What’s next?
Next lecture:
– MIPS FP Pipeline & Dynamically Scheduled Pipelines
– An embedded processor architecture: ARM
Lecture 6:
– Sensor networks and applications
– The connection between architecture and networks