ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 7 Pipelining – Part 2 Benjamin Lee Electrical and Computer Engineering Duke University www.duke.edu/~bcl15.

Slides:



Advertisements
Similar presentations
Morgan Kaufmann Publishers The Processor
Advertisements

CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Mehmet Can Vuran, Instructor University of Nebraska-Lincoln Acknowledgement: Overheads adapted from those provided by the authors of the textbook.
Pipeline Hazards Pipeline hazards These are situations that inhibit that the next instruction can be processed in the next stage of the pipeline. This.
Review: Pipelining. Pipelining Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes Dryer.
Pipelining - Hazards.
February 2, 2010CS152, Spring 2010 CS 152 Computer Architecture and Engineering Lecture 5 - Pipelining II Krste Asanovic Electrical Engineering and Computer.
ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 6 Pipelining – Part 1 Benjamin Lee Electrical and Computer Engineering Duke University
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Pipelining III Steve Ko Computer Sciences and Engineering University at Buffalo.
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
Mary Jane Irwin ( ) [Adapted from Computer Organization and Design,
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania Computer Organization Pipelined Processor Design 1.
Chapter 5 Pipelining and Hazards
1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.
1 Chapter Six - 2nd Half Pipelined Processor Forwarding, Hazards, Branching EE3055 Web:
Pipeline Exceptions & ControlCSCE430/830 Pipeline: Exceptions & Control CSCE430/830 Computer Architecture Lecturer: Prof. Hong Jiang Courtesy of Yifeng.
EENG449b/Savvides Lec 4.1 1/22/04 January 22, 2004 Prof. Andreas Savvides Spring EENG 449bG/CPSC 439bG Computer.
February 2, 2011CS152, Spring 2011 CS 152 Computer Architecture and Engineering Lecture 5 - Pipelining II Krste Asanovic Electrical Engineering and Computer.
Control Hazards.1 Review: Datapath with Data Hazard Control Read Address Instruction Memory Add PC 4 Write Data Read Addr 1 Read Addr 2 Write Addr Register.
CS 152 Computer Architecture and Engineering Lecture 4 - Pipelining Krste Asanovic Electrical Engineering and Computer Sciences University of California.
EECC551 - Shaaban #1 Lec # 4 winter Data Hazards Requiring Stall Cycles In some code sequence cases, potential data hazards cannot be handled.
ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Lecture 17 - Pipelined.
What are Exception and Interrupts? MIPS terminology Exception: any unexpected change in the internal control flow – Invoking an operating system service.
Interrupts / Exceptions / Faults Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology April 30, 2012L21-1
February 2, 2012CS152, Spring 2012 CS 152 Computer Architecture and Engineering Lecture 5 - Pipelining II (Branches, Exceptions) Krste Asanovic Electrical.
Lecture 15: Pipelining and Hazards CS 2011 Fall 2014, Dr. Rozier.
Krste Asanovic Electrical Engineering and Computer Sciences
Pipelining. 10/19/ Outline 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and Interrupts Conclusion.
EEL5708 Lotzi Bölöni EEL 5708 High Performance Computer Architecture Pipelining.
Comp Sci pipelining 1 Ch. 13 Pipelining. Comp Sci pipelining 2 Pipelining.
CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.
CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.
CMPE 421 Parallel Computer Architecture
1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.
CS 1104 Help Session IV Five Issues in Pipelining Colin Tan, S
Chapter 6 Pipelined CPU Design. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Pipelined operation – laundry analogy Text Fig. 6.1.
CECS 440 Pipelining.1(c) 2014 – R. W. Allison [slides adapted from D. Patterson slides with additional credits to M.J. Irwin]
Constructive Computer Architecture Interrupts/Exceptions/Faults Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
CSIE30300 Computer Architecture Unit 04: Basic MIPS Pipelining Hsin-Chou Chi [Adapted from material by and
1 (Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3 rd Ed., Morgan Kaufmann,
Branch Hazards and Static Branch Prediction Techniques
Oct. 18, 2000Machine Organization1 Machine Organization (CS 570) Lecture 4: Pipelining * Jeremy R. Johnson Wed. Oct. 18, 2000 *This lecture was derived.
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell CS352H: Computer Systems Architecture Topic 9: MIPS Pipeline.
CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Part 13: Branch prediction (Chapter 4/6)
LECTURE 10 Pipelining: Advanced ILP. EXCEPTIONS An exception, or interrupt, is an event other than regular transfers of control (branches, jumps, calls,
CSC 4250 Computer Architectures September 22, 2006 Appendix A. Pipelining.
ECE/CS 552: Pipeline Hazards © Prof. Mikko Lipasti Lecture notes based in part on slides created by Mark Hill, David Wood, Guri Sohi, John Shen and Jim.
Exceptions Another form of control hazard Could be caused by…
Computer Organization CS224
Stalling delays the entire pipeline
5 Steps of MIPS Datapath Figure A.2, Page A-8
Appendix A - Pipelining
CS 152 Computer Architecture and Engineering Lecture 4 - Pipelining
Pipelining: Advanced ILP
Dr. George Michelogiannakis EECS, University of California at Berkeley
The processor: Pipelining and Branching
Lecture 9. MIPS Processor Design – Pipelined Processor Design #2
Electrical and Computer Engineering
CSC 4250 Computer Architectures
Lecture 5. MIPS Processor Design
The Processor Lecture 3.6: Control Hazards
Control unit extension for data hazards
Overview What are pipeline hazards? Types of hazards
CS203 – Advanced Computer Architecture
Control unit extension for data hazards
Control unit extension for data hazards
Interrupts and exceptions
Presentation transcript:

ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 7 Pipelining – Part 2 Benjamin Lee Electrical and Computer Engineering Duke University

ECE 552 / CPS 5502 ECE252 Administrivia 27 September – Homework #2 Due -Use blackboard forum for questions -Attend office hours with questions - for separate meetings 2 October – Class Discussion Roughly one reading per class. Do not wait until the day before! 1.Srinivasan et al. “Optimizing pipelines for power and performance” 2.Mahlke et al. “A comparison of full and partial predicated execution support for ILP processors” 3.Palacharla et al. “Complexity-effective superscalar processors” 4.Yeh et al. “Two-level adaptive training branch prediction”

ECE 552 / CPS 5503 Data Hazards and Scheduling Try producing faster code for -A = B + C; D = E – F; -Assume A, B, C, D, E, and F are in memory -Assume pipelined processor Slow Code LW Rb, b LW Rc, c ADD Ra, Rb, Rc SW a, Ra LW Re e LW Rf, f SUBRd, Re, Rf SW d, RD Fast Code LW Rb, b LW Rc, c LW Re, e ADD Ra, Rb, Rc LWRf, f SWa, Ra SUB Rd, Re, Rf SW d, RD

ECE 552 / CPS 5504 Compiler Scheduling Reduce stalls by moving instructions -Basic pipeline scheduling eliminates back-to-back load-use pairs -What are the limitations of scheduling? Scheduling Scope -Requires an independent instruction to place between load-use pairs -Little scope for scheduling, 1-add, 3-ld/st Slow CodeFast Code ld r2, 4(sp)ld r2, 4(sp) ld r3, 8(sp)ld r3, 8(sp) add r3, r2, r1add r3, r2, r1 st r1, 0(sp)st r1, 0(sp)

ECE 552 / CPS 5505 Compiler Scheduling Number of registers -Registers hold “live” values -Example code contains 7 different values, including sp -Before: max 3 values live  3 registers sufficient -After: max 4 values live  3 registers insufficient -Original code re-uses r1 and r2, re-scheduling causes WAR violations BeforeAfter ld r2, 4 (sp)ld r2, 4 (sp) ld r1, 8 (sp)ldr1, 8 (sp) add r1, r2, r1#stallld r2, 16 (sp) st r1, 0 (sp)add r1, r2, r1#WAR ld r2, 16 (sp)ld r1, 20 (sp) ld r1, 20 (sp)st r1, 0 (sp)#WAR sub r2, r1, r1 #stallsub r2, r1, r1 str1, 12 (sp)str1, 12 (sp)

ECE 552 / CPS 5506 Compiler Scheduling Alias Analysis -Determine whether load/stores reference same memory locations -Determines if loads/stores can be re-ordered -Previous Examples: easy, all loads/stores use the same base register (sp) -New Example: Can compiler tell that r8 == sp? BeforeAfter ld r2, 4 (sp)ld r2, 4 (sp) ldr3, 8 (sp)ldr3, 8 (sp) add r3, r2, r1 #stallld r5, 0 (r8) st r1, 0 (sp) addr3, r2, r1 ldr5, 0 (r8) ld r6, 4 (r8) ld r6, 4 (r8)str1, 0 (sp) subr5, r6, r4#stallsubr5, r6, r4 str4, 8 (r8)str4, 8 (r8)

ECE 552 / CPS 5507 Control Hazards Arises when calculating next program counter (PC) -Pipeline stalls if required values not yet available Jumps -Jump (immediate) requires opcode, offset, current PC -Jump (register) requires opcode, register value Conditional Branches -Requires opcode, current PC, register (for condition), offset Sequential Successor Instructions -Requires opcode, current PC

ECE 552 / CPS 5508 Program Counter Calculations Identify change in control flow during decode Determine whether I 1 changes control flow before fetching I 2 ? No. Time t0t1t2t3t4t5t6t7.... (I 1 ) r1  (r0) + 10IF 1 ID 1 EX 1 MA 1 WB 1 (I 2 ) r3  (r2) + 17IF 2 IF 2 ID 2 EX 2 MA 2 WB 2 (I 3 )IF 3 IF 3 ID 3 EX 3 MA 3 WB 3 (I 4 ) IF 4 IF 4 ID 4 EX 4 MA 4 WB 4 Resource Usage time t0t1t2t3t4t5t6t7.... IFI 1 nopI 2 nopI 3 nopI 4 IDI 1 nopI 2 nopI 3 nopI 4 EX I 1 nopI 2 nopI 3 nopI 4 MA I 1 nopI 2 nopI 3 nopI 4 WB I 1 nopI 2 nopI 3 nopI 4

ECE 552 / CPS 5509 Speculate PC  PC + 4 addrinstr I 1 096ADD I 2 100J 304 I 3 104ADD I 4 304ADD kill A jump instruction kills (not stalls) the following instruction. How? stall I2I2 I1I1 104 IR PC addr inst Inst Memory 0x4 Add nop IR E M Add Jump? PCSrc (pc+4 / jabs / rind/ br)

ECE 552 / CPS Pipelining Jumps addrinstr I 1 096ADD I 2 100J 304 I 3 104ADD I 4 304ADD kill To kill a fetched instruction, add mux before IR to insert “nops” case(opcodeD) J, JAL: IR  nop otherwise:IR  inst I2I2 I1I1 104 stall IR PC addr inst Inst Memory 0x4 Add nop IR E M Add Jump? PCSrc (pc+4 / jabs / rind/ br) nop IRSrc D I2I2 I1I1 304 nop

ECE 552 / CPS Pipelining Jumps time t0t1t2t3t4t5t6t7.... IFI 1 I 2 I 3 I 4 I 5 IDI 1 I 2 nop I 4 I 5 EX I 1 I 2 nop I 4 I 5 MA I 1 I 2 nop I 4 I 5 WB I 1 I 2 nop I 4 I 5 time t0t1t2t3t4t5t6t7.... (I 1 ) 096: ADDIF 1 ID 1 EX 1 MA 1 WB 1 (I 2 ) 100: J 304IF 2 ID 2 EX 2 MA 2 WB 2 (I 3 ) 104: ADDIF 3 nop nop nop nop (I 4 ) 304: ADD IF 4 ID 4 EX 4 MA 4 WB 4 Resource Usage

ECE 552 / CPS Pipelining Conditional Branches addrinstr I 1 096ADD I 2 100BEQZ r1 200 I 3 104ADD I 4 304ADD Branch condition computed in execute stage. What should be done in decode stage? Jump? I2I2 I1I1 104 stall IR PC addr inst Inst Memory 0x4 Add nop IR E M Add PCSrc (pc+4 / jabs / rind / br) nop IRSrc D A Y ALU zero?

ECE 552 / CPS Pipelining Conditional Branches addrinstr I 1 096ADD I 2 100BEQZ r1 200 I 3 104ADD I 4 304ADD If branch is taken, kill two following instructions. And because instruction in decode stage is invalid, update stall signal. stall IR PC addr inst Inst Memory 0x4 Add nop IR E M PCSrc (pc+4/jabs/rind/br) nop A Y ALU zero? I2I2 I1I1 108 I3I3 BEQZ? Jump? IRSrc D IRSrc E Add PC

ECE 552 / CPS Update Stall Signal Stall  > & !((opcodeE == BEQZ) & zero? # branch condition true + (opcodeE == BNEZ) & !zero?# branch condition true ) Do not stall if branch is taken. Why? Instruction in the decode stage is invalid. Kill instruction instead.

ECE 552 / CPS Pipelining Conditional Branches addrinstr I 1 096ADD I 2 100BEQZ r1 200 I 3 104ADD I 4 304ADD If branch is taken, kill two following instructions. And because instruction in decode stage is invalid, update stall signal. stall IR PC addr inst Inst Memory 0x4 Add nop IR E M PCSrc (pc+4/jabs/rind/br) nop A Y ALU zero? I2I2 I1I1 108 I3I3 BEQZ? Jump? IRSrc D IRSrc E Add PC

ECE 552 / CPS Derive PCSrc Signal Derive mux control signal for PCSrc. if( (opcodeE == BEQZ & z) + (opcodeE == BNEZ & !z) ), PCSrc  br else if ((opcodeD == J) + (opcodeD == JAL)), PCSrc  jabs else if ((opcodeD == JR) + (opcodeD == JALR)), PCSrc  rind otherwise, PCSrc  PC + 4 Derive mux control signal for IRSrcD. if( (opcodeE == BEQZ & z) + (opcodeE == BNEZ & !z) ), ICSrcD  nop else if( (opcodeD==J) + (opcodeD==JAL) + (opcodeD==JAR) + (opcodeD==JALR) ), ICSrcD  nop otherwise, ICSrcD  Instr Derive mux control signal for IRSrcE. if( (opcodeE == BEQZ & z) + (opcodeE == BNEZ & !z) ), ICSrcE  nop otherwise, IRSrcE  (stall & nop) + (!stall & IRD)

ECE 552 / CPS Pipelining Conditional Branches addrinstr I 1 096ADD I 2 100BEQZ r1 200 I 3 104ADD I 4 304ADD If branch is taken, kill two following instructions. And because instruction in decode stage is invalid, update stall signal. stall IR PC addr inst Inst Memory 0x4 Add nop IR E M PCSrc (pc+4/jabs/rind/br) nop A Y ALU zero? I2I2 I1I1 108 I3I3 BEQZ? Jump? IRSrc D IRSrc E Add PC

ECE 552 / CPS Derive IRSrcD Signal Derive mux control signal for PCSrc. if( (opcodeE == BEQZ & z) + (opcodeE == BNEZ & !z) ), PCSrc  br else if ((opcodeD == J) + (opcodeD == JAL)), PCSrc  jabs else if ((opcodeD == JR) + (opcodeD == JALR)), PCSrc  rind otherwise, PCSrc  PC + 4 Derive mux control signal for IRSrcD. if( (opcodeE == BEQZ & z) + (opcodeE == BNEZ & !z) ), ICSrcD  nop else if( (opcodeD==J) + (opcodeD==JAL) + (opcodeD==JAR) + (opcodeD==JALR) ), ICSrcD  nop otherwise, ICSrcD  Instr Derive mux control signal for IRSrcE. if( (opcodeE == BEQZ & z) + (opcodeE == BNEZ & !z) ), ICSrcE  nop otherwise, IRSrcE  (stall & nop) + (!stall & IRD)

ECE 552 / CPS Pipelining Conditional Branches addrinstr I 1 096ADD I 2 100BEQZ r1 200 I 3 104ADD I 4 304ADD If branch is taken, kill two following instructions. And because instruction in decode stage is invalid, update stall signal. stall IR PC addr inst Inst Memory 0x4 Add nop IR E M PCSrc (pc+4/jabs/rind/br) nop A Y ALU zero? I2I2 I1I1 108 I3I3 BEQZ? Jump? IRSrc D IRSrc E Add PC

ECE 552 / CPS Derive IRSrcE Signal Derive mux control signal for PCSrc. if( (opcodeE == BEQZ & z) + (opcodeE == BNEZ & !z) ), PCSrc  br else if ((opcodeD == J) + (opcodeD == JAL)), PCSrc  jabs else if ((opcodeD == JR) + (opcodeD == JALR)), PCSrc  rind otherwise, PCSrc  PC + 4 Derive mux control signal for IRSrcD. if( (opcodeE == BEQZ & z) + (opcodeE == BNEZ & !z) ), IRSrcD  nop else if( (opcodeD==J) + (opcodeD==JAL) + (opcodeD==JAR) + (opcodeD==JALR) ), IRSrcD  nop otherwise, IRSrcD  Instr Derive mux control signal for IRSrcE. if( (opcodeE == BEQZ & z) + (opcodeE == BNEZ & !z) ), IRSrcE  nop otherwise, IRSrcE  (stall & nop) + (!stall & IRD)

ECE 552 / CPS Pipelining Branches time t0t1t2t3t4t5t6t7.... IFI 1 I 2 I 3 I 4 I 5 IDI 1 I 2 I 3 nop I 5 EX I 1 I 2 nop nop I 5 MA I 1 I 2 nop nop I 5 WB I 1 I 2 nop nop I 5 time t0t1t2t3t4t5t6t7.... (I 1 ) 096: ADDIF 1 ID 1 EX 1 MA 1 WB 1 (I 2 ) 100: BEQZ 200IF 2 ID 2 EX 2 MA 2 WB 2 (I 3 ) 104: ADDIF 3 ID 3 nop nop nop (I 4 ) 108: IF 4 nop nop nop nop (I 5 ) 304: ADD IF 5 ID 5 EX 5 MA 5 WB 5 Resource Usage

ECE 552 / CPS Resolving Branch Conditions beq r1, r3, 36 and r2, r3, r5 or r6, r1, r7 add r8, r1, r9 xor r10,r1,r11 Reg ALU DMemIfetch Reg ALU DMemIfetch Reg ALU DMemIfetch Reg ALU DMem Ifetch Reg ALU DMemIfetch Reg Three stage stall. What about the 3 instructions in between? Attempt to reduce. Stall/Kill signals in decode stage, reduces stalls from 3 to 2 stages.

ECE 552 / CPS Solution 1: Resolve Earlier Large performance impact -Suppose CPI = 1, 30% branch -If branch stalls for 2 cycles, new CPI is 1.6 Solution – Branch Computation -Determine whether branch is taken or not earlier in pipeline (e.g,. beq) -Compute target branch address earlier (e.g., PC addition) Solution – MIPS -MIPS branch tests if a register is equal to zero (e.g., beq) -Move zero test to ID/RF stage -Introduce adder to calculate new PC in ID/RF stage -With early branch resolution and kill/stall signals in decode, branches stall for 1 cycle

ECE 552 / CPS Pipelined MIPS Datapath Add sufficient logic in decode stage to generate “zero?” signal Branch is resolved in ID stage instead of EX stage, eliminating one stall cycle

ECE 552 / CPS Solution 2: Predict Condition Stall until branch direction is clear Predict Branch Not Taken -Execute successor instructions in sequence -“Squash” instructions in pipeline if branch taken -Advantage: 47% of MIPS branches not taken -Advantage: PC+4 already calculated for instruction fetch Predict Branch Taken -Advantage: 53% of MIPS branches taken -Disadvantage: Target address not yet calculated, 1-cycle penalty More sophisticated branch prediction later…

ECE 552 / CPS Solution 3: Change ISA Semantics Delayed Branch -Change ISA semantics: Instruction after jump/branch always executed. -Define branch to take place after a following instruction -Gives compiler flexibility to schedule useful instructions into a branch- induced stall -Branch delay of length n Branch instruction Sequential successor 1 Sequential successor 2 … Sequential successor n Branch target if taken -MIPS uses n=1 delay slot to calculate branch outcome, target address

ECE 552 / CPS Scheduling Delay Slots (a) Fills delay slot and reduces instruction count, (b) DSUB needs copying and increases instruction count, (c) OR executes if branch fails so issue speculatively

ECE 552 / CPS Delayed Branches Compiler Effectiveness (n=1 branch delay slot) -Fill about 60% of branch delay slots -About 80% of instructions executed are useful computation Disadvantages of Delayed Branches -As pipelines deepen, branch delay grows and requires more slots -Less popular than dynamic approaches (e.g., branch prediction)

ECE 552 / CPS Pipelining in Practice Why is IPC < 1? Full forwarding may be too expensive to implement -Implement only frequently used forwarding paths -Implementing infrequently used forwarding paths might impact length of pipeline stage, increase clock period, and reduce IPC forwarding gains Multi-cycle Instructions (e.g., loads) -Instruction following a multi-cycle instruction cannot use its results -MIPS-I defined load-delay slots, a software-visible pipeline hazard. -Rely on compiler to schedule useful instructions, nops Conditional Branches -Without delay slots, kill following instructions -With delay slots, rely on compiler to schedule useful instructions, nops

ECE 552 / CPS Optimal Pipeline Depth -Srinivasan et al. “Optimizing pipelines for power and performance,” Performance (BIPS) versus Power (W) -FO4 is measure of delay: Delay of inverter that is driven by inverter 4x smaller and that is driving inverter 4x larger. -Quantify amount of logic per pipeline stage in FO4 delays -(shorter delays  deeper pipelines)

ECE 552 / CPS Interrupts and Exceptions Interrupts alter normal control flow -Event that needs to be processed by another (system) program -Event is considered unexpected or rare from program’s perspective I i-1 HI 1 HI 2 HI n IiIi I i+1 program interrupt handler

ECE 552 / CPS Causes of Interrupts Interrupt -An event that requests the attention of the processor Asynchronous Interrupt – External Event -Input/output device service-request -Timer expiration -Power disruptions, hardware failure Synchronous Interrupt – Internal Event -Undefined opcode, privileged instruction -Arithmetic overflow, FPU exception -Misaligned memory access -Virtual memory exceptions – page faults, TLB misses, protection violations -Traps – system calls, jumps into kernel -Also known as exceptions

ECE 552 / CPS Asynchronous Interrupts Service Request -An I/O device requests attention by asserting one of the prioritized interrupt request lines Invoking Interrupt Handler -Processor decides to process interrupt -Stops current program at instruction j, completing all instructions up to j-1. Defines a precise interrupt. -Saves PC of instruction j in a special register (e.g., EPC) -Disables interrupts and transfers control to designated interrupt handler running in kernel mode.

ECE 552 / CPS Interrupt Handler PC Processing -Save EPC before re-enabling interrupts, thereby allowing nested interrupts -Need an instruction to move EPC into general-purpose registers -Need a way to mask further interrupts until EPC saved Status Register -Read status register to determine cause of interrupt -Executes handler code Exiting Interrupt Handler -Use special indirect jump instruction RFE (return-from-exception) -Enables interrupts -Restores processor to user mode -Restores hardware status and control state

ECE 552 / CPS Synchronous Interrupts Exceptions -A synchronous interrupt (exception) is caused by a particular instruction Instruction Re-start -Generally, instruction cannot be completed without handler -Instruction needs re-start after exception has been handled -Processor must undo the effect of partially executed instructions System Calls -If the interrupt arises from a system calls (traps), trapping instruction considered complete -System calls require a special jump instruction and changing into privileged kernel mode

ECE 552 / CPS Pipelining and Interrupt Handling Synchronous: How does the processor handle multiple, simultaneous exceptions in different pipeline stages? Asynchronous: How does the processor handle external interrupts? PC Inst. Mem D Decode EM Data Mem W + Illegal Opcode Overflow Data address Exceptions PC address Exception Asynchronous Interrupts

ECE 552 / CPS Pipelining and Interrupt Handling PC Inst. Mem D Decode EM Data Mem W + Illegal Opcode Overflow Data address Exceptions PC address Exception Asynchronous Interrupts Exc D PC D Exc E PC E Exc M PC M Cause EPC Kill D Stage Kill F Stage Kill E Stage Select Handler PC Kill Writebac k Commit Point

ECE 552 / CPS Pipelining and Interrupt Handling Propagate exception flags through pipeline until commit point Internal Interrupts -An instruction might generate multiple exception flags -For a given instruction, exceptions in earlier pipe stages over-ride those in later pipe stages, thereby prioritizing exceptions earlier in time Inject external interrupts at commit point -External interrupts over-ride internal interrupts Check exception flags at commit point -If exception flagged, update cause and EPC register -Kill instructions in all pipeline stages -Inject handler PC into fetch stage

ECE 552 / CPS Speculating about Exceptions Predict -Exceptions are rare. Predicting that no exceptions occurred is accurate. Check Prediction -Exceptions detected at end of pipeline (commit point). Invoke special hardware for various exception types Recovery Mechanism -Architectural state modified at end of pipeline (commit point). -Discard partially executed instructions after an exception -Launch exception handler after flushing pipeline

ECE 552 / CPS Pipelining and Exceptions time t0t1t2t3t4t5t6t7.... (I 1 ) 096: ADDIF 1 ID 1 EX 1 MA 1 nop overflow! (I 2 ) 100: XORIF 2 ID 2 EX 2 nop nop (I 3 ) 104: SUBIF 3 ID 3 nop nop nop (I 4 ) 108: ADD IF 4 nop nop nopnop (I 5 ) Exc. Handler code IF 5 ID 5 EX 5 MA 5 WB 5 Resource Usage time t0t1t2t3t4t5t6t7.... IFI 1 I 2 I 3 I 4 I 5 IDI 1 I 2 I 3 nop I 5 EX I 1 I 2 nop nop I 5 MA I 1 nop nop nop I 5 WB nop nop nop nop I 5

ECE 552 / CPS Acknowledgements These slides contain material developed and copyright by -Arvind (MIT) -Krste Asanovic (MIT/UCB) -Joel Emer (Intel/MIT) -James Hoe (CMU) -John Kubiatowicz (UCB) -Alvin Lebeck (Duke) -David Patterson (UCB) -Daniel Sorin (Duke)