Morgan Kaufmann Publishers The Processor

Chapter 4: The Processor
14 September, 2018

Revised Forwarding Condition
MEM hazard:
if (MEM/WB.RegWrite
    and (MEM/WB.RegisterRd ≠ 31)
    and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 31)
            and (EX/MEM.RegisterRd = ID/EX.RegisterRn1))
    and (MEM/WB.RegisterRd = ID/EX.RegisterRn1)) ForwardA = 01

if (MEM/WB.RegWrite
    and (MEM/WB.RegisterRd ≠ 31)
    and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 31)
            and (EX/MEM.RegisterRd = ID/EX.RegisterRm2))
    and (MEM/WB.RegisterRd = ID/EX.RegisterRm2)) ForwardB = 01

The not(...) clause gives priority to the more recent result: forward from MEM/WB only when EX/MEM is not also writing the same source register.
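The forwarding logic can be sketched as a small Python model. This is a minimal illustration, not the textbook's hardware description: `PipelineFields`, `forward_control` and `forwarding_unit` are hypothetical names, and the EX-hazard test (forward from EX/MEM, select 10) is the companion condition from the textbook's forwarding discussion that this revised MEM-hazard condition builds on.

```python
from dataclasses import dataclass

XZR = 31  # register 31 (XZR) is never used as a forwarding source


@dataclass
class PipelineFields:
    """Hypothetical flattened view of the pipeline-register fields the forwarding unit reads."""
    ex_mem_reg_write: bool
    ex_mem_rd: int
    mem_wb_reg_write: bool
    mem_wb_rd: int
    id_ex_rn1: int
    id_ex_rm2: int


def forward_control(p: PipelineFields, src: int) -> str:
    """Return the 2-bit mux select ("00", "10" or "01") for one ALU operand read from register src."""
    ex_hazard = p.ex_mem_reg_write and p.ex_mem_rd != XZR and p.ex_mem_rd == src
    mem_hazard = (
        p.mem_wb_reg_write
        and p.mem_wb_rd != XZR
        and not ex_hazard  # revised condition: the newer EX/MEM result wins
        and p.mem_wb_rd == src
    )
    if ex_hazard:
        return "10"  # forward the ALU result held in EX/MEM
    if mem_hazard:
        return "01"  # forward the value being written back from MEM/WB
    return "00"  # no hazard: use the register file output


def forwarding_unit(p: PipelineFields) -> tuple[str, str]:
    """Compute (ForwardA, ForwardB) for the instruction currently in EX."""
    return forward_control(p, p.id_ex_rn1), forward_control(p, p.id_ex_rm2)


if __name__ == "__main__":
    # ADD X1,X1,X2 ; ADD X1,X1,X3 ; ADD X1,X1,X4 with the third ADD now in EX:
    # both older ADDs write X1, so operand A must come from the newer one in EX/MEM.
    p = PipelineFields(ex_mem_reg_write=True, ex_mem_rd=1,
                       mem_wb_reg_write=True, mem_wb_rd=1,
                       id_ex_rn1=1, id_ex_rm2=4)
    print(forwarding_unit(p))  # ('10', '00')
```

Without the `not ex_hazard` term, the chained example would pick the stale X1 value from MEM/WB whenever the MEM test happened to be evaluated first.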

Datapath with Forwarding (figure)
The signed-immediate input to the ALU, needed by loads and stores, is missing from this datapath.

Datapath with Forwarding (figure)
A multiplexor chooses between the ForwardB multiplexor output and the signed immediate.

Load-Use Hazard Detection
Check for a load-use hazard when the using instruction is decoded in the ID stage. The ALU operand register numbers in the ID stage are given by IF/ID.RegisterRn1 and IF/ID.RegisterRm2.
Load-use hazard:
if (ID/EX.MemRead and
    ((ID/EX.RegisterRd = IF/ID.RegisterRn1) or
     (ID/EX.RegisterRd = IF/ID.RegisterRm2)))
  stall the pipeline
If the instruction in the ID stage is stalled, the instruction in the IF stage must also be stalled; otherwise, the fetched instruction would be lost. To do this, prevent the PC register and the IF/ID pipeline register from changing.
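The load-use condition translates directly into a predicate. This is a sketch with hypothetical parameter names; like the slide's condition, it does not exclude register 31 and does not know whether the instruction in ID actually reads both operands, so it can stall in a few cases where no stall is strictly needed.

```python
def load_use_hazard(id_ex_mem_read: bool, id_ex_rd: int,
                    if_id_rn1: int, if_id_rm2: int) -> bool:
    """True when the load in EX writes a register that the instruction in ID reads,
    so the pipeline must stall for one cycle (condition taken from the slide)."""
    return id_ex_mem_read and (id_ex_rd == if_id_rn1 or id_ex_rd == if_id_rm2)


if __name__ == "__main__":
    # Hypothetical pair: LDUR X2, [X1,#20] in EX, then AND X4, X2, X5 in ID
    print(load_use_hazard(True, 2, 2, 5))   # True: stall one cycle
    # Same registers, but the instruction in EX is an ADD (MemRead = 0)
    print(load_use_hazard(False, 2, 2, 5))  # False: ordinary forwarding suffices
```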

How to Stall the Pipeline
Deasserting all eight control signals (setting them to 0) in the EX, MEM, and WB stages creates a “do nothing” or nop instruction. Because the hazard is identified in the ID stage, we can insert a bubble into the pipeline by setting the EX, MEM, and WB control fields of the ID/EX pipeline register to 0. At the same time, prevent the PC and the IF/ID register from updating: the using instruction is decoded again, and the following instruction is fetched again. The 1-cycle stall allows the MEM stage to read the data for the LDUR, which can then be forwarded to the EX stage.
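The stall mechanism can be modeled as one clock step of the front end. The `Control` grouping, the `FrontEnd` fields and the `clock` function are hypothetical simplifications that track only the PC, IF/ID and ID/EX, with instructions represented as plain strings.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Control:
    """EX, MEM and WB control fields carried in ID/EX (hypothetical integer encoding)."""
    ex: int
    mem: int
    wb: int


NOP_CONTROL = Control(ex=0, mem=0, wb=0)  # every control signal deasserted


@dataclass(frozen=True)
class FrontEnd:
    pc: int               # address of the next instruction to fetch
    if_id_instr: str      # instruction held in IF/ID (being decoded in ID)
    id_ex_instr: str      # instruction held in ID/EX (entering EX)
    id_ex_ctrl: Control   # its control fields


def clock(s: FrontEnd, decoded_ctrl: Control, stall: bool,
          fetch: Callable[[int], str]) -> FrontEnd:
    """Advance PC, IF/ID and ID/EX by one cycle."""
    if stall:
        # PC and IF/ID keep their values, so the same instructions are fetched and
        # decoded again; ID/EX receives a bubble whose control fields are all 0.
        return replace(s, id_ex_instr="bubble", id_ex_ctrl=NOP_CONTROL)
    return FrontEnd(pc=s.pc + 4,
                    if_id_instr=fetch(s.pc),
                    id_ex_instr=s.if_id_instr,
                    id_ex_ctrl=decoded_ctrl)
```

After the stall cycle, the next call with `stall=False` moves the held instruction from IF/ID into ID/EX, and the load's data is then available for forwarding.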

Load-Use Data Hazard (figure: the stall is inserted after the load)

Datapath with Hazard Detection (figure)

Stalls and Performance
The BIG Picture: Stalls reduce performance but are required to get correct results. The compiler can arrange code to avoid hazards and stalls, but this requires knowledge of the pipeline structure.
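To see how code order affects stalls, the sketch below counts load-use stalls under stated assumptions: full forwarding, a 1-cycle stall only when a load is immediately followed by an instruction that reads its destination register, and a 5-stage pipeline taking 4 fill cycles plus one cycle per instruction. The two instruction sequences (computing a = b + e and c = b + f from memory addressed by X0) are hypothetical.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    op: str
    dest: Optional[int]    # register written, None for stores
    srcs: tuple[int, ...]  # registers read


def load_use_stalls(prog: list[Instr]) -> int:
    """Count 1-cycle stalls: a load immediately followed by a reader of its destination."""
    return sum(1 for prev, cur in zip(prog, prog[1:])
               if prev.op == "LDUR" and prev.dest in cur.srcs)


def cycles(prog: list[Instr]) -> int:
    """5-stage pipeline: 4 cycles to fill, one per instruction, plus stalls."""
    return 4 + len(prog) + load_use_stalls(prog)


NAIVE = [
    Instr("LDUR", 1, (0,)),       # LDUR X1, [X0,#0]    load b
    Instr("LDUR", 2, (0,)),       # LDUR X2, [X0,#8]    load e
    Instr("ADD", 3, (1, 2)),      # ADD  X3, X1, X2     a = b + e (stalls on X2)
    Instr("STUR", None, (3, 0)),  # STUR X3, [X0,#24]
    Instr("LDUR", 4, (0,)),       # LDUR X4, [X0,#16]   load f
    Instr("ADD", 5, (1, 4)),      # ADD  X5, X1, X4     c = b + f (stalls on X4)
    Instr("STUR", None, (5, 0)),  # STUR X5, [X0,#32]
]

# Same instructions with the third load hoisted above the first ADD
SCHEDULED = [NAIVE[0], NAIVE[1], NAIVE[4], NAIVE[2], NAIVE[3], NAIVE[5], NAIVE[6]]
```

Hoisting the third load removes both stalls, so the sequence drops from 13 cycles to 11 without changing its result.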

Branch Hazards (§4.8 Control Hazards)
If the branch outcome is determined in MEM, predict the branch not taken. If the branch is taken, flush the three instructions fetched after it (set their control values to 0) and load the PC with the branch target.

Reducing Branch Delay
Move the conditional branch execution earlier in the pipeline so that fewer instructions need to be flushed. This requires two actions to occur earlier: computing the branch target address and evaluating the branch decision. Move the hardware that determines the outcome from the EX stage to the ID stage: the target address adder and a register comparator that tests whether the register is zero. This requires additional forwarding and hazard detection hardware, because results must be forwarded to the zero-test logic that operates during ID. To flush instructions in the IF stage, add a control line, called IF.Flush, that zeros the instruction field of the IF/ID pipeline register. Clearing the register transforms the fetched instruction into a nop.

Reducing Branch Delay
Example: branch taken, assuming the pipeline is optimized for branches that are not taken and that branch execution has been moved to the ID stage:
36: SUB X10, X4, X8
40: CBZ X1, 8          // PC-relative branch to 40+8*4=72
44: AND X12, X2, X5
48: ORR X13, X2, X6
52: ADD X14, X4, X2
56: SUB X15, X6, X7
...
72: LDUR X4, [X7,#50]

Example: Branch Taken (figure, two slides)
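The example's target arithmetic is 40 + 8 * 4 = 72, because the CBZ offset counts 4-byte instructions. The sketch below also lists what is fetched after the branch when, as the example assumes, the branch resolves in ID under predict-not-taken; the function names are hypothetical.

```python
def branch_target(pc: int, offset_words: int) -> int:
    """PC-relative target: the offset counts 4-byte instructions."""
    return pc + offset_words * 4


def fetched_after_branch(pc: int, offset_words: int, taken: bool) -> list[tuple[int, str]]:
    """Addresses fetched after a branch resolved in ID, predicting not taken."""
    fall_through = pc + 4  # fetched while the branch is being decoded in ID
    if not taken:
        return [(fall_through, "kept")]
    # Taken: IF.Flush turns the fall-through instruction into a nop,
    # and fetch resumes at the branch target (a 1-cycle penalty).
    return [(fall_through, "flushed"), (branch_target(pc, offset_words), "fetched")]
```

For the CBZ at address 40, a taken branch flushes only the AND at 44 before fetching the LDUR at 72.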

Dynamic Branch Prediction
In deeper and superscalar pipelines, the branch penalty is more significant, so use dynamic prediction.
Branch prediction buffer (aka branch history table): indexed by recent branch instruction addresses; stores the outcome (taken/not taken).
To execute a branch: check the table and expect the same outcome; start fetching from the fall-through or the target; if wrong, flush the pipeline and flip the prediction.
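A 1-bit branch history table can be sketched as a list of booleans indexed by low-order PC bits. The class name, the default size of 1024 entries and the "predict not taken" initial state are assumptions. Because entries carry no tag, two branches whose word addresses share the same low-order bits use the same entry.

```python
class OneBitBHT:
    """Branch history table with one bit per entry (True = predict taken)."""

    def __init__(self, entries: int = 1024) -> None:
        self.entries = entries
        self.bits = [False] * entries  # start by predicting not taken

    def _index(self, pc: int) -> int:
        return (pc >> 2) % self.entries  # drop the byte offset of 4-byte instructions

    def predict(self, pc: int) -> bool:
        return self.bits[self._index(pc)]

    def update(self, pc: int, taken: bool) -> bool:
        """Record the actual outcome; return True if the prediction was wrong (flush needed)."""
        i = self._index(pc)
        mispredicted = self.bits[i] != taken
        if mispredicted:
            self.bits[i] = not self.bits[i]  # flip the prediction
        return mispredicted
```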

1-Bit Predictor: Shortcoming
Inner-loop branches are mispredicted twice!
outer: …
       …
inner: …
       …
       CBZ …, …, inner
       …
       CBZ …, …, outer
The branch is mispredicted as taken on the last iteration of the inner loop, then mispredicted as not taken on the first iteration of the inner loop the next time around.
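The double misprediction can be reproduced by simulating only the inner loop's backward branch, which is taken on every iteration except the last. The iteration counts and the initial "predict taken" state are hypothetical choices for illustration.

```python
def one_bit_misses_per_pass(inner_iters: int, outer_iters: int) -> list[int]:
    """Mispredictions of the inner loop's backward branch on each pass of the outer loop."""
    predict_taken = True  # assumption: the entry already predicts taken
    per_pass = []
    for _ in range(outer_iters):
        misses = 0
        for i in range(inner_iters):
            taken = i < inner_iters - 1  # loop back except on the final iteration
            if predict_taken != taken:
                misses += 1
            predict_taken = taken  # a 1-bit entry simply remembers the last outcome
        per_pass.append(misses)
    return per_pass


if __name__ == "__main__":
    print(one_bit_misses_per_pass(10, 3))  # [1, 2, 2]
```

After the first pass, every pass costs two mispredictions: one at the loop exit and one on re-entry.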

2-Bit Predictor (figure)
Only change the prediction on two successive mispredictions.
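One common way to realize a 2-bit predictor is a saturating counter, assumed in this sketch; the class and function names and the strongly-taken starting state are hypothetical. Run on the same nested-loop pattern as the 1-bit case, it mispredicts the inner branch only once per pass, because the single not-taken exit moves the counter from strongly taken to weakly taken without changing the prediction.

```python
class TwoBitCounter:
    """Saturating counter: states 0-1 predict not taken, states 2-3 predict taken."""

    def __init__(self, state: int = 3) -> None:
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3
        self.state = min(self.state + 1, 3) if taken else max(self.state - 1, 0)


def two_bit_misses_per_pass(inner_iters: int, outer_iters: int) -> list[int]:
    ctr = TwoBitCounter(state=3)  # assumption: starts strongly taken
    per_pass = []
    for _ in range(outer_iters):
        misses = 0
        for i in range(inner_iters):
            taken = i < inner_iters - 1  # loop back except on the final iteration
            if ctr.predict() != taken:
                misses += 1
            ctr.update(taken)
        per_pass.append(misses)
    return per_pass


if __name__ == "__main__":
    print(two_bit_misses_per_pass(10, 3))  # [1, 1, 1]
```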

Calculating the Branch Target
Even with a predictor, the target address must still be calculated, which costs a 1-cycle penalty for a taken branch.
Branch target buffer: a cache of target addresses (destination PC) or destination instructions, indexed by the PC when the instruction is fetched. If it hits and the instruction is a branch predicted taken, the target can be fetched immediately.
Correlating predictor: a branch predictor that combines the local behavior of a particular branch with global information about the behavior of some recent number of executed branches.
Tournament branch predictor: a branch predictor with multiple predictions for each branch and a selection mechanism that chooses which predictor to enable for a given branch.
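The branch target buffer's role during fetch can be sketched as follows. A Python dict stands in for a tagged, finite cache (a simplifying assumption), and `BranchTargetBuffer`, `next_fetch_pc` and the `predict_taken` callback are hypothetical names; the callback could be any direction predictor, such as a 2-bit counter.

```python
from typing import Callable, Optional


class BranchTargetBuffer:
    """Toy BTB mapping the PC of a branch to its destination PC."""

    def __init__(self) -> None:
        self.targets: dict[int, int] = {}

    def lookup(self, pc: int) -> Optional[int]:
        return self.targets.get(pc)

    def insert(self, pc: int, target: int) -> None:
        self.targets[pc] = target


def next_fetch_pc(btb: BranchTargetBuffer,
                  predict_taken: Callable[[int], bool], pc: int) -> int:
    """Choose the next PC during IF, before the instruction has been decoded."""
    target = btb.lookup(pc)
    if target is not None and predict_taken(pc):
        return target  # hit and predicted taken: fetch the target immediately
    return pc + 4      # miss or predicted not taken: fall through
```

Because the lookup happens during IF, a hit avoids the 1-cycle penalty of computing the target in ID.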

Exceptions and Interrupts (§4.9 Exceptions)
Control is the most challenging aspect of processor design: it is both the hardest part to get right and the toughest part to make fast. One of the demanding tasks of control is implementing exceptions and interrupts: “unexpected” events requiring a change in the flow of control. Different ISAs use the terms differently.
Exception: arises within the CPU, e.g., undefined opcode, overflow, syscall, …
Interrupt: comes from an external I/O controller.
Detecting exception conditions and taking the appropriate action is often on the critical timing path of a processor, which determines the clock cycle time and thus performance. Dealing with exceptions without sacrificing performance is hard.

Handling Exceptions
Save the PC of the offending (or interrupted) instruction; in LEGv8, it is saved in the Exception Link Register (ELR). Then transfer control to the operating system at some specified address. For the operating system to handle the exception, it must know the reason for it, which is communicated through a register: in LEGv8, the Exception Syndrome Register (ESR). We'll assume a 1-bit cause: 0 for undefined opcode, 1 for overflow.

An Alternate Mechanism
Vectored interrupts: the handler address is determined by the cause. The exception vector address is added to a vector table base register:
Unknown reason: 00 0000₂
Floating-point arithmetic exception: 10 1100₂
System error (hardware malfunction): 11 1111₂
The instructions at the vector either deal with the interrupt or jump to the real handler.
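The vectored address calculation is a single addition of the cause's offset to the base register. The offsets below come from the slide; the dictionary keys, the function name and any base-register value passed in are hypothetical.

```python
# Vector offsets from the slide, added to the vector table base register
VECTOR_OFFSET = {
    "unknown reason": 0b00_0000,
    "floating-point arithmetic exception": 0b10_1100,
    "system error": 0b11_1111,
}


def vectored_handler_address(vector_base: int, cause: str) -> int:
    """Handler address for a vectored exception: base register + cause-specific offset."""
    return vector_base + VECTOR_OFFSET[cause]
```

With a hypothetical base of 0x1000, a floating-point arithmetic exception vectors to 0x1000 + 0b101100 = 0x102C.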

Handler Actions
Read the cause and transfer to the relevant handler, then determine the action required. If the exception is restartable, take corrective action and use ELR to return to the program. Otherwise, terminate the program and report the error using ESR, the cause, …

Exception Handling in LEGv8
Exceptions are not vectored in LEGv8: there is a single entry point for all exceptions, at address 0000 0000 1C09 0000 (hex), and the operating system decodes the status register to find the cause. This adds two registers to our current LEGv8 implementation:
ELR: a 64-bit register that holds the address of the affected instruction.
ESR: a register that records the cause of the exception. In the LEGv8 architecture this register is 32 bits, although some bits are currently unused.
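The non-vectored flow splits into a hardware step (save ELR and ESR, jump to the single entry point) and a software step (the OS decodes the cause). This sketch uses the slides' 1-bit cause encoding; the `CPU` class and both function names are hypothetical, and saving PC + 4 follows the pipelined design's behavior described under Exception Properties.

```python
from dataclasses import dataclass

EXCEPTION_ENTRY = 0x0000_0000_1C09_0000  # single LEGv8 entry point for all exceptions
ESR_UNDEFINED_OPCODE = 0                 # 1-bit cause encoding assumed on the slides
ESR_OVERFLOW = 1


@dataclass
class CPU:
    pc: int
    elr: int = 0  # Exception Link Register
    esr: int = 0  # Exception Syndrome Register


def raise_exception(cpu: CPU, offending_pc: int, cause: int) -> None:
    """Hardware side: record where and why, then jump to the single entry point."""
    cpu.elr = offending_pc + 4  # assumption: the pipelined design saves PC + 4
    cpu.esr = cause
    cpu.pc = EXCEPTION_ENTRY


def os_decode_cause(cpu: CPU) -> str:
    """Software side: with no vectoring, the OS reads ESR to find the cause."""
    return {ESR_UNDEFINED_OPCODE: "undefined opcode", ESR_OVERFLOW: "overflow"}[cpu.esr]
```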

Exceptions in a Pipeline
In a pipelined implementation, an exception is another form of control hazard. Consider a hardware malfunction on an add in the EX stage:
ADD X1, X2, X1
Flush the ADD and the subsequent instructions, and prevent X1 from being clobbered as the destination register: an EX.Flush signal prevents the instruction in the EX stage from writing its result in the WB stage. Many exceptions require that we complete the previous instructions, flush the offending instruction, and restart it from the beginning after the exception is handled. Set the ESR and ELR register values and transfer control to the handler. This is similar to a mispredicted branch and uses much of the same hardware.

Pipeline with Exceptions (figure)
LEGv8 exception address: 0000 0000 1C09 0000

Exception Properties
Restartable exceptions: the pipeline can flush the instruction; the handler executes, then returns to the instruction, which is refetched and executed from scratch.
The PC is saved in the ELR register to identify the causing instruction. Actually, PC + 4 is saved, so the handler must adjust it.

Exception Example
Exception on the ADD in this sequence (addresses in hexadecimal):
40 SUB X11, X2, X4
44 AND X12, X2, X5
48 ORR X13, X2, X6
4C ADD X1, X2, X1
50 SUB X15, X6, X7
54 LDUR X16, [X7,#100]
…
Assume the instructions invoked on an exception begin like this:
80000180 STUR X26, [X0,#1000]
80000184 STUR X27, [X0,#1008]
…

Exception Example (figure, two slides)
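The bookkeeping in this example can be sketched as a function over the pipeline's contents at the moment the ADD raises its exception in EX. It assumes a 5-stage in-order pipeline holding consecutive instructions, so WB holds 44, MEM holds 48, ID holds 50 and IF holds 54; the function name and the result keys are hypothetical.

```python
def exception_in_ex(in_flight: dict[str, int]) -> dict[str, object]:
    """in_flight maps each stage to the address of the instruction it holds.
    Instructions older than the one in EX complete; EX and younger ones are flushed."""
    faulting_pc = in_flight["EX"]
    return {
        "complete": [in_flight[s] for s in ("MEM", "WB")],
        "flushed": [in_flight[s] for s in ("IF", "ID", "EX")],
        "ELR": faulting_pc + 4,     # PC + 4 is what gets saved
        "restart_at": faulting_pc,  # the handler subtracts 4 before returning
    }


if __name__ == "__main__":
    state = {"IF": 0x54, "ID": 0x50, "EX": 0x4C, "MEM": 0x48, "WB": 0x44}
    r = exception_in_ex(state)
    print(hex(r["ELR"]), hex(r["restart_at"]))  # 0x50 0x4c
```

The ORR at 48 and the AND at 44 finish normally, while the ADD, the SUB at 50 and the LDUR at 54 become nops before the handler's first STUR is fetched.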

Multiple Exceptions
Pipelining overlaps multiple instructions, so there could be multiple exceptions at once. Simple approach: deal with the exception from the earliest instruction and flush the subsequent instructions. This gives “precise” exceptions: the proper exception is always associated with the correct instruction. Imprecise exceptions are interrupts or exceptions in pipelined computers that are not associated with the exact instruction that caused them. In complex pipelines, with multiple instructions issued per cycle and out-of-order completion, maintaining precise exceptions is difficult!
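The "earliest instruction first" rule can be sketched for a simple 5-stage in-order pipeline, where a later stage always holds an older instruction. The stage names, the `pending` dictionary and `select_exception` are hypothetical modeling choices.

```python
from typing import Optional

OLDEST_FIRST = ("WB", "MEM", "EX", "ID", "IF")  # later stages hold earlier instructions


def select_exception(pending: dict[str, str]) -> Optional[tuple[str, str, list[str]]]:
    """pending maps a stage to the exception raised there this cycle.
    Returns (stage, cause, stages_to_flush) for the earliest instruction, or None."""
    for i, stage in enumerate(OLDEST_FIRST):
        if stage in pending:
            # Flush the excepting instruction and every younger instruction behind it
            return stage, pending[stage], list(OLDEST_FIRST[i:])
    return None
```

If an undefined opcode is detected in ID in the same cycle that an overflow occurs in EX, the overflow is handled first; the instruction in ID is flushed and, if the program resumes, its exception is raised again when it is refetched.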