CS 162 Computer Architecture, Lecture 3: Pipelining Contd. Instructor: L.N. Bhuyan, www.cs.ucr.edu/~bhuyan/cs162. © 1999 UCB


Single Cycle Datapath (From Ch 5)
[Figure: single-cycle MIPS datapath with PC, Imem, Regs, sign extend, ALU, and Dmem; control signals RegDst, RegWrite, ALUSrc, ALUOp, PCSrc, MemRead, MemWrite, MemToReg; instruction fields 25:21, 20:16, 15:11, 15:0.]

Required Changes to Datapath
°Introduce registers to separate the 5 stages by putting IF/ID, ID/EX, EX/MEM, and MEM/WB registers in the datapath.
°The next PC value is computed in the 3rd stage, but the next instruction must be fetched in the next cycle, so move the PCSrc mux to the 1st stage. The PC is incremented unless there is a new branch address.
°The branch address is computed in the 3rd stage. With pipelining, the PC value has changed by then, so the PC value must be carried along with the instruction. Width of IF/ID register = (IR: 32) + (PC: 32) = 64 bits.

Changes to Datapath Contd.
°For a lw instruction, we need the write register address at stage 5, but the IR is by then occupied by another instruction. So we must carry the IR destination field along as the instruction moves through the stages. See the connection in the figure.
°Length of ID/EX register = (Reg1: 32) + (Reg2: 32) + (offset: 32) + (PC: 32) + (destination register: 5) = 133 bits.
°Assignment: What are the lengths of the EX/MEM and MEM/WB registers?

Pipelined Datapath (with Pipeline Regs) (6.2)
[Figure: five-stage pipelined datapath (Fetch, Decode, Execute, Memory, Write Back) with Imem, Regs, ALU, and Dmem separated by pipeline registers: IF/ID (64 bits), ID/EX (133 bits), EX/MEM (102 bits), MEM/WB (69 bits).]
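The register widths quoted on these slides can be checked by summing the fields each pipeline register carries. The sketch below is illustrative only: the IF/ID and ID/EX fields come from the slides, while the EX/MEM and MEM/WB field breakdowns are an assumption chosen to be consistent with the 102-bit and 69-bit widths shown in the figure. All names are hypothetical.

```python
# Field widths in bits for a 32-bit MIPS datapath.
WORD = 32      # data word, PC, instruction
REG_ADDR = 5   # destination register number
ZERO_FLAG = 1  # ALU Zero output

pipeline_registers = {
    # From the slides: IR + PC.
    "IF/ID": {"instruction": WORD, "pc": WORD},
    # From the slides: Reg1 + Reg2 + offset + PC + destination register.
    "ID/EX": {
        "read_data1": WORD,
        "read_data2": WORD,
        "sign_ext_offset": WORD,
        "pc": WORD,
        "dest_reg": REG_ADDR,
    },
    # ASSUMED breakdown (not stated on the slides), consistent with 102 bits.
    "EX/MEM": {
        "branch_target": WORD,
        "zero": ZERO_FLAG,
        "alu_result": WORD,
        "read_data2": WORD,
        "dest_reg": REG_ADDR,
    },
    # ASSUMED breakdown (not stated on the slides), consistent with 69 bits.
    "MEM/WB": {"mem_read_data": WORD, "alu_result": WORD, "dest_reg": REG_ADDR},
}

widths = {name: sum(fields.values()) for name, fields in pipeline_registers.items()}

for name, width in widths.items():
    print(f"{name}: {width} bits")
```

These widths cover data fields only; control bits carried in the pipeline registers would add to them.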

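Pipelined Control (6.3)
°Start with the single-cycle controller.
°Group control lines by the pipeline stage that needs them.
°Extend the pipeline registers with control bits.
Control lines by stage:
  EX: RegDst, ALUOp, ALUSrc
  Mem: Branch, MemRead, MemWrite
  WB: MemToReg, RegWrite
[Figure: the Control unit decodes the instruction from IF/ID; ID/EX carries the EX, Mem, and WB groups, EX/MEM carries Mem and WB, and MEM/WB carries WB only.]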
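The grouping of control lines by stage can be sketched as follows: each pipeline register carries only the control bits that later stages still need. This is a minimal illustration, not the course's implementation. The function name is hypothetical, and the control values for lw are assumed from the standard single-cycle MIPS control table.

```python
# Control lines grouped by the stage that consumes them.
EX_SIGNALS = ("RegDst", "ALUOp", "ALUSrc")
MEM_SIGNALS = ("Branch", "MemRead", "MemWrite")
WB_SIGNALS = ("MemToReg", "RegWrite")


def control_bits_carried(signals):
    """Return the control bits held in each pipeline register.

    A group is dropped once its stage has used it, so the registers
    get narrower toward the end of the pipeline.
    """
    id_ex = dict(signals)
    ex_mem = {k: v for k, v in id_ex.items() if k not in EX_SIGNALS}
    mem_wb = {k: v for k, v in ex_mem.items() if k not in MEM_SIGNALS}
    return {"ID/EX": id_ex, "EX/MEM": ex_mem, "MEM/WB": mem_wb}


# Assumed control values for lw (standard single-cycle control table).
lw_control = {
    "RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1,
    "Branch": 0, "MemRead": 1, "MemWrite": 0,
    "MemToReg": 1, "RegWrite": 1,
}

carried = control_bits_carried(lw_control)
for reg, bits in carried.items():
    print(reg, sorted(bits))
```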

Pipelined Processor: Datapath + Control
[Figure: pipelined datapath (PC, Imem, Regs, sign extend, ALU, ALU control, Dmem) with the Control unit feeding EX, M, and WB control bits through the IF/ID, ID/EX, EX/MEM, and MEM/WB registers; control signals RegWrite, ALUSrc, ALUOp, RegDst, MemRead, MemToReg, MemWrite, Branch, PCSrc; instruction fields [20–16], [15–11], [15–0].]
More work is needed to correctly handle pipeline hazards.

Recap
°If all pipeline stages can be kept busy, the pipeline can retire (complete) up to one instruction per clock cycle, thereby achieving single-cycle throughput.
°The pipeline paradox (for MIPS): any instruction still takes 5 cycles to execute, even though one instruction can retire per cycle.

Problems for Pipelining
°Hazards prevent the next instruction from executing during its designated clock cycle, limiting speedup:
  Structural hazards: the hardware cannot support this combination of instructions (for example, a single memory for instructions and data).
  Data hazards: an instruction depends on the result of a prior instruction still in the pipeline.
  Control hazards: conditional branches and other instructions may stall the pipeline, delaying later instructions.

Single Memory is a Structural Hazard
[Figure: pipeline diagram, instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4) against time in clock cycles; each instruction passes through M, Reg, ALU, M, Reg.]
Can’t read the same memory twice in the same clock cycle.

EX: MIPS multicycle datapath: Structural Hazard in Memory
[Figure: multicycle MIPS datapath with PC, a single Instruction or Data Memory, Instruction Register, Memory Data Register, Registers, A and B latches, ALU, and ALUOut.]

Structural Hazards limit performance
°Example: if there are 1.3 memory accesses per instruction (30% of instructions executed are loads and stores) and only one memory access per cycle, then
  Average CPI ≥ 1.3
  Otherwise the datapath resource would be more than 100% utilized.
Structural Hazard Solution: Add more hardware
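The bound above is just the memory demand per instruction divided by the memory accesses available per cycle. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the floor of 1.0 assumes an ideal CPI of 1.

```python
def min_cpi(mem_accesses_per_instr, mem_accesses_per_cycle=1):
    """Lower bound on average CPI imposed by memory bandwidth.

    Assumes an ideal CPI of 1, so the bound never drops below 1.0.
    """
    return max(1.0, mem_accesses_per_instr / mem_accesses_per_cycle)


# One fetch per instruction plus 0.3 loads/stores per instruction.
single_port = min_cpi(1.3, mem_accesses_per_cycle=1)
# "Add more hardware": two accesses per cycle removes the bottleneck.
dual_port = min_cpi(1.3, mem_accesses_per_cycle=2)

print(single_port, dual_port)
```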

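Speed Up Equation for Pipelining

\[ \text{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Pipeline stall clock cycles per instruction} \]

\[ \text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Clock Cycle}_{\text{unpipelined}}}{\text{Clock Cycle}_{\text{pipelined}}} \]

With Ideal CPI = 1:

\[ \text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Clock Cycle}_{\text{unpipelined}}}{\text{Clock Cycle}_{\text{pipelined}}} \]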

Example: Dual-port vs. Single-port
°Machine A: Dual ported memory
°Machine B: Single ported memory, but its pipelined implementation has a 1.05 times faster clock rate
°Ideal CPI = 1 for both
°Loads are 40% of instructions executed

\[ \text{SpeedUp}_A = \frac{\text{Pipeline Depth}}{1 + 0} \times \frac{\text{clock}_{\text{unpipe}}}{\text{clock}_{\text{pipe}}} = \text{Pipeline Depth} \]

\[ \text{SpeedUp}_B = \frac{\text{Pipeline Depth}}{1 + 0.4 \times 1} \times \frac{\text{clock}_{\text{unpipe}}}{\text{clock}_{\text{unpipe}} / 1.05} = \frac{\text{Pipeline Depth}}{1.4} \times 1.05 = 0.75 \times \text{Pipeline Depth} \]

\[ \frac{\text{SpeedUp}_A}{\text{SpeedUp}_B} = \frac{\text{Pipeline Depth}}{0.75 \times \text{Pipeline Depth}} = 1.33 \]

°Machine A is 1.33 times faster
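The comparison can be reproduced numerically. The sketch below applies the speedup equation to both machines; the function and variable names are hypothetical, and the pipeline depth of 5 is an assumption for illustration (the ratio does not depend on it).

```python
def speedup(pipeline_depth, stall_cpi, clock_ratio=1.0, ideal_cpi=1.0):
    """Pipelining speedup over the unpipelined machine.

    clock_ratio is (unpipelined clock cycle) / (pipelined clock cycle).
    """
    return (ideal_cpi * pipeline_depth) / (ideal_cpi + stall_cpi) * clock_ratio


depth = 5  # assumed; cancels out of the A/B ratio

# Machine A: dual-ported memory, no structural stalls.
speedup_a = speedup(depth, stall_cpi=0.0)
# Machine B: single-ported memory, each load (40%) stalls 1 cycle,
# but the clock is 1.05 times faster.
speedup_b = speedup(depth, stall_cpi=0.4 * 1, clock_ratio=1.05)

print(f"A: {speedup_a:.2f}  B: {speedup_b:.2f}  A/B: {speedup_a / speedup_b:.2f}")
```

Even with its faster clock, Machine B loses because 40% of its instructions pay a one-cycle structural stall.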

Data Hazard on Register $1 (6.4)
    add $1,$2,$3
    sub $4,$1,$3
    and $6,$1,$7
    or  $8,$1,$9
    xor $10,$1,$11

Data Hazard Solution:
°“Forward” the result from one stage to another.
°“or” is OK if the register file is implemented properly.
    add $1,$2,$3
    sub $4,$1,$3
    and $6,$1,$7
    or  $8,$1,$9
    xor $10,$1,$11
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB), instruction order against time in clock cycles, showing the forwarding paths from add to the following instructions.]

Hazard Detection for Forwarding
°A hazard must be detected just before execution so that, in case of a hazard, the data can be forwarded to the input of the ALU.
°A hazard is detected when a source register (Rs, Rt, or both) of the instruction in the EX stage equals the destination register (Rd) of an instruction further along the pipeline (in either the MEM or WB stage).
°Compare the Rs and Rt register numbers in the ID/EX register with Rd in the EX/MEM and MEM/WB registers. This requires carrying Rs, Rt, and Rd from the IF/ID register to the ID/EX register (only Rd was carried before).
°If they match, forward the data to the input of the ALU through the multiplexor. See Fig pp. 488 of the text.
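The comparison described above can be sketched as a small forwarding selector. This is a hedged illustration, not the textbook's exact logic: the class and function names are hypothetical, and the RegWrite and register-zero checks are assumptions added beyond what the slide states (an instruction that does not write a register, or that targets $0, should not trigger forwarding).

```python
from dataclasses import dataclass


@dataclass
class PipelineReg:
    """Subset of a pipeline register relevant to forwarding (hypothetical)."""
    reg_write: bool  # assumed: instruction will write the register file
    rd: int          # destination register number


def forward_select(src, ex_mem, mem_wb):
    """Choose where an ALU operand for source register `src` comes from.

    The EX/MEM result is checked first because it is the most recent.
    """
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return "EX/MEM"
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return "MEM/WB"
    return "REGFILE"


# add $1,$2,$3 ; sub $4,$1,$3 ; and $6,$1,$7
add = PipelineReg(reg_write=True, rd=1)
sub = PipelineReg(reg_write=True, rd=4)
idle = PipelineReg(reg_write=False, rd=0)

# sub in EX: add is one stage ahead, in EX/MEM.
sub_rs = forward_select(1, ex_mem=add, mem_wb=idle)
# and in EX: sub is in EX/MEM, add has moved on to MEM/WB.
and_rs = forward_select(1, ex_mem=sub, mem_wb=add)
and_rt = forward_select(7, ex_mem=sub, mem_wb=add)

print(sub_rs, and_rs, and_rt)
```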

Forwarding: What about Loads?
    lw  $1,0($2)
    sub $4,$1,$3
°Dependencies backward in time are hazards.
°Can’t solve with forwarding alone.
°Must stall the instruction dependent on the load (“load-use” hazard).
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB) for the two instructions.]

Data Hazard Even with Forwarding
°Must stall the pipeline 1 cycle (insert 1 bubble).
    lw  $1,0($2)
    sub $4,$1,$6
    and $6,$1,$7
    or  $8,$1,$9
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB) against time in clock cycles, showing the inserted bubble.]
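A load-use hazard can be detected in the decode stage by checking whether the instruction ahead is a load whose destination matches one of the current instruction's source registers. The sketch below shows one common form of that check. The function and parameter names are hypothetical, and it assumes the load's destination is held in the Rt field of the ID/EX register.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Return True when the instruction in ID must wait one cycle.

    id_ex_mem_read: the instruction in EX is a load (assumed control bit).
    id_ex_rt:       that load's destination register.
    if_id_rs/rt:    source registers of the instruction being decoded.
    """
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)


# lw $1,0($2) in EX, sub $4,$1,$6 in ID: sub reads $1, so stall.
stall_after_lw = must_stall(True, 1, if_id_rs=1, if_id_rt=6)
# add $1,$2,$3 in EX (not a load): forwarding suffices, no stall.
stall_after_add = must_stall(False, 1, if_id_rs=1, if_id_rt=6)

print(stall_after_lw, stall_after_add)
```

After the one-cycle bubble, the loaded value can be forwarded from MEM/WB like any other result.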

Compiler Schemes to Improve Load Delay
°The compiler detects the data dependency and inserts nop instructions until the data is available:
    sub $2, $1, $3
    nop
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
°Alternatively, the compiler finds independent instructions to fill the delay slots.

Software Scheduling to Avoid Load Hazards
Try producing fast code for
    a = b + c;
    d = e – f;
assuming a, b, c, d, e, and f are in memory.
Slow code:
    LW  Rb,b
    LW  Rc,c
    ADD Ra,Rb,Rc
    SW  a,Ra
    LW  Re,e
    LW  Rf,f
    SUB Rd,Re,Rf
    SW  d,Rd
Fast code:
    LW  Rb,b
    LW  Rc,c
    LW  Re,e
    ADD Ra,Rb,Rc
    LW  Rf,f
    SW  a,Ra
    SUB Rd,Re,Rf
    SW  d,Rd
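The benefit of the reordering can be checked by counting load-use stalls in each sequence. The sketch below is a simplified model under stated assumptions: full forwarding, a one-cycle load delay, and a stall only when an instruction reads the register loaded by the instruction immediately before it. The tuple encoding and function name are hypothetical.

```python
def count_load_use_stalls(program):
    """Count one-cycle stalls caused by using a loaded value too early.

    Each instruction is (opcode, destination register, source registers).
    """
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_sources = curr
        if prev_op == "LW" and prev_dest in curr_sources:
            stalls += 1
    return stalls


slow_code = [
    ("LW", "Rb", []),
    ("LW", "Rc", []),
    ("ADD", "Ra", ["Rb", "Rc"]),  # uses Rc right after its load
    ("SW", None, ["Ra"]),
    ("LW", "Re", []),
    ("LW", "Rf", []),
    ("SUB", "Rd", ["Re", "Rf"]),  # uses Rf right after its load
    ("SW", None, ["Rd"]),
]

fast_code = [
    ("LW", "Rb", []),
    ("LW", "Rc", []),
    ("LW", "Re", []),
    ("ADD", "Ra", ["Rb", "Rc"]),
    ("LW", "Rf", []),
    ("SW", None, ["Ra"]),
    ("SUB", "Rd", ["Re", "Rf"]),
    ("SW", None, ["Rd"]),
]

slow_stalls = count_load_use_stalls(slow_code)
fast_stalls = count_load_use_stalls(fast_code)
print(f"slow: {slow_stalls} stalls, fast: {fast_stalls} stalls")
```

Under this model the slow code pays two stall cycles, while the fast code hides both load delays behind independent instructions.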