CS 61C L19 Pipelining II (1) A Carle, Summer 2005 © UCB inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #19: Pipelining II 2005-07-21.

Presentation transcript:

CS 61C L19 Pipelining II (1) A Carle, Summer 2005 © UCB inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #19: Pipelining II Andy Carle

CS 61C L19 Pipelining II (2) A Carle, Summer 2005 © UCB Review: Datapath for MIPS
Use datapath figure to represent pipeline: IFtch, Dcd, Exec, Mem, WB (drawn as I$, Reg, ALU, D$, Reg)
Stages: 1. Instruction Fetch 2. Decode/Register Read 3. Execute 4. Memory 5. Write Back
[Figure: datapath with PC, +4, instruction memory, registers (rs, rt, rd), imm, ALU, data memory]

CS 61C L19 Pipelining II (3) A Carle, Summer 2005 © UCB Review: Problems for Computers
Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle
-Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
-Control hazards: Pipelining of branches & other instructions stalls the pipeline until the hazard is resolved, leaving “bubbles” in the pipeline
-Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock)

CS 61C L19 Pipelining II (4) A Carle, Summer 2005 © UCB Review: C.f. Branch Delay vs. Load Delay Load Delay occurs only if necessary (dependent instructions). Branch Delay always happens (part of the ISA). Why not have Branch Delay interlocked? Answer: Interlocks only work if you can detect hazard ahead of time. By the time we detect a branch, we already need its value … hence no interlock is possible!

CS 61C L19 Pipelining II (5) A Carle, Summer 2005 © UCB FYI: Historical Trivia
First MIPS design did not interlock and stall on load-use data hazard
Real reason for name behind MIPS: Microprocessor without Interlocked Pipeline Stages
Word play on acronym for Millions of Instructions Per Second, also called MIPS
Load/Use => Wrong Answer!

CS 61C L19 Pipelining II (6) A Carle, Summer 2005 © UCB Outline: Pipeline Control; Forwarding Control; Hazard Control

CS 61C L19 Pipelining II (7) A Carle, Summer 2005 © UCB Piped Proc So Far …

CS 61C L19 Pipelining II (8) A Carle, Summer 2005 © UCB New Representation: Regs more explicit
[Figure: datapath (PC, Next PC, Inst. Mem, IR, Reg. File, Exec, Data Mem, Reg File) with the pipeline registers IF/DE, DE/EX, EX/ME, ME/WB drawn explicitly]
IF/DE.Ir = Instruction
DE/EX.A = BusA out of Reg
EX/ME.S = AluOut
EX/ME.D = Bus B pass-through for sw
ME/WB.S = AluOut pass-through
ME/WB.M = Mem Result from lw

CS 61C L19 Pipelining II (9) A Carle, Summer 2005 © UCB New Representation: Regs more explicit
[Figure: same datapath with the pipeline registers IF/DE, DE/EX, EX/ME, ME/WB]
=> What’s Missing???

CS 61C L19 Pipelining II (10) A Carle, Summer 2005 © UCB Pipelined Processor (almost) for slides
[Figure: datapath (PC, Next PC, Inst. Mem, IR, Reg. File, Equal, Exec, Mem Access, Data Mem, Reg File) with a control pipeline alongside it: Dcd Ctrl, Ex Ctrl, Mem Ctrl, WB Ctrl, with Valid, IRex, IRmem, IRwb]
Idea: Parallel Piped Control …

CS 61C L19 Pipelining II (11) A Carle, Summer 2005 © UCB Pipelined Control
Fetch (all instructions): IR <- Mem[PC]; PC <- PC+4
Decode (all instructions): A <- R[rs]; B <- R[rt]
R-type: S <- A + B; R[rd] <- S
lw: S <- A + SX; M <- Mem[S]; R[rt] <- M
ori: S <- A or ZX; R[rt] <- S
sw: S <- A + SX; Mem[S] <- B
beq: If Cond PC <- PC+SX
[Figure: datapath with PC, Next PC, Inst. Mem, IR, Reg. File, Equal, Exec, Mem Access, Data Mem, Reg File]

CS 61C L19 Pipelining II (12) A Carle, Summer 2005 © UCB Data Stationary Control
The Main Control generates the control signals during Reg/Dec
-Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
-Control signals for Mem (MemWr, Branch) are used 2 cycles later
-Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
[Figure: Main Control in Reg/Dec emits ExtOp, ALUOp, RegDst, ALUSrc, Branch, MemWr, MemtoReg, RegWr. The ID/Ex register carries all of them into Exec; the Ex/Mem register carries Branch, MemWr, MemtoReg, RegWr into Mem; the Mem/Wr register carries MemtoReg, RegWr into Wr.]
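The timing on this slide can be sketched in a few lines of Python. This is only an illustration, not the lecture's implementation: the `CONTROL` table, its signal values, and the function names `main_control` and `run` are assumptions made for the example. Each instruction is decoded once, and its control bundle then shifts through the ID/Ex, Ex/Mem, and Mem/Wr registers, so each group of signals is consumed 1, 2, and 3 cycles after decode.

```python
# Illustrative control table (signal values are assumptions, not from the slides).
# Signals are grouped by the stage that consumes them.
CONTROL = {
    "add": {"Exec": {"ALUSrc": 0, "RegDst": 1},
            "Mem": {"MemWr": 0, "Branch": 0},
            "Wr": {"MemtoReg": 0, "RegWr": 1}},
    "lw":  {"Exec": {"ALUSrc": 1, "RegDst": 0},
            "Mem": {"MemWr": 0, "Branch": 0},
            "Wr": {"MemtoReg": 1, "RegWr": 1}},
    "sw":  {"Exec": {"ALUSrc": 1, "RegDst": 0},
            "Mem": {"MemWr": 1, "Branch": 0},
            "Wr": {"MemtoReg": 0, "RegWr": 0}},
}


def main_control(op):
    """Decode once, during Reg/Dec, into a bundle that travels with the instruction."""
    return {"op": op, **CONTROL[op]}


def run(program):
    """Return (cycle, stage, op, signals) tuples; cycle 1 is the first Reg/Dec cycle."""
    id_ex = ex_mem = mem_wr = None        # control fields of the pipeline registers
    log = []
    stream = list(program) + [None] * 3   # three extra cycles to drain the pipe
    for cycle, op in enumerate(stream, start=1):
        # Each stage uses the signals that were latched at the previous clock edge.
        for stage, reg in (("Exec", id_ex), ("Mem", ex_mem), ("Wr", mem_wr)):
            if reg is not None:
                log.append((cycle, stage, reg["op"], reg[stage]))
        # Clock edge: every bundle moves one pipeline register to the right.
        mem_wr, ex_mem = ex_mem, id_ex
        id_ex = main_control(op) if op is not None else None
    return log


for entry in run(["lw", "sw"]):
    print(entry)
```

Because the bundle moves with the instruction, no stage needs to know which instruction it holds; it only reads its own slice of the bundle.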

CS 61C L19 Pipelining II (13) A Carle, Summer 2005 © UCB Let’s Try it Out
10 lw r1, 36(r2)
14 addi r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 17
34 add r10, r11, r12
100 and r13, r14, 15

CS 61C L19 Pipelining II (14) A Carle, Summer 2005 © UCB Start: Fetch 10
IF: PC = 10, fetching lw r1, 36(r2)
The other four stages hold no instruction yet (marked n in the figure)

CS 61C L19 Pipelining II (15) A Carle, Summer 2005 © UCB Fetch 14, Decode 10
ID: lw r1, 36(r2) (register file read with rs = 2)
IF: PC = 14
The remaining stages are still empty (n)

CS 61C L19 Pipelining II (16) A Carle, Summer 2005 © UCB Fetch 20, Decode 14, Exec 10
EX: lw r1 (A = r2, immediate = 36)
ID: addi r2, r2, 3
IF: PC = 20
The remaining stages are still empty (n)

CS 61C L19 Pipelining II (17) A Carle, Summer 2005 © UCB Fetch 24, Decode 20, Exec 14, Mem 10
M: lw r1 (S = r2+36)
EX: addi r2, r2, 3 (A = r2)
ID: sub r3, r4, r5
IF: PC = 24
WB is still empty (n)

CS 61C L19 Pipelining II (18) A Carle, Summer 2005 © UCB Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10
WB: lw r1 (M = M[r2+36])
M: addi r2 (S = r2+3)
EX: sub r3 (A = r4, B = r5)
ID: beq r6, r7, 100 (register file read with rs = 6, rt = 7)
IF: PC = 30
Note Delayed Branch: always execute ori after beq

CS 61C L19 Pipelining II (19) A Carle, Summer 2005 © UCB Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14
Register file now holds r1 = M[r2+36]
WB: addi r2 (r2+3)
M: sub r3 (S = r4-r5)
EX: beq (A = r6, B = r7, target 100)
ID: ori r8, r9, 17 (register file read with rs = 9)
IF: PC = 100
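The slide titles in this walkthrough (which address is in which stage each cycle) can be reproduced with a short Python sketch. It makes several assumptions: the listing is taken from the walkthrough slides (beq targets 100, ori sits at 30, add at 34), the branch is taken, addresses are treated as opaque labels, and the function names are invented for the example. It models only stage occupancy, not data values.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# (address label, text, branch target or None); listing assumed from the walkthrough.
PROGRAM = [
    (10, "lw r1, 36(r2)", None),
    (14, "addi r2, r2, 3", None),
    (20, "sub r3, r4, r5", None),
    (24, "beq r6, r7, 100", 100),
    (30, "ori r8, r9, 17", None),
    (34, "add r10, r11, r12", None),
    (100, "and r13, r14, 15", None),
]


def fetch_order(program, taken=True):
    """Addresses in fetch order with a one-instruction branch delay slot."""
    addrs = [addr for addr, _, _ in program]
    target = {addr: tgt for addr, _, tgt in program}
    order, i, pending = [], 0, None
    while i < len(addrs):
        addr = addrs[i]
        order.append(addr)
        if pending is not None:
            # The delay-slot instruction has been fetched; now redirect.
            i, pending = addrs.index(pending), None
        else:
            if target[addr] is not None and taken:
                pending = target[addr]
            i += 1
    return order


def occupancy(order, cycles):
    """Per cycle, map each stage to the address of the instruction it holds."""
    pipe = [None] * len(STAGES)
    history = []
    for c in range(cycles):
        fetched = order[c] if c < len(order) else None
        pipe = [fetched] + pipe[:-1]      # everything advances one stage
        history.append(dict(zip(STAGES, pipe)))
    return history


for cycle, state in enumerate(occupancy(fetch_order(PROGRAM), 6), start=1):
    print(cycle, state)
```

The fifth and sixth cycles correspond to the "Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10" and "Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14" slides: the ori at 30 is fetched even though the branch is taken, and the add at 34 never enters the pipe.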

CS 61C L19 Pipelining II (20) A Carle, Summer 2005 © UCB What is wrong here?
[Figure: pipelined processor (PC, Next PC, Inst. Mem, IR, Reg. File, Equal, Exec, Mem Access, Data Mem, Reg File) with the control pipeline Dcd Ctrl, Ex Ctrl, Mem Ctrl, WB Ctrl and Valid, IRex, IRmem, IRwb; two signals are marked “?”]
Remember: [symbol in figure] means triggered on edge.

CS 61C L19 Pipelining II (21) A Carle, Summer 2005 © UCB Double-Clocked Signals
[Figure: same pipelined processor and control pipeline as on the previous slide]
Some signals are double clocked!
In general: Inputs to edge components are their own pipeline regs
Watch out for stalls and such!

CS 61C L19 Pipelining II (22) A Carle, Summer 2005 © UCB Administrivia
Proj 2: Due Sunday
HW6: Due Tuesday
Midterm 2: Friday, July 29, 11:00 to 2:00, location TBD
If you are really so concerned about the drop deadline that this is a problem for you, talk to me about the possibility of taking the exam on Thursday

CS 61C L19 Pipelining II (23) A Carle, Summer 2005 © UCB Megahertz Myth?

CS 61C L19 Pipelining II (24) A Carle, Summer 2005 © UCB Outline: Pipeline Control; Forwarding Control; Hazard Control

CS 61C L19 Pipelining II (25) A Carle, Summer 2005 © UCB Review: Forwarding
Fix by forwarding result as soon as we have it to where we need it:
add $t0,$t1,$t2
sub $t4,$t0,$t3
and $t5,$t0,$t6
or $t7,$t0,$t8 *
xor $t9,$t0,$t10
* “or” hazard solved by register hardware
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB) for the five instructions, each drawn as I$, Reg, ALU, D$, Reg]

CS 61C L19 Pipelining II (26) A Carle, Summer 2005 © UCB Forwarding
In general:
For each stage i that has reg inputs
For each stage j after i that has reg output
-If i.reg == j.reg => forward j value back to i.
-Some exceptions ($0, invalid)
In particular:
ALUinput <- (ALUResult, MemResult)
MemInput <- (MemResult)

CS 61C L19 Pipelining II (27) A Carle, Summer 2005 © UCB Pending Writes In Pipeline Registers
[Figure: pipeline datapath with IAU, PC, npc, I mem, Regs, A, B, im, alu, S, D mem, m; the decode stage holds op, rw, rs, rt]

CS 61C L19 Pipelining II (28) A Carle, Summer 2005 © UCB Pending Writes In Pipeline Registers
Current operand registers: rs, rt
Pending writes: rw and regW held in the ex, mem, and wb pipeline registers
hazard <= ((rs == rw_ex) & regW_ex) OR
          ((rs == rw_mem) & regW_mem) OR
          ((rs == rw_wb) & regW_wb) OR
          ((rt == rw_ex) & regW_ex) OR
          ((rt == rw_mem) & regW_mem) OR
          ((rt == rw_wb) & regW_wb)
[Figure: pipeline datapath in which each later pipeline register carries op, rw, and n alongside the data]
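As a rough illustration, the hazard expression can be written directly in Python. The `Pending` class and the function name `hazard` are hypothetical names chosen for this sketch; the logic mirrors the slide's equation and, like it, does not handle the $0 exception.

```python
from dataclasses import dataclass


@dataclass
class Pending:
    rw: int       # destination register number of the instruction in that stage
    regW: bool    # True if that instruction will write the register file


def hazard(rs, rt, ex, mem, wb):
    """True if rs or rt matches a pending register write in ex, mem, or wb."""
    return any(p.regW and p.rw in (rs, rt) for p in (ex, mem, wb))


# Decode wants r2 and r3 while an instruction in EX is about to write r2.
print(hazard(2, 3, Pending(2, True), Pending(7, True), Pending(0, False)))
```

The `regW` qualifier matters: a store or branch in a later stage may carry a matching register field without ever writing it.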

CS 61C L19 Pipelining II (29) A Carle, Summer 2005 © UCB Forwarding Muxes
Detect the nearest valid write to an operand register and forward it into the operand latches, bypassing the remainder of the pipe
Increase muxes to add paths from pipeline registers
Data Forwarding = Data Bypassing
[Figure: same pipeline datapath with a forward mux added in front of the operand latches]
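A minimal sketch of the forward-mux selection, assuming every pending value is already available (the load case that needs a stall is not modeled here). The function name `forward`, the tuple layout, and the treatment of register 0 are assumptions for illustration.

```python
def forward(reg, regfile_value, pending):
    """Select the operand value for register number `reg`.

    `pending` lists (rw, regW, value) for the later stages, nearest first
    (ex, then mem, then wb). The nearest valid write wins because it is
    the most recent definition of the register.
    """
    if reg == 0:
        return 0                      # assumed $0 exception: always reads zero
    for rw, regW, value in pending:
        if regW and rw == reg:
            return value              # bypass the rest of the pipe
    return regfile_value              # no pending write: use the register file


# r5 is written by both the ex-stage and mem-stage instructions; ex is nearer.
print(forward(5, 10, [(5, True, 99), (5, True, 77), (0, False, 0)]))
```

Checking the stages nearest first is what makes the mux pick the youngest producer when several in-flight instructions write the same register.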

CS 61C L19 Pipelining II (30) A Carle, Summer 2005 © UCB What about memory operations?
Tricky situation:
MIPS: lw 0($t0) sw 0($t1)
RTL: R1 <- Mem[R2 + I]; Mem[R3+34] <- R1
[Figure: operand latches A and B, op with Rd/Ra/Rb fields, result R, D Mem with output T, Rd to reg file]

CS 61C L19 Pipelining II (31) A Carle, Summer 2005 © UCB What about memory operations?
Tricky situation:
MIPS: lw 0($t0) sw 0($t1)
RTL: R1 <- Mem[R2 + I]; Mem[R3+34] <- R1
Solution: Handle with bypass in memory stage!
[Figure: operand latches A and B, op with Rd/Ra/Rb fields, result R, D Mem with output T, Rd to reg file]
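One way to picture the memory-stage bypass, as a sketch only: when the store reaches its memory stage, the preceding load is in write back, so the store data can be taken from the load result instead of the stale value read in decode. The function name `store_data`, its parameters, and the concrete addresses and values below are all assumptions for the example.

```python
def store_data(sw_rt, stale_d, wb_rw, wb_regw, wb_m):
    """Pick the data a store in the memory stage should write."""
    if wb_regw and wb_rw == sw_rt and wb_rw != 0:
        return wb_m       # bypass: load result sitting in the ME/WB register
    return stale_d        # no dependence: value read from the register file


# Scenario from the slide: R1 <- Mem[R2 + I]; Mem[R3 + 34] <- R1
mem = {100: 7}
regs = {1: 0, 2: 96, 3: 200}
loaded = mem[regs[2] + 4]     # lw in its memory stage (I = 4 is an assumption)
stale = regs[1]               # sw read R1 in decode, before lw wrote it back
mem[regs[3] + 34] = store_data(1, stale, 1, True, loaded)
print(mem[234])
```

Without the bypass the store would write the old contents of R1, since the load does not update the register file until its own write-back stage.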

CS 61C L19 Pipelining II (32) A Carle, Summer 2005 © UCB Outline: Pipeline Control; Forwarding Control; Hazard Control

CS 61C L19 Pipelining II (33) A Carle, Summer 2005 © UCB Data Hazard: Loads (1/4)
Forwarding works if value is available (but not written back) before it is needed. But consider …
lw $t0,0($t1)
sub $t3,$t0,$t2
Need result before it is calculated!
Must stall use (sub) 1 cycle and then forward.
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB) for the two instructions]

CS 61C L19 Pipelining II (34) A Carle, Summer 2005 © UCB Data Hazard: Loads (2/4)
Hardware must stall pipeline
Called “interlock”
lw $t0, 0($t1)
sub $t3,$t0,$t2 (bubble)
and $t5,$t0,$t4 (bubble)
or $t7,$t0,$t6 (bubble)
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB) in which each instruction after the load is delayed by one bubble]

CS 61C L19 Pipelining II (35) A Carle, Summer 2005 © UCB Data Hazard: Loads (3/4)
Instruction slot after a load is called “load delay slot”
If that instruction uses the result of the load, then the hardware interlock will stall it for one cycle.
If the compiler puts an unrelated instruction in that slot, then no stall
Letting the hardware stall the instruction in the delay slot is equivalent to putting a nop in the slot (except the latter uses more code space)

CS 61C L19 Pipelining II (36) A Carle, Summer 2005 © UCB Data Hazard: Loads (4/4)
Stall is equivalent to nop
lw $t0, 0($t1)
nop
sub $t3,$t0,$t2
and $t5,$t0,$t4
or $t7,$t0,$t6
[Figure: pipeline diagram in which the nop occupies the slot where the bubble was]
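The software side of this equivalence can be sketched as a small pass that inserts a nop wherever the instruction right after a load uses the loaded register. The tuple encoding `(op, dest, sources)` and the name `fill_load_delay_slots` are assumptions for the example, and writes to $0 are not special-cased.

```python
NOP = ("nop", None, ())


def fill_load_delay_slots(program):
    """Insert a nop after each lw whose result the very next instruction reads.

    `program` is a list of (op, dest, sources) tuples.
    """
    out = []
    for i, inst in enumerate(program):
        out.append(inst)
        op, dest, _ = inst
        nxt = program[i + 1] if i + 1 < len(program) else None
        if op == "lw" and nxt is not None and dest in nxt[2]:
            out.append(NOP)           # same effect as a one-cycle interlock
    return out


program = [
    ("lw",  "$t0", ("$t1",)),
    ("sub", "$t3", ("$t0", "$t2")),
    ("and", "$t5", ("$t0", "$t4")),
    ("or",  "$t7", ("$t0", "$t6")),
]
for inst in fill_load_delay_slots(program):
    print(inst)
```

Only the instruction directly after the load needs the nop; by the time `and` and `or` reach the ALU the loaded value can be forwarded. A compiler that can move an unrelated instruction into the slot avoids the nop entirely.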

CS 61C L19 Pipelining II (37) A Carle, Summer 2005 © UCB Hazards / Stalling
In general:
For each stage i that has reg inputs
If i’s reg is being written later on in the pipe but is not ready yet
-Stages 0 to i: Stall (Turn CEs off so no change)
-Stage i+1: Make a bubble (do nothing)
-Stages i+2 onward: As usual
In particular:
ALUinput <- (MemResult)

CS 61C L19 Pipelining II (38) A Carle, Summer 2005 © UCB Hazards / Stalling
Alternative Approach: Detect non-forwarding hazards in decode
Possible since our hazards are formal.
-Not always the case.
Stalling then becomes:
-Issue nop to EX stage
-Turn off nextPC update (refetch same inst)
-Turn off InstReg update (re-decode same inst)
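The three stall actions can be sketched as a pure function of the decode-stage and EX-stage fields. The only non-forwarding hazard modeled here is the load-use case; the function name `decode_stall` and the signal names in the returned dictionary are hypothetical.

```python
def decode_stall(id_rs, id_rt, ex_is_load, ex_rw):
    """Return the stall controls for an instruction sitting in decode.

    Stalls when the instruction in EX is a load whose destination register
    (ex_rw) is one of the decode-stage source registers.
    """
    stall = ex_is_load and ex_rw != 0 and ex_rw in (id_rs, id_rt)
    return {
        "issue_nop_to_ex": stall,        # bubble goes down the pipe
        "next_pc_enable": not stall,     # off: refetch the same instruction
        "inst_reg_enable": not stall,    # off: re-decode the same instruction
    }


# sub $t3,$t0,$t2 in decode while lw $t0,0($t1) is in EX ($t0 = 8, $t2 = 10).
print(decode_stall(8, 10, True, 8))
```

The function depends only on the current contents of the pipeline registers, so it needs no state of its own.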

CS 61C L19 Pipelining II (39) A Carle, Summer 2005 © UCB Stall Logic
1. Detect non-resolving hazards.
2a. Insert Bubble
2b. Stall nextPC, IF/DE
[Figure: pipeline datapath with forward mux; each pipeline register carries op, rw, and n]

CS 61C L19 Pipelining II (40) A Carle, Summer 2005 © UCB Stall Logic
Stall-on-issue is used quite a bit
More complex processors: many cases that stall on issue.
More complex processors: cases that can’t be detected at decode
-E.g. value needed from mem is not in cache: proc must stall multiple cycles

CS 61C L19 Pipelining II (41) A Carle, Summer 2005 © UCB By the way …
Notice that our forwarding and stall logic is stateless!
Big Idea: Keep it simple!
Option 1: Store old fetched inst in reg (“stall_temp”), keep state reg that says whether to use stall_temp or value coming off inst mem.
Option 2: Re-fetch old value by turning off PC update.