CS152 / Kubiatowicz Lec13.1 3/17/03©UCB Spring 2003 CS152 Computer Architecture and Engineering Lecture 13 Introduction to Pipelining: Datapath and Control.

Slides:

Advertisements

Similar presentations

CS152 Computer Architecture and Engineering Lecture 12 Introduction to Pipelining: Datapath and Control March 8 th, 2004 John Kubiatowicz (

Advertisements

1 IKI20210 Pengantar Organisasi Komputer Kuliah no. 25: Pipeline 10 Januari 2003 Bobby Nazief Johny Moningka

Review: Pipelining. Pipelining Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes Dryer.

Pipelining Hwanmo Sung CS147 Presentation Professor Sin-Min Lee.

Pipeline Computer Architecture Lecture 12: Designing a Pipeline Processor.

Computer Architecture

CS252/Patterson Lec 1.1 1/17/01 Pipelining: Its Natural! Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer.

ECE 232 L22.Pipeline3.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 22 Pipelining,

CS 61C L19 Pipelining II (1) A Carle, Summer 2005 © UCB inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #19: Pipelining II

Mary Jane Irwin ( ) [Adapted from Computer Organization and Design,

EE30332 Ch6 DP.1 Ch 6: Pipelining Modified from Dave Patterson’s notes  Laundry Example  Ann, Brian, Cathy, Dave each have one load of clothes to wash,

ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )

ECE 361 Computer Architecture Lecture 13: Designing a Pipeline Processor Start X:40.

1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.

©UCB CS 162 Computer Architecture Lecture 3: Pipelining Contd. Instructor: L.N. Bhuyan

CS152 / Kubiatowicz Lec13.1 3/17/03©UCB Spring 2003 CS152 Computer Architecture and Engineering Lecture 13 Introduction to Pipelining: Datapath and Control.

Computer ArchitectureFall 2007 © October 24nd, 2007 Majd F. Sakr CS-447– Computer Architecture.

ECE 232 L19.Pipeline2.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 19 Pipelining,

Computer ArchitectureFall 2007 © October 22nd, 2007 Majd F. Sakr CS-447– Computer Architecture.

1 CSE SUNY New Paltz Chapter Six Enhancing Performance with Pipelining.

Pipelining Datapath Adapted from the lecture notes of Dr. John Kubiatowicz (UC Berkeley) and Hank Walker (TAMU)

1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Sep 9, 2002 Topic: Pipelining Basics.

1 Atanasoff–Berry Computer, built by Professor John Vincent Atanasoff and grad student Clifford Berry in the basement of the physics building at Iowa State.

Pipelining - II Adapted from CS 152C (UC Berkeley) lectures notes of Spring 2002.

Ceg3420 L1 4.1 DAP Fa97,  U.CB CEG3420 Computer Design Introduction to Pipelining.

Computer ArchitectureFall 2008 © October 6th, 2008 Majd F. Sakr CS-447– Computer Architecture.

Ceg3420 L13.1 DAP Fa97,  U.CB CEG3420 Computer Design Introduction to Pipelining.

CS152 / Kubiatowicz Lec /17/01©UCB Fall 2001 CS152 Computer Architecture and Engineering Lecture 13 Introduction to Pipelining: Datapath and Control.

ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.

Pipelining - II Rabi Mahapatra Adapted from CS 152C (UC Berkeley) lectures notes of Spring 2002.

Lecture 12: Pipeline Datapath Design Professor Mike Schulte Computer Architecture ECE 201.

B 0000 Pipelining ENGR xD52 Eric VanWyk Fall

EEL5708 Lotzi Bölöni EEL 5708 High Performance Computer Architecture Pipelining.

CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.

CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.

Computer Organization CS224 Chapter 4 Part b The Processor Spring 2010 With thanks to M.J. Irwin, T. Fountain, D. Patterson, and J. Hennessy for some lecture.

CMPE 421 Parallel Computer Architecture

1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.

ECE 232 L18.Pipeline.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 18 Pipelining.

CECS 440 Pipelining.1(c) 2014 – R. W. Allison [slides adapted from D. Patterson slides with additional credits to M.J. Irwin]

EECS 322 March 27, 2000 Based on Dave Patterson slides Instructor: Francis G. Wolff Case Western Reserve University This presentation.

Cs 152 L1 3.1 DAP Fa97,  U.CB Pipelining Lessons °Pipelining doesn’t help latency of single task, it helps throughput of entire workload °Multiple tasks.

Chap 6.1 Computer Architecture Chapter 6 Enhancing Performance with Pipelining.

CSIE30300 Computer Architecture Unit 04: Basic MIPS Pipelining Hsin-Chou Chi [Adapted from material by and

Oct. 18, 2000Machine Organization1 Machine Organization (CS 570) Lecture 4: Pipelining * Jeremy R. Johnson Wed. Oct. 18, 2000 *This lecture was derived.

CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.

CPE 442 hazards.1 Introduction to Computer Architecture CpE 442 Designing a Pipeline Processor (lect. II)

Pipelining CS365 Lecture 9. D. Barbara Pipeline CS465 2 Outline  Today’s topic  Pipelining is an implementation technique in which multiple instructions.

CS252/Patterson Lec 1.1 1/17/01 معماري کامپيوتر - درس نهم pipeline برگرفته از درس : Prof. David A. Patterson.

CSE431 L06 Basic MIPS Pipelining.1Irwin, PSU, 2005 MIPS Pipeline Datapath Modifications  What do we need to add/modify in our MIPS datapath? l State registers.

Lecture 9. MIPS Processor Design – Pipelined Processor Design #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System.

CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.

Advanced Computer Architecture CS 704 Advanced Computer Architecture Lecture 10 Computer Hardware Design (Pipeline Datapath and Control Design) Prof. Dr.

Lecture 18: Pipelining I.

Computer Organization

CMSC 611: Advanced Computer Architecture

ECE232: Hardware Organization and Design

Pipelining Lessons 6 PM T a s k O r d e B C D A 30

Dave Patterson (http.cs.berkeley.edu/~patterson)

Chapter 4 The Processor Part 3

Chapter 4 The Processor Part 2

John Kubiatowicz (http.cs.berkeley.edu/~kubitron)

Pipelining Lessons 6 PM T a s k O r d e B C D A 30

An Introduction to pipelining

John Kubiatowicz (http.cs.berkeley.edu/~kubitron)

Pipelining Appendix A and Chapter 3.

CMCS Computer Architecture Lecture 20 Pipelined Datapath and Control April 11, CMSC411.htm Mohamed.

Recall: Performance Evaluation

Presentation transcript:

CS152 / Kubiatowicz Lec13.1 3/17/03©UCB Spring 2003 CS152 Computer Architecture and Engineering Lecture 13 Introduction to Pipelining: Datapath and Control March 3 rd, 2004 John Kubiatowicz ( lecture slides:

CS152 / Kubiatowicz Lec11.2 3/3/04©UCB Spring 2004 Recap: Exceptions °Exception = unprogrammed control transfer system takes action to handle the exception -must record the address of the offending instruction -record any other information necessary to return afterwards returns control to user must save & restore user state °Allows constuction of a “user virtual machine” normal control flow: sequential, jumps, branches, calls, returns user program System Exception Handler Exception: return from exception

CS152 / Kubiatowicz Lec11.3 3/3/04©UCB Spring 2004 Recap: Precise Exceptions °Precise  state of the machine is preserved as if program executed up to the offending instruction All previous instructions completed Offending instruction and all following instructions act as if they have not even started Same system code will work on different implementations Position clearly established by IBM Difficult in the presence of pipelining, out-ot-order execution,... MIPS takes this position °Imprecise  system software has to figure out what is where and put it all back together °Performance goals often lead designers to forsake precise interrupts system software developers, user, markets etc. usually wish they had not done this °Modern techniques for out-of-order execution and branch prediction help implement precise interrupts

CS152 / Kubiatowicz Lec11.4 3/3/04©UCB Spring 2004 Recap: Sequential Laundry °Sequential laundry takes 6 hours for 4 loads °If they learned pipelining, how long would laundry take? ABCD PM Midnight TaskOrderTaskOrder Time

CS152 / Kubiatowicz Lec11.5 3/3/04©UCB Spring 2004 Recap: Pipelining Lessons °Pipelining doesn’t help latency of single task, it helps throughput of entire workload °Pipeline rate limited by slowest pipeline stage °Multiple tasks operating simultaneously using different resources °Potential speedup = Number pipe stages °Unbalanced lengths of pipe stages reduces speedup °Time to “fill” pipeline and time to “drain” it reduces speedup °Stall for Dependences ABCD 6 PM 789 TaskOrderTaskOrder Time

CS152 / Kubiatowicz Lec11.6 3/3/04©UCB Spring 2004 Recap: Ideal Pipelining IFDCDEXMEMWB IFDCDEXMEMWB IFDCDEXMEMWB IFDCDEXMEMWB IFDCDEXMEMWB Maximum Speedup  Number of stages Speedup  Time for unpipelined operation Time for longest stage Example: 40ns data path, 5 stages, Longest stage is 10 ns, Speedup  4 Assume instructions are completely independent!

CS152 / Kubiatowicz Lec11.7 3/3/04©UCB Spring 2004 °The Five Classic Components of a Computer °Today’s Topics: Recap last lecture/finish datapath Pipelined Control/ Do it yourself Pipelined Control Administrivia Hazards/Forwarding Exceptions Review MIPS R3000 pipeline The Big Picture: Where are We Now? Control Datapath Memory Processor Input Output

CS152 / Kubiatowicz Lec11.8 3/3/04©UCB Spring 2004 Can pipelining get us into trouble? °Yes: Pipeline Hazards structural hazards: attempt to use the same resource two different ways at the same time -E.g., combined washer/dryer would be a structural hazard or folder busy doing something else (watching TV) data hazards: attempt to use item before it is ready -E.g., one sock of pair in dryer and one in washer; can’t fold until get sock from washer through dryer -instruction depends on result of prior instruction still in the pipeline control hazards: attempt to make a decision before condition is evaulated -E.g., washing football uniforms and need to get proper detergent level; need to see after dryer before next load in -branch instructions °Can always resolve hazards by waiting pipeline control must detect the hazard take action (or delay action) to resolve hazards

CS152 / Kubiatowicz Lec11.9 3/3/04©UCB Spring 2004 Im Single Memory is a Structural Hazard I n s t r. O r d e r Time (clock cycles) Load Instr 1 Instr 2 Instr 3 Instr 4 ALU Im Reg MemReg ALU Im Reg MemReg ALU Im Reg MemReg ALU Reg MemReg ALU Im Reg MemReg Detection is easy in this case! Fix: Stall Instr 3 fetch.

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Structural Hazards limit performance °Example: if 1.3 memory accesses per instruction and only one memory access per cycle then average CPI  1.3 otherwise resource is more than 100% utilized

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 °Stall: wait until decision is clear °Impact: 2 lost cycles (i.e. 3 clock cycles per branch instruction) => slow °Move decision to end of decode save 1 cycle per branch Control Hazard Solution #1: Stall I n s t r. O r d e r Time (clock cycles) Add Beq Load ALU Im Reg MemReg ALU Im Reg MemReg ALU Reg MemReg Im Lost potential

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Reg °Predict: guess one direction then back up if wrong °Impact: 0 lost cycles per branch instruction if right, 1 if wrong (right 50% of time) Need to “Squash” and restart following instruction if wrong Produce CPI on branch of (1 * *.5) = 1.5 Total CPI might then be: 1.5 * *.8 = 1.1 (20% branch) °More dynamic scheme: history of 1 branch ( 90%) Control Hazard Solution #2: Predict I n s t r. O r d e r Time (clock cycles) Add Beq Load ALU Im Reg MemReg ALU Im MemReg Im ALU Reg MemReg

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Reg °Delayed Branch: Redefine branch behavior (takes place after next instruction) °Impact: 0 clock cycles per branch instruction if can find instruction to put in “slot” ( 50% of time) °As launch more instruction per clock cycle, less useful Control Hazard Solution #3: Delayed Branch I n s t r. O r d e r Time (clock cycles) Add Beq Misc ALU Im Reg MemReg ALU Im MemReg Im ALU Reg MemReg Load Im ALU Reg MemReg

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Data Hazard on r1: Read after write hazard (RAW) add r1,r2,r3 sub r4,r1,r3 and r6,r1,r7 or r8,r1,r9 xor r10,r1,r11

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Reg Dependencies backwards in time are hazards I n s t r. O r d e r Time (clock cycles) add r1,r2,r3 sub r4,r1,r3 and r6,r1,r7 or r8,r1,r9 xor r10,r1,r11 IFIF ID/R F EXEX ME M WBWB ALU Im Reg Dm Reg ALU Im Reg DmReg ALU Im Reg DmReg Im ALU Reg DmReg ALU Im DmReg Data Hazard on r1: Read after write hazard (RAW)

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 “Forward” result from one stage to another “or” OK if define read/write properly Data Hazard Solution: Forwarding I n s t r. O r d e r Time (clock cycles) add r1,r2,r3 sub r4,r1,r3 and r6,r1,r7 or r8,r1,r9 xor r10,r1,r11 IFIF ID/R F EXEX ME M WBWB ALU Im Reg Dm Reg ALU Im Reg DmReg ALU Im Reg DmReg Im ALU Reg DmReg ALU Im Reg DmReg

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Reg Dependencies backwards in time are hazards Can’t solve with forwarding: Must delay/stall instruction dependent on loads Forwarding (or Bypassing): What about Loads? Time (clock cycles) lw r1,0(r2) sub r4,r1,r3 IFIF ID/R F EXEX ME M WBWB ALU Im Reg Dm ALU Im Reg DmReg

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Reg Dependencies backwards in time are hazards Can’t solve with forwarding: Must delay/stall instruction dependent on loads Forwarding (or Bypassing): What about Loads Time (clock cycles) lw r1,0(r2) sub r4,r1,r3 IFIF ID/R F EXEX ME M WBWB ALU Im Reg Dm ALU Im Reg DmReg Stall

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Recap: Data Hazards I-Fet ch DCD MemOpFetch OpFetch Exec Store IFetch DCD ° ° ° Structural Hazard I-Fet ch DCD OpFetch Jump IFetch DCD ° ° ° Control Hazard IF DCD EX Mem WB IF DCD OF Ex Mem RAW (read after write) Data Hazard WAW Data Hazard (write after write) IF DCD OF Ex RSWAR Data Hazard (write after read) IF DCD EX Mem WB

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Designing a Pipelined Processor °Go back and examine your datapath and control diagram °associated resources with states °ensure that flows do not conflict, or figure out how to resolve conflicts °assert control in appropriate stage

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Control and Datapath: Split state diag into 5 pieces IR <- Mem[PC]; PC <– PC+4; A <- R[rs]; B<– R[rt] S <– A + B; R[rd] <– S; S <– A + SX; M <– Mem[S] R[rd] <– M; S <– A or ZX; R[rt] <– S; S <– A + SX; Mem[S] <- B If Cond PC < PC+SX; Exec Reg. File Mem Acces s Data Mem ABS Reg File Equal PC Next PC IR Inst. Mem DM

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Pipelined Processor (almost) for slides °What happens if we start a new instruction every cycle? Exec Reg. File Mem Acces s Data Mem A B S M Reg File Equal PC Next PC IR Inst. Mem Valid IRex Dcd Ctrl IRmem Ex Ctrl IRwb Mem Ctrl WB Ctrl D

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Pipelined Datapath (as in book); hard to read

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Administrivia °Midterm I on Wednesday 3/10 5:30 - 8:30 in 306 Soda Hall Bring a Calculator! One 8 1/2 by 11 page (both sides) of notes Make up exam: Tuesday 5:30 – 8:30 in 606 Soda Hall °Meet at LaVal’s afterwards for Pizza North-side Lavals! I’ll Buy! °Materials through Chapter 5, some of Chapter 6, Appendix A, B & C Should understand single-cycle processor multicycle processor, including exceptions °Get moving on Lab 3! This is a hard lab, since there are so many new things °Homework 4 out later today

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Pipelining the Load Instruction °The five independent functional units in the pipeline datapath are: Instruction Memory for the Ifetch stage Register File’s Read ports (bus A and busB) for the Reg/Dec stage ALU for the Exec stage Data Memory for the Mem stage Register File’s Write port (bus W) for the Wr stage Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7 IfetchReg/DecExecMemWr1st lw IfetchReg/DecExecMemWr2nd lw IfetchReg/DecExecMemWr3rd lw

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 The Four Stages of R-type °Ifetch: Instruction Fetch Fetch the instruction from the Instruction Memory °Reg/Dec: Registers Fetch and Instruction Decode °Exec: ALU operates on the two register operands Update PC °Wr: Write the ALU output back to the register file Cycle 1Cycle 2Cycle 3Cycle 4 IfetchReg/DecExecWrR-type

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Pipelining the R-type and Load Instruction °We have pipeline conflict or structural hazard: Two instructions try to write to the register file at the same time! Only one write port Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9 IfetchReg/DecExecWrR-type IfetchReg/DecExecWrR-type IfetchReg/DecExecMemWrLoad IfetchReg/DecExecWrR-type IfetchReg/DecExecWrR-type Ops! We have a problem!

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Important Observation °Each functional unit can only be used once per instruction °Each functional unit must be used at the same stage for all instructions: Load uses Register File’s Write Port during its 5th stage R-type uses Register File’s Write Port during its 4th stage IfetchReg/DecExecMemWrLoad IfetchReg/DecExecWrR-type 1234 °2 ways to solve this pipeline hazard.

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Solution 1: Insert “Bubble” into the Pipeline °Insert a “bubble” into the pipeline to prevent 2 writes at the same cycle The control logic can be complex. Lose instruction fetch and issue opportunity. °No instruction is started in Cycle 6! Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9 IfetchReg/DecExecWrR-type IfetchReg/DecExec IfetchReg/DecExecMemWrLoad IfetchReg/DecExecWr R-type IfetchReg/DecExecWr R-type Pipeline Bubble IfetchReg/DecExecWr

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Solution 2: Delay R-type’s Write by One Cycle °Delay R-type’s register write by one cycle: Now R-type instructions also use Reg File’s write port at Stage 5 Mem stage is a NOOP stage: nothing is being done. Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9 IfetchReg/DecMemWrR-type IfetchReg/DecMemWrR-type IfetchReg/DecExecMemWrLoad IfetchReg/DecMemWrR-type IfetchReg/DecMemWrR-type IfetchReg/Dec Exec WrR-type Mem Exec

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Modified Control & Datapath IR <- Mem[PC]; PC <– PC+4; A <- R[rs]; B<– R[rt] S <– A + B; R[rd] <– M; S <– A + SX; M <– Mem[S] R[rd] <– M; S <– A or ZX; R[rt] <– M; S <– A + SX; Mem[S] <- B if Cond PC < PC+SX; M <– S Exec Reg. File Mem Acces s Data Mem AB S Reg File Equal PC Next PC IR Inst. Mem DM M <– S

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 The Four Stages of Store °Ifetch: Instruction Fetch Fetch the instruction from the Instruction Memory °Reg/Dec: Registers Fetch and Instruction Decode °Exec: Calculate the memory address °Mem: Write the data into the Data Memory Cycle 1Cycle 2Cycle 3Cycle 4 IfetchReg/DecExecMemStoreWr

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 The Three Stages of Beq °Ifetch: Instruction Fetch Fetch the instruction from the Instruction Memory °Reg/Dec: Registers Fetch and Instruction Decode °Exec: compares the two register operand, select correct branch target address latch into PC Cycle 1Cycle 2Cycle 3Cycle 4 IfetchReg/DecExecMemBeqWr

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Control Diagram IR <- Mem[PC]; PC < PC+4; A <- R[rs]; B<– R[rt] S <– A + B; R[rd] <– S; S <– A + SX; M <– Mem[S] R[rd] <– M; S <– A or ZX; R[rt] <– S; S <– A + SX; Mem[S] <- B If Cond PC < PC+SX; Exec Reg. File Mem Acces s Data Mem AB S Reg File Equal PC Next PC IR Inst. Mem D M <– S M

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Recall: Single cycle control! Data Out Clk 5 RwRaRb bit Registers Rd ALU Clk Data In Data Address Ideal Data Memory Instruction Address Ideal Instruction Memory Clk PC 5 Rs 5 Rt 32 A B Next Address Control Datapath Control Signals Conditions

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Data Stationary Control °The Main Control generates the control signals during Reg/Dec Control signals for Exec (ExtOp, ALUSrc,...) are used 1 cycle later Control signals for Mem (MemWr Branch) are used 2 cycles later Control signals for Wr (MemtoReg MemWr) are used 3 cycles later IF/ID Register ID/Ex Register Ex/Mem Register Mem/Wr Register Reg/DecExecMem ExtOp ALUOp RegDst ALUSrc Branch MemWr MemtoReg RegWr Main Control ExtOp ALUOp RegDst ALUSrc MemtoReg RegWr MemtoReg RegWr MemtoReg RegWr Branch MemWr Branch MemWr Wr

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Datapath + Data Stationary Control Exec Reg. File Mem Acces s Data Mem ABS Reg File PC Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl M rsrt op rs rt fun im ex me wb rw v me wb rw v wb rw v

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Let’s Try it Out 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 these addresses are octal

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Start: Fetch 10 Exec Reg. File Mem Acces s Data Mem ABS Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl M rsrt im 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 IF PC Next PC 10 = nnnn

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Fetch 14, Decode 10 Exec Reg. File Mem Acces s Data Mem ABS Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl M 2rt im 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 lw r1, r2(35) ID IF PC Next PC 14 = nnn

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Fetch 20, Decode 14, Exec 10 Exec Reg. File Mem Acces s Data Mem r2 BS Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl M 2rt 35 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 lw r1 addI r2, r2, 3 ID IF EX PC Next PC 20 = n n

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Fetch 24, Decode 20, Exec 14, Mem 10 Exec Reg. File Mem Acces s Data Mem r2 B r2+35 Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl M lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 lw r1 sub r3, r4, r5 addI r2, r2, 3 ID IF EX M PC Next PC 24 = n

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10 Exec Reg. File Mem Acces s Data Mem r4 r5 r2+3 Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl M[r2+35] 67 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 lw r1 beq r6, r7 100 addI r2 sub r3 ID IF EX M WB PC Next PC 30 = Note Delayed Branch: always execute ori after beq

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14 Exec Reg. File Mem Acces s Data Mem r6 r7 r2+3 Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl r1=M[r2+35] 9xx 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 beq addI r2 sub r3 r4-r5 100 ori r8, r9 17 ID IF EX M WB PC Next PC 100 =

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Fetch 104, Dcd 100, Ex 30, Mem 24, WB 20 Exec Reg. File Mem Acces s Data Mem Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 ID EX M WB PC Next PC ___ = Fill it in yourself! ?

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Fetch 110, Dcd 104, Ex 100, Mem 30, WB 24 Exec Reg. File Mem Acces s Data Mem Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 EX M WB PC Next PC ___ = Fill it in yourself! ?? ? ? ?

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Fetch 114, Dcd 110, Ex 104, Mem 100, WB 30 Exec Reg. File Mem Acces s Data Mem Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 M WB PC Next PC ___ = Fill it in yourself! ?? ? ? ? ?

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Pipelined Processor °Separate control at each stage °Stalls propagate backwards to freeze previous stages °Bubbles in pipeline introduced by placing “Noops” into local stage, stall previous stages. Exec Reg. File Mem Acces s Data Mem A B S M Reg File Equal PC Next PC IR Inst. Mem Valid IRex Dcd Ctrl IRmem Ex Ctrl IRwb Mem Ctrl WB Ctrl D Stalls Bubbles

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Recap: Data Hazards °Avoid some “by design” eliminate WAR by always fetching operands early (DCD) in pipe eleminate WAW by doing all WBs in order (last stage, static) °Detect and resolve remaining ones stall or forward (if possible) IF DCD EX Mem WB IF DCD OF Ex Mem RAW Data Hazard WAW Data Hazard IF DCD OF Ex RSWAR Data Hazard IF DCD EX Mem WB

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Is CPI = 1 for our pipeline? °Remember that CPI is an “Average # cycles/inst °CPI here is 1, since the average throughput is 1 instruction every cycle. °What if there are stalls or multi-cycle execution? °Usually CPI > 1. How close can we get to 1?? IFetchDcdExecMemWB IFetchDcdExecMemWB IFetchDcdExecMemWB IFetchDcdExecMemWB

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Summary °What makes it easy all instructions are the same length just a few instruction formats memory operands appear only in loads and stores °Hazards limit performance Structural: need more HW resources Data: need forwarding, compiler scheduling Control: early evaluation & PC, delayed branch, prediction °Data hazards must be handled carefully: RAW data hazards handled by forwarding WAW and WAR hazards don’t exist in 5-stage pipeline °MIPS I instruction set architecture made pipeline visible (delayed branch, delayed load)

CS152 / Kubiatowicz Lec /3/04©UCB Spring 2004 Summary: Where this class is going °We’ll build a simple pipeline and look at these issues Lab 4  Pipelined Processor Lab 5  With caches °We’ll talk about modern processors and what’s really hard: Branches (control hazards) are really hard! Exception handling Trying to improve performance with out-of-order execution, etc. Trying to get CPI < 1 (Superscalar execution)