Download presentation
Presentation is loading. Please wait.
1
CS152 / Kubiatowicz Lec13.1 3/17/03©UCB Spring 2003 CS152 Computer Architecture and Engineering Lecture 13 Introduction to Pipelining: Datapath and Control March 3 rd, 2004 John Kubiatowicz (www.cs.berkeley.edu/~kubitron) lecture slides: http://inst.eecs.berkeley.edu/~cs152/
2
CS152 / Kubiatowicz Lec11.2 3/3/04©UCB Spring 2004 Recap: Exceptions °Exception = unprogrammed control transfer system takes action to handle the exception -must record the address of the offending instruction -record any other information necessary to return afterwards returns control to user must save & restore user state °Allows constuction of a “user virtual machine” normal control flow: sequential, jumps, branches, calls, returns user program System Exception Handler Exception: return from exception
3
CS152 / Kubiatowicz Lec11.3 3/3/04©UCB Spring 2004 Recap: Precise Exceptions °Precise state of the machine is preserved as if program executed up to the offending instruction All previous instructions completed Offending instruction and all following instructions act as if they have not even started Same system code will work on different implementations Position clearly established by IBM Difficult in the presence of pipelining, out-ot-order execution,... MIPS takes this position °Imprecise system software has to figure out what is where and put it all back together °Performance goals often lead designers to forsake precise interrupts system software developers, user, markets etc. usually wish they had not done this °Modern techniques for out-of-order execution and branch prediction help implement precise interrupts
4
CS152 / Kubiatowicz Lec11.4 3/3/04©UCB Spring 2004 Recap: Sequential Laundry °Sequential laundry takes 6 hours for 4 loads °If they learned pipelining, how long would laundry take? ABCD 304020304020304020304020 6 PM 789 10 11 Midnight TaskOrderTaskOrder Time
5
CS152 / Kubiatowicz Lec11.5 3/3/04©UCB Spring 2004 Recap: Pipelining Lessons °Pipelining doesn’t help latency of single task, it helps throughput of entire workload °Pipeline rate limited by slowest pipeline stage °Multiple tasks operating simultaneously using different resources °Potential speedup = Number pipe stages °Unbalanced lengths of pipe stages reduces speedup °Time to “fill” pipeline and time to “drain” it reduces speedup °Stall for Dependences ABCD 6 PM 789 TaskOrderTaskOrder Time 3040 20
6
CS152 / Kubiatowicz Lec11.6 3/3/04©UCB Spring 2004 Recap: Ideal Pipelining IFDCDEXMEMWB IFDCDEXMEMWB IFDCDEXMEMWB IFDCDEXMEMWB IFDCDEXMEMWB Maximum Speedup Number of stages Speedup Time for unpipelined operation Time for longest stage Example: 40ns data path, 5 stages, Longest stage is 10 ns, Speedup 4 Assume instructions are completely independent!
7
CS152 / Kubiatowicz Lec11.7 3/3/04©UCB Spring 2004 °The Five Classic Components of a Computer °Today’s Topics: Recap last lecture/finish datapath Pipelined Control/ Do it yourself Pipelined Control Administrivia Hazards/Forwarding Exceptions Review MIPS R3000 pipeline The Big Picture: Where are We Now? Control Datapath Memory Processor Input Output
8
CS152 / Kubiatowicz Lec11.8 3/3/04©UCB Spring 2004 Can pipelining get us into trouble? °Yes: Pipeline Hazards structural hazards: attempt to use the same resource two different ways at the same time -E.g., combined washer/dryer would be a structural hazard or folder busy doing something else (watching TV) data hazards: attempt to use item before it is ready -E.g., one sock of pair in dryer and one in washer; can’t fold until get sock from washer through dryer -instruction depends on result of prior instruction still in the pipeline control hazards: attempt to make a decision before condition is evaulated -E.g., washing football uniforms and need to get proper detergent level; need to see after dryer before next load in -branch instructions °Can always resolve hazards by waiting pipeline control must detect the hazard take action (or delay action) to resolve hazards
9
CS152 / Kubiatowicz Lec11.9 3/3/04©UCB Spring 2004 Im Single Memory is a Structural Hazard I n s t r. O r d e r Time (clock cycles) Load Instr 1 Instr 2 Instr 3 Instr 4 ALU Im Reg MemReg ALU Im Reg MemReg ALU Im Reg MemReg ALU Reg MemReg ALU Im Reg MemReg Detection is easy in this case! Fix: Stall Instr 3 fetch.
10
CS152 / Kubiatowicz Lec11.10 3/3/04©UCB Spring 2004 Structural Hazards limit performance °Example: if 1.3 memory accesses per instruction and only one memory access per cycle then average CPI 1.3 otherwise resource is more than 100% utilized
11
CS152 / Kubiatowicz Lec11.11 3/3/04©UCB Spring 2004 °Stall: wait until decision is clear °Impact: 2 lost cycles (i.e. 3 clock cycles per branch instruction) => slow °Move decision to end of decode save 1 cycle per branch Control Hazard Solution #1: Stall I n s t r. O r d e r Time (clock cycles) Add Beq Load ALU Im Reg MemReg ALU Im Reg MemReg ALU Reg MemReg Im Lost potential
12
CS152 / Kubiatowicz Lec11.12 3/3/04©UCB Spring 2004 Reg °Predict: guess one direction then back up if wrong °Impact: 0 lost cycles per branch instruction if right, 1 if wrong (right 50% of time) Need to “Squash” and restart following instruction if wrong Produce CPI on branch of (1 *.5 + 2 *.5) = 1.5 Total CPI might then be: 1.5 *.2 + 1 *.8 = 1.1 (20% branch) °More dynamic scheme: history of 1 branch ( 90%) Control Hazard Solution #2: Predict I n s t r. O r d e r Time (clock cycles) Add Beq Load ALU Im Reg MemReg ALU Im MemReg Im ALU Reg MemReg
13
CS152 / Kubiatowicz Lec11.13 3/3/04©UCB Spring 2004 Reg °Delayed Branch: Redefine branch behavior (takes place after next instruction) °Impact: 0 clock cycles per branch instruction if can find instruction to put in “slot” ( 50% of time) °As launch more instruction per clock cycle, less useful Control Hazard Solution #3: Delayed Branch I n s t r. O r d e r Time (clock cycles) Add Beq Misc ALU Im Reg MemReg ALU Im MemReg Im ALU Reg MemReg Load Im ALU Reg MemReg
14
CS152 / Kubiatowicz Lec11.14 3/3/04©UCB Spring 2004 Data Hazard on r1: Read after write hazard (RAW) add r1,r2,r3 sub r4,r1,r3 and r6,r1,r7 or r8,r1,r9 xor r10,r1,r11
15
CS152 / Kubiatowicz Lec11.15 3/3/04©UCB Spring 2004 Reg Dependencies backwards in time are hazards I n s t r. O r d e r Time (clock cycles) add r1,r2,r3 sub r4,r1,r3 and r6,r1,r7 or r8,r1,r9 xor r10,r1,r11 IFIF ID/R F EXEX ME M WBWB ALU Im Reg Dm Reg ALU Im Reg DmReg ALU Im Reg DmReg Im ALU Reg DmReg ALU Im DmReg Data Hazard on r1: Read after write hazard (RAW)
16
CS152 / Kubiatowicz Lec11.16 3/3/04©UCB Spring 2004 “Forward” result from one stage to another “or” OK if define read/write properly Data Hazard Solution: Forwarding I n s t r. O r d e r Time (clock cycles) add r1,r2,r3 sub r4,r1,r3 and r6,r1,r7 or r8,r1,r9 xor r10,r1,r11 IFIF ID/R F EXEX ME M WBWB ALU Im Reg Dm Reg ALU Im Reg DmReg ALU Im Reg DmReg Im ALU Reg DmReg ALU Im Reg DmReg
17
CS152 / Kubiatowicz Lec11.17 3/3/04©UCB Spring 2004 Reg Dependencies backwards in time are hazards Can’t solve with forwarding: Must delay/stall instruction dependent on loads Forwarding (or Bypassing): What about Loads? Time (clock cycles) lw r1,0(r2) sub r4,r1,r3 IFIF ID/R F EXEX ME M WBWB ALU Im Reg Dm ALU Im Reg DmReg
18
CS152 / Kubiatowicz Lec11.18 3/3/04©UCB Spring 2004 Reg Dependencies backwards in time are hazards Can’t solve with forwarding: Must delay/stall instruction dependent on loads Forwarding (or Bypassing): What about Loads Time (clock cycles) lw r1,0(r2) sub r4,r1,r3 IFIF ID/R F EXEX ME M WBWB ALU Im Reg Dm ALU Im Reg DmReg Stall
19
CS152 / Kubiatowicz Lec11.19 3/3/04©UCB Spring 2004 Recap: Data Hazards I-Fet ch DCD MemOpFetch OpFetch Exec Store IFetch DCD ° ° ° Structural Hazard I-Fet ch DCD OpFetch Jump IFetch DCD ° ° ° Control Hazard IF DCD EX Mem WB IF DCD OF Ex Mem RAW (read after write) Data Hazard WAW Data Hazard (write after write) IF DCD OF Ex RSWAR Data Hazard (write after read) IF DCD EX Mem WB
20
CS152 / Kubiatowicz Lec11.20 3/3/04©UCB Spring 2004 Designing a Pipelined Processor °Go back and examine your datapath and control diagram °associated resources with states °ensure that flows do not conflict, or figure out how to resolve conflicts °assert control in appropriate stage
21
CS152 / Kubiatowicz Lec11.21 3/3/04©UCB Spring 2004 Control and Datapath: Split state diag into 5 pieces IR <- Mem[PC]; PC <– PC+4; A <- R[rs]; B<– R[rt] S <– A + B; R[rd] <– S; S <– A + SX; M <– Mem[S] R[rd] <– M; S <– A or ZX; R[rt] <– S; S <– A + SX; Mem[S] <- B If Cond PC < PC+SX; Exec Reg. File Mem Acces s Data Mem ABS Reg File Equal PC Next PC IR Inst. Mem DM
22
CS152 / Kubiatowicz Lec11.22 3/3/04©UCB Spring 2004 Pipelined Processor (almost) for slides °What happens if we start a new instruction every cycle? Exec Reg. File Mem Acces s Data Mem A B S M Reg File Equal PC Next PC IR Inst. Mem Valid IRex Dcd Ctrl IRmem Ex Ctrl IRwb Mem Ctrl WB Ctrl D
23
CS152 / Kubiatowicz Lec11.23 3/3/04©UCB Spring 2004 Pipelined Datapath (as in book); hard to read
24
CS152 / Kubiatowicz Lec11.24 3/3/04©UCB Spring 2004 Administrivia °Midterm I on Wednesday 3/10 5:30 - 8:30 in 306 Soda Hall Bring a Calculator! One 8 1/2 by 11 page (both sides) of notes Make up exam: Tuesday 5:30 – 8:30 in 606 Soda Hall °Meet at LaVal’s afterwards for Pizza North-side Lavals! I’ll Buy! °Materials through Chapter 5, some of Chapter 6, Appendix A, B & C Should understand single-cycle processor multicycle processor, including exceptions °Get moving on Lab 3! This is a hard lab, since there are so many new things °Homework 4 out later today
25
CS152 / Kubiatowicz Lec11.25 3/3/04©UCB Spring 2004 Pipelining the Load Instruction °The five independent functional units in the pipeline datapath are: Instruction Memory for the Ifetch stage Register File’s Read ports (bus A and busB) for the Reg/Dec stage ALU for the Exec stage Data Memory for the Mem stage Register File’s Write port (bus W) for the Wr stage Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7 IfetchReg/DecExecMemWr1st lw IfetchReg/DecExecMemWr2nd lw IfetchReg/DecExecMemWr3rd lw
26
CS152 / Kubiatowicz Lec11.26 3/3/04©UCB Spring 2004 The Four Stages of R-type °Ifetch: Instruction Fetch Fetch the instruction from the Instruction Memory °Reg/Dec: Registers Fetch and Instruction Decode °Exec: ALU operates on the two register operands Update PC °Wr: Write the ALU output back to the register file Cycle 1Cycle 2Cycle 3Cycle 4 IfetchReg/DecExecWrR-type
27
CS152 / Kubiatowicz Lec11.27 3/3/04©UCB Spring 2004 Pipelining the R-type and Load Instruction °We have pipeline conflict or structural hazard: Two instructions try to write to the register file at the same time! Only one write port Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9 IfetchReg/DecExecWrR-type IfetchReg/DecExecWrR-type IfetchReg/DecExecMemWrLoad IfetchReg/DecExecWrR-type IfetchReg/DecExecWrR-type Ops! We have a problem!
28
CS152 / Kubiatowicz Lec11.28 3/3/04©UCB Spring 2004 Important Observation °Each functional unit can only be used once per instruction °Each functional unit must be used at the same stage for all instructions: Load uses Register File’s Write Port during its 5th stage R-type uses Register File’s Write Port during its 4th stage IfetchReg/DecExecMemWrLoad 12345 IfetchReg/DecExecWrR-type 1234 °2 ways to solve this pipeline hazard.
29
CS152 / Kubiatowicz Lec11.29 3/3/04©UCB Spring 2004 Solution 1: Insert “Bubble” into the Pipeline °Insert a “bubble” into the pipeline to prevent 2 writes at the same cycle The control logic can be complex. Lose instruction fetch and issue opportunity. °No instruction is started in Cycle 6! Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9 IfetchReg/DecExecWrR-type IfetchReg/DecExec IfetchReg/DecExecMemWrLoad IfetchReg/DecExecWr R-type IfetchReg/DecExecWr R-type Pipeline Bubble IfetchReg/DecExecWr
30
CS152 / Kubiatowicz Lec11.30 3/3/04©UCB Spring 2004 Solution 2: Delay R-type’s Write by One Cycle °Delay R-type’s register write by one cycle: Now R-type instructions also use Reg File’s write port at Stage 5 Mem stage is a NOOP stage: nothing is being done. Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9 IfetchReg/DecMemWrR-type IfetchReg/DecMemWrR-type IfetchReg/DecExecMemWrLoad IfetchReg/DecMemWrR-type IfetchReg/DecMemWrR-type IfetchReg/Dec Exec WrR-type Mem Exec 123 4 5
31
CS152 / Kubiatowicz Lec11.31 3/3/04©UCB Spring 2004 Modified Control & Datapath IR <- Mem[PC]; PC <– PC+4; A <- R[rs]; B<– R[rt] S <– A + B; R[rd] <– M; S <– A + SX; M <– Mem[S] R[rd] <– M; S <– A or ZX; R[rt] <– M; S <– A + SX; Mem[S] <- B if Cond PC < PC+SX; M <– S Exec Reg. File Mem Acces s Data Mem AB S Reg File Equal PC Next PC IR Inst. Mem DM M <– S
32
CS152 / Kubiatowicz Lec11.32 3/3/04©UCB Spring 2004 The Four Stages of Store °Ifetch: Instruction Fetch Fetch the instruction from the Instruction Memory °Reg/Dec: Registers Fetch and Instruction Decode °Exec: Calculate the memory address °Mem: Write the data into the Data Memory Cycle 1Cycle 2Cycle 3Cycle 4 IfetchReg/DecExecMemStoreWr
33
CS152 / Kubiatowicz Lec11.33 3/3/04©UCB Spring 2004 The Three Stages of Beq °Ifetch: Instruction Fetch Fetch the instruction from the Instruction Memory °Reg/Dec: Registers Fetch and Instruction Decode °Exec: compares the two register operand, select correct branch target address latch into PC Cycle 1Cycle 2Cycle 3Cycle 4 IfetchReg/DecExecMemBeqWr
34
CS152 / Kubiatowicz Lec11.34 3/3/04©UCB Spring 2004 Control Diagram IR <- Mem[PC]; PC < PC+4; A <- R[rs]; B<– R[rt] S <– A + B; R[rd] <– S; S <– A + SX; M <– Mem[S] R[rd] <– M; S <– A or ZX; R[rt] <– S; S <– A + SX; Mem[S] <- B If Cond PC < PC+SX; Exec Reg. File Mem Acces s Data Mem AB S Reg File Equal PC Next PC IR Inst. Mem D M <– S M
35
CS152 / Kubiatowicz Lec11.35 3/3/04©UCB Spring 2004 Recall: Single cycle control! Data Out Clk 5 RwRaRb 32 32-bit Registers Rd ALU Clk Data In Data Address Ideal Data Memory Instruction Address Ideal Instruction Memory Clk PC 5 Rs 5 Rt 32 A B Next Address Control Datapath Control Signals Conditions
36
CS152 / Kubiatowicz Lec11.36 3/3/04©UCB Spring 2004 Data Stationary Control °The Main Control generates the control signals during Reg/Dec Control signals for Exec (ExtOp, ALUSrc,...) are used 1 cycle later Control signals for Mem (MemWr Branch) are used 2 cycles later Control signals for Wr (MemtoReg MemWr) are used 3 cycles later IF/ID Register ID/Ex Register Ex/Mem Register Mem/Wr Register Reg/DecExecMem ExtOp ALUOp RegDst ALUSrc Branch MemWr MemtoReg RegWr Main Control ExtOp ALUOp RegDst ALUSrc MemtoReg RegWr MemtoReg RegWr MemtoReg RegWr Branch MemWr Branch MemWr Wr
37
CS152 / Kubiatowicz Lec11.37 3/3/04©UCB Spring 2004 Datapath + Data Stationary Control Exec Reg. File Mem Acces s Data Mem ABS Reg File PC Next PC IR Inst. Mem D Decode Mem Ctrl WB Ctrl M rsrt op rs rt fun im ex me wb rw v me wb rw v wb rw v
38
CS152 / Kubiatowicz Lec11.38 3/3/04©UCB Spring 2004 Let’s Try it Out 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, 100 30orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 these addresses are octal
39
CS152 / Kubiatowicz Lec11.39 3/3/04©UCB Spring 2004 Start: Fetch 10 Exec Reg. File Mem Acces s Data Mem ABS Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl M rsrt im 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, 100 30orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 IF PC Next PC 10 = nnnn
40
CS152 / Kubiatowicz Lec11.40 3/3/04©UCB Spring 2004 Fetch 14, Decode 10 Exec Reg. File Mem Acces s Data Mem ABS Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl M 2rt im 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, 100 30orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 lw r1, r2(35) ID IF PC Next PC 14 = nnn
41
CS152 / Kubiatowicz Lec11.41 3/3/04©UCB Spring 2004 Fetch 20, Decode 14, Exec 10 Exec Reg. File Mem Acces s Data Mem r2 BS Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl M 2rt 35 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, 100 30orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 lw r1 addI r2, r2, 3 ID IF EX PC Next PC 20 = n n
42
CS152 / Kubiatowicz Lec11.42 3/3/04©UCB Spring 2004 Fetch 24, Decode 20, Exec 14, Mem 10 Exec Reg. File Mem Acces s Data Mem r2 B r2+35 Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl M 45 3 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, 100 30orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 lw r1 sub r3, r4, r5 addI r2, r2, 3 ID IF EX M PC Next PC 24 = n
43
CS152 / Kubiatowicz Lec11.43 3/3/04©UCB Spring 2004 Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10 Exec Reg. File Mem Acces s Data Mem r4 r5 r2+3 Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl M[r2+35] 67 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, 100 30orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 lw r1 beq r6, r7 100 addI r2 sub r3 ID IF EX M WB PC Next PC 30 = Note Delayed Branch: always execute ori after beq
44
CS152 / Kubiatowicz Lec11.44 3/3/04©UCB Spring 2004 Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14 Exec Reg. File Mem Acces s Data Mem r6 r7 r2+3 Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl r1=M[r2+35] 9xx 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, 100 30orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 beq addI r2 sub r3 r4-r5 100 ori r8, r9 17 ID IF EX M WB PC Next PC 100 =
45
CS152 / Kubiatowicz Lec11.45 3/3/04©UCB Spring 2004 Fetch 104, Dcd 100, Ex 30, Mem 24, WB 20 Exec Reg. File Mem Acces s Data Mem Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, 100 30orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 ID EX M WB PC Next PC ___ = Fill it in yourself! ?
46
CS152 / Kubiatowicz Lec11.46 3/3/04©UCB Spring 2004 Fetch 110, Dcd 104, Ex 100, Mem 30, WB 24 Exec Reg. File Mem Acces s Data Mem Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, 100 30orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 EX M WB PC Next PC ___ = Fill it in yourself! ?? ? ? ?
47
CS152 / Kubiatowicz Lec11.47 3/3/04©UCB Spring 2004 Fetch 114, Dcd 110, Ex 104, Mem 100, WB 30 Exec Reg. File Mem Acces s Data Mem Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl 10lw r1, r2(35) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, 100 30orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 M WB PC Next PC ___ = Fill it in yourself! ?? ? ? ? ?
48
CS152 / Kubiatowicz Lec11.48 3/3/04©UCB Spring 2004 Pipelined Processor °Separate control at each stage °Stalls propagate backwards to freeze previous stages °Bubbles in pipeline introduced by placing “Noops” into local stage, stall previous stages. Exec Reg. File Mem Acces s Data Mem A B S M Reg File Equal PC Next PC IR Inst. Mem Valid IRex Dcd Ctrl IRmem Ex Ctrl IRwb Mem Ctrl WB Ctrl D Stalls Bubbles
49
CS152 / Kubiatowicz Lec11.49 3/3/04©UCB Spring 2004 Recap: Data Hazards °Avoid some “by design” eliminate WAR by always fetching operands early (DCD) in pipe eleminate WAW by doing all WBs in order (last stage, static) °Detect and resolve remaining ones stall or forward (if possible) IF DCD EX Mem WB IF DCD OF Ex Mem RAW Data Hazard WAW Data Hazard IF DCD OF Ex RSWAR Data Hazard IF DCD EX Mem WB
50
CS152 / Kubiatowicz Lec11.50 3/3/04©UCB Spring 2004 Is CPI = 1 for our pipeline? °Remember that CPI is an “Average # cycles/inst °CPI here is 1, since the average throughput is 1 instruction every cycle. °What if there are stalls or multi-cycle execution? °Usually CPI > 1. How close can we get to 1?? IFetchDcdExecMemWB IFetchDcdExecMemWB IFetchDcdExecMemWB IFetchDcdExecMemWB
51
CS152 / Kubiatowicz Lec11.51 3/3/04©UCB Spring 2004 Summary °What makes it easy all instructions are the same length just a few instruction formats memory operands appear only in loads and stores °Hazards limit performance Structural: need more HW resources Data: need forwarding, compiler scheduling Control: early evaluation & PC, delayed branch, prediction °Data hazards must be handled carefully: RAW data hazards handled by forwarding WAW and WAR hazards don’t exist in 5-stage pipeline °MIPS I instruction set architecture made pipeline visible (delayed branch, delayed load)
52
CS152 / Kubiatowicz Lec11.52 3/3/04©UCB Spring 2004 Summary: Where this class is going °We’ll build a simple pipeline and look at these issues Lab 4 Pipelined Processor Lab 5 With caches °We’ll talk about modern processors and what’s really hard: Branches (control hazards) are really hard! Exception handling Trying to improve performance with out-of-order execution, etc. Trying to get CPI < 1 (Superscalar execution)
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.