Download presentation
Presentation is loading. Please wait.
Published byBeatrix Stokes Modified over 8 years ago
1
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 1 ELEC 5200-001/6200-001 Computer Architecture and Design Spring 2016 Pipeline Control and Performance (Chapter 6) Vishwani D. Agrawal James J. Danaher Professor Department of Electrical and Computer Engineering Auburn University, Auburn, AL 36849 http://www.eng.auburn.edu/~vagrawal vagrawal@eng.auburn.edu
2
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 2 Add Reg. File Data mem. 1 mux 0 0 mux 1 4 Sign ext. Shift left 2 opcode ALU zero 0-15 16-20 21-25 26-31 ALU Pipelined Datapath (without Jump) IF/IDID/EXEX/MEMMEM/WB 16-20 for I-type lw 11-15 for R-type 1 mux 0 Instr mem PC
3
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 3 PC Add Reg. File Data mem. 1 mux 0 0 mux 1 4 Sign ext. Shift left 2 opcode ALU zero 0-15 16-20 21-25 26-31 ALU Mem. and Reg. File Need Controls IF/IDID/EXEX/MEMMEM/WB 16-20 for I-type lw 11-15 for R-type 1 mux 0 Instr mem RegWrite MemWrite MemRead
4
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 4 PC Add Reg. File Data mem. 1 mux 0 0 mux 1 4 Sign ext. Shift left 2 opcode ALU zero 0-15 16-20 21-25 26-31 ALU Multiplexers Need Controls IF/IDID/EXEX/MEMMEM/WB 16-20 for I-type lw 11-15 for R-type 1 mux 0 Instr mem RegWrite MemWrite MemRead ALUSrc RegDst Branch PCSrc MemtoReg
5
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 5 PC Add Reg. File Data mem. 1 mux 0 0 mux 1 4 Sign ext. Shift left 2 opcode ALU zero 0-15 16-20 21-25 26-31 ALU ALU Needs a Control IF/IDID/EXEX/MEMMEM/WB 16-20 for I-type lw 11-15 for R-type 1 mux 0 Instr mem RegWrite MemWrite MemRead ALUSrc RegDst Branch PCSrc MemtoReg ALU cont. ALUOp 0-5
6
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 6 Compare with Single-Cycle Control Control signals are the same as those needed for a single-cycle datapath. Control signals are generated using the Opcode in the ID (instruction decode) cycle and then distributed to other cycles. Let us reexamine the implementation of the single-cycle control (slides 19-21 of Lecture 5).
7
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 7 Hardwired CU: Single-Cycle Implemented by combinational logic. Control logic opcode Datapath ALU control 3 2 To ALU ALUOp Control signals funct. code 6 6
8
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 8 Instr. mem. PC Add Reg. File Data mem. 1 mux 0 0 mux 1 4 1 mux 0 Sign ext. Shift left 2 ALU Cont. CONTROL opcode MemWrite MemRead ALU Branch zero 0-15 0-5 11-15 16-20 21-25 26-31 ALU 0 mux 1 Shift left 2 0-25 Jump Single-cycle datapath MemtoReg ALUOp ALUSrc RegDst RegWrite
9
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 9 Single-Cycle Control Logic InputsOutputs Instr.typeOpcode Instruction bits 31 31 29 28 27 26 R0000001001000100 lw lw1000110111100000 sw sw101011X1X0010000 beq000100X0X0001010 J000010XXX0X0XXX1 ALUOp0 ALUOp1 RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch Jump
10
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 10 Single-Cycle Control Circuit lw swbeqJR RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp1 ALUOp0 Jump Op5 Op4 Op3 Op2 Op1 Op0
11
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 11 ALU Control Logic Inputs Outputs to ALU Instr.type From CU Funct. Code from IR (bits 0-5) 3-bit code Opera- tion ALUOp1ALUOp0F5F4F3F2F1F0 lw, sw 00XXXXXX010Add B01XXXXXX110Subtract R 1XXX0000010Add 1XXX0010110Subtract 1XXX0100000AND 1XXX0101001OR 1XXX1010111 slt slt
12
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 12 ALU Control ALU 3 zero result overflow Operation select from control Operation selectALU function 000AND 001OR 010Add 110Subtract 111Set on less than F3 F2 F1 F0 ALUOp1 ALUOp0 From Control Circuit ALU control
13
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 13 Returning to Pipelined Control Opcode input to control is supplied by the pipeline register IF/ID in the ID (instruction decode) cycle. Nine control signals are generated in the ID cycle, but none is used. They are saved in the pipeline register ID/EX. ALUSrc, RegDst and ALUOp (2 bits) are used in the EX (execute) cycle. Remaining 5 control signals are saved in the pipeline register EX/MEM. Branch, MemWrite and MemRead are used in the MEM (memory access) cycle. Remaining 2 control signals are saved in the pipeline register MEM/WB. MemtoReg and RegWrite are used in the WB (write back) cycle. Pipelined control is shown without Jump.
14
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 14 PC Add Reg. File Data mem. 1 mux 0 0 mux 1 4 Sign ext. Shift left 2 opcode ALU zero 0-15 16-20 21-25 26-31 ALU Placing Control in Pipelined Datapath IF/IDID/EXEX/MEMMEM/WB 16-20 for I-type lw 11-15 for R-type 1 mux 0 Instr mem RegWrite MemWrite MemRead ALUSrc RegDst Branch PCSrc MemtoReg ALU cont. ALUOp CONTROL 0-5 IF/IDID/EX
15
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 15 PC Add Reg. File Data mem 1 mux 0 0 mux 1 4 Sign ext. Shift left 2 opcode ALU zero 0-15 16-20 21-25 26-31 ALU Highlighted Pipelined Control IF/IDID/EXEX/MEMMEM/WB 16-20 for I-type lw 11-15 for R-type 1 mux 0 Instr mem RegWrite MemWrite MemRead ALUSrc RegDst Branch PCSrc MemtoReg ALU cont. ALUOp CONTROL 0-5
16
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 16 Single-Cycle Performance Assume 200 ps for memory access 100 ps for ALU operation 50 ps for register file read or write Cycle time set according to longest instruction: lw ≡ IF + ID/RegRead + ALU + MEM + RegWrite = 200 + 50 +100 + 200 + 50 = 600 ps Av. instruction execution time= clock cycle time = 600 ps
17
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 17 Multicycle Performance Consider SPECINT2000* instruction mix: 25% lw5 cycles 10% sw4 cycles 11% branch3 cycles 2% jump3 cycles 52% ALU instr.4 cycles Av. CPI= 0.25×5 + 0.10×4 + 0.11×3 + 0.02×3 + 0.52×4 = 4.12 Clock cycle time determined from longest operation (memory access) = 200 ps Av. instruction execution time = 4.12×200 = 824 ps * Set of benchmark programs used for performance evaluation, to be discussed in a later lecture.
18
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 18 Pipeline Performance Neglect initial latency (reasonable for long programs). One instruction completed every clock cycle unless delayed by hazard. Average CPI: lw2 cycles in 50% cases due to hazard1.5 cycles lw2 cycles in 50% cases due to hazard1.5 cycles sw1 cycle sw1 cycle ALU1 cycle ALU1 cycle branch2 cycles in 25% cases due to hazard 1.25 cycles branch2 cycles in 25% cases due to hazard 1.25 cycles jump2 cycles jump2 cycles For SPECINT2000 Av. CPI= 0.25×1.5 + 0.10×1 + 0.11×1.25 + 0.02×2.0 + 0.52×1 = 1.17 Clock cycle time (longest operation: memory access) = 200 ps Av. instruction execution time = 1.17×200 = 234 ps
19
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 19 Comparing Alternatives Type of datapath and control Clock cycle time Average CPI Av. instruction execution time Single- cycle 600 ps 1.00 Multicycle 200 ps 4.12 824 ps Pipelined 200 ps 1.17 234 ps
20
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 20 Other Controls for Pipeline ForwardingStall Branch hazard and branch prediction Instruction flush Exceptions
21
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 21 Forwarding Consider a data hazard: sub$2, $1, $3# computes result in CC3, writes in $2 in CC5 and$12, $2, $5# reads $2 in CC3, adds in CC4 IF: IM ID: REG. FILE READ EX: ALU MEM: DM WB : REG. WRITE IF/ID ID/EX MEM/WB EX/MEM CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 sub $2, $1, $3 and $12, $2, $5 CC3: ALU saves new data in EX/MEM, to be written to $2 in CC5 IF: IM ID: REG. FILE READ EX: ALU MEM: DM WB : REG. WRITE IF/IDID/EX MEM/WB EX/MEM CC3: and reads $2 to ID/EX, but the correct data is in EX/MEM CC4: forwarding allows execution of “and” with correct data
22
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 22 Understanding Forwarding Let’s ask following questions: Q: Why is there a hazard? A: Source register for the present instruction is the same as the destination register of the previous instruction. Q: When is the source register data needed? A: In the execute cycle (CC4). Q: Is source register data available in CC4? A: Yes – use forwarding. No – use stall. Q:Where is the required data in CC4? A: In the pipeline register EX/MEM as ALU output.
23
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 23 Forwarding Hardware A forwarding unit is added to execute (ALU) cycle hardware. Functions of forwarding unit: –Hazard detection –Forward correct data to ALU Inputs to forwarding unit: –Source registers of the instruction in EX –Destination registers of instructions in DM and WB Outputs of forwarding unit: multiplexer controls to route correct data to the ALU.
24
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 24 Recall Register Definitions R-type instruction (add, sub, and, or,... ) opcodeRsRtRdshamtfunct I-type instruction (beq, lw, sw, addi,... ) opcodeRsRtconstant_or_address J-type instruction (j, jal, jr) opcodea___d___d___r___e___s___s where Rs is the first source register Rt is the second source register Rd is the destination register
25
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 25 Reg. File Data mem. 1 mux 0 0 mux 1 Sign ext. Shift left 2 opcode ALU zero 0-15 16-20 21-25 26-31 ALU 21-25 IF/ID ID/EXEX/MEMMEM/WB 16-20 11-15 1 mux 0 Forwarding unit MUX Branch addr. Addr mem PC+4 16-20 Rs Rt Forwarding Implemented Rd
26
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 26 Stall Delay next instruction by sending nop through pipeline. Necessary when hazard not resolved by forwarding. IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/IDID/EX MEM/WB EX/MEM IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM CC1 CC2 CC3 CC4 CC5 CC6 lw $2, 20($1) and $4, $2, $5 CC4: new data in MEM/WB, to be written to $2 CC4: execution of and is impossible; correct data unavailable until end of CC4
27
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 27 Detecting Hazard Requiring Stall Consider instruction in IF/ID being decoded: If Previous instruction (lw) activated MemRead, and Instruction being decoded has a source register (Rs or Rt) same as the destination register (Rt for lw) of the previous instruction Then, stall the pipeline: Force all control outputs to 0 Prevent PC from changing Prevent IF/ID from changing
28
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 28 Stall Implementation Data mem. 0 mux 1 Sign ext. Shift left 2 opcode zero 0-15 16-20 21-25 26-31 21-25 IF/ID ID/EXEX/MEMMEM/WB 16-20 11-15 1 mux 0 MUX 16-20 Rs Rt Rd 1 mux 0 ALU Addr mem Hazard detection unit Control 0 MemRead PCWrite IF/IDWrite PC MUX Reg. File Forwarding unit Rs Rt
29
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 29 Stall Execution with stall and forwarding: CC1 CC2 CC3 CC4 CC5 CC6 CC7 lw $2, 20($1) and $4, $2, $5 CC4: new data in MEM/WB, to be written to $2 IF: IM ID: REG. FILE READ EX: ALU MEM: DM WB : REG. WRITE IF/ID ID/EX MEM/WB EX/MEM IF: IM IF/ID IF: IM ID: REG. FILE READ EX: ALU MEM: DM WB : REG. WRITE IF/ID ID/EX MEM/WB EX/MEM next ID: REG. FILE READ EX: ALU MEM: DM WB : REG. WRITE IF/ID ID/EX MEM/WB EX/MEM IF: IM ID: REG. FILE READ EX: ALU MEM: DM WB : REG. WRITE IF/ID ID/EX MEM/WB EX/MEM State of IF/ID is frozen in CC3 bubble (nop) next is fetched twice since PC was frozen
30
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 30 Branch Hazard Consider heuristic – branch not taken. Continue fetching instructions in sequence following the branch instructions. If branch is taken (indicated by zero output of ALU): –Control generates branch signal in ID cycle. –branch activates PCSource signal in the MEM cycle to load PC with new branch address. –Three instructions in the pipeline must be flushed if branch is taken – can this penalty be reduced?
31
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 31 PC Add Reg. File Data mem. 1 mux 0 0 mux 1 4 Sign ext. Shift left 2 opcode ALU zero 0-15 16-20 21-25 26-31 ALU Branch Hazard IF/IDID/EXEX/MEMMEM/WB 16-20 for I-type lw 11-15 for R-type 1 mux 0 Instr mem RegWrite MemWrite MemRead ALUSrc RegDst Branch PCSrc MemtoReg ALU cont. ALUOp0-5 beq CONTROL
32
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 32 Branch Not Taken Branch fetchedBranch decodedBranch decisionPC keeps D (br. not taken) A fetchedA decodedA executedA continues B fetchedB decodedB executed C fetchedC decoded D fetched cycle b cycle b+1 cycle b+2 cycle b+3cycle b+4 Branch on condition to Z A B C D Z
33
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 33 Branch Taken Branch fetchedBranch decodedBranch decisionPC gets Z (br. taken) A fetchedA decodedA executedNop B fetchedB decodedNop C fetchedNop Z fetched cycle b cycle b+1 cycle b+2 cycle b+3cycle b+4 Branch on condition to Z A B C D Z Three instructions are flushed if branch is taken Three-cycle penalty
34
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 34 PC Add Reg. File Data mem. 1 mux 0 0 mux 1 4 Sign ext. Shift left 2 opcode ALU zero 0-15 16-20 21-25 26-31 Add Branch Penalty Reduction IF/IDID/EXEX/MEMMEM/WB 16-20 for I-type lw 11-15 for R-type 1 mux 0 Instr mem RegWrite MemWrite MemRead ALUSrc RegDst Branch PCSrc MemtoReg ALU cont. ALUOp0-5 beq CONTROL
35
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 35 Branch Taken Branch fetchedBranch decision PC gets Z A fetchedA flushedNopNop Z fetchedZ decodedZ executed cycle b cycle b+1 cycle b+2 cycle b+3cycle b+4 Branch to Z A B C D Z One instructions is flushed if branch is taken One-cycle penalty
36
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 36 Pipeline Flush If branch is taken (as indicated by zero), then control does the following: –Change all control signals to 0, similar to the case of stall for data hazard, i.e., insert bubble in the pipeline. –Generate a signal IF.Flush that changes the instruction in the pipeline register IF/ID to 0 (nop). Penalty of branch hazard is reduced by –Adding branch detection and address generation hardware in the decode cycle – one bubble needed – a next address generation logic in the decode stage writes PC+4, branch address, or jump address into PC. –Using branch prediction. –Unrolling loops.
37
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 37 Branch Prediction Useful for program loops. A one-bit prediction scheme: a one-bit buffer carries a “history bit” that tells what happened on the last branch instruction History bit = 1, branch was taken History bit = 0, branch was not taken Predict branch not taken 0 Predict branch taken 1 taken Not taken
38
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 38 Branch Prediction = Prediction Logic 0 1 PC+4Next PC PC Low-order bits used as index Address of Target History recent branch addresses bit(s) instructions
39
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 39 Branch Prediction for a Loop I = 0 I = I + 1 I – 10 = 0? Store X in memory X = X + R(I) Y N abcdeabcde Execu -tion seq. Oldhist.bit Next instr. New hist. bit Predi ction Pred.IAct. 10e1b1Bad 21b2b1Good 31b3b1Good 41b4b1Good 51b5b1Good 61b6b1Good 71b7b1Good 81b8b1Good 91b9b1Good 101b10e0 Execution of Instruction d h.bit = 0 branch not taken, h.bit = 1 branch taken.
40
Prediction Accuracy One-bit predictor: 2 errors out of 10 predictions Prediction accuracy = 80% To improve prediction accuracy, use two- bit predictor: A prediction must be wrong twice before it is changed Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 40
41
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 41 Two-Bit Prediction Buffer Implemented as a two-bit counter. Can improve correct prediction statistics. Predict branch not taken 00 Predict branch taken 10 Predict branch taken 11 Predict branch not taken 01 taken Not taken
42
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 42 Branch Prediction for a Loop I = 0 I = I + 1 I – 10 = 0? Store X in memory X = X + R(I) Y N 1234512345 Execu -tion seq. Old Pred. Buf Next instr. New pred. Buf Predi ction Pred.IAct. 11021211Good 21122211Good 31123211Good 41124211Good 51125211Good 61126211Good 71127211Good 81128211Good 91129211Good 1011210510Bad Execution of Instruction 4
43
Spr 2016, Mar 9... ELEC 5200-001/6200-001 Lecture 7 43 Exceptions A typical exception occurs when ALU produces an overflow signal. Control asserts following actions on exception: –Change the PC address to 4000 0040hex. This is the location of the exception routine. This is done by adding an additional input to the PC input multiplexer. –Overflow is detected in the EX cycle. Similar to data hazard and pipeline flush, Set IF/ID to 0 (nop). Generate ID.Flush and EX.Flush signals to set all control signals to 0 in ID/EX and EX/MEM registers. This also prevents the ALU result (presumed contaminated) from being written in the WB cycle.
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.