ECE232: Hardware Organization and Design



Presentation on theme: "ECE232: Hardware Organization and Design"— Presentation transcript:

1 ECE232: Hardware Organization and Design
Part 12: Pipelining II
Chapter 4 (6 in 3rd edition)

2 Benefits of forwarding
Consider the following stretch of assembly code executed on a pipelined implementation of MIPS.
Handling branches:
- No branch prediction; conditional branches are resolved in the EX stage
- All fetches following a conditional branch are flushed until the branch is resolved (no speculative fetching)
How long does it take to execute?
      addi $t0, $t1, 40
Loop: lw   $t2, 0($t1)
      addi $t2, $t2, 3
      sw   $t2, 0($t1)
      addi $t1, $t1, 4
      bne  $t0, $t1, Loop

3 No forwarding (loop executed 10 times)
      addi $t0, $t1, 40
Loop: lw   $t2, 0($t1)
      addi $t2, $t2, 3
      sw   $t2, 0($t1)
      addi $t1, $t1, 4
      bne  $t0, $t1, Loop
[Pipeline diagram: clock cycles 1 to 12 across the top; lw, addi, sw, addi and bne each pass through IF, ID, EX, M, WB, with stall cycles marked S.]
13 * 10 + 1 = 131 cycles

4 With forwarding
      addi $t0, $t1, 40
Loop: lw   $t2, 0($t1)
      addi $t2, $t2, 3
      sw   $t2, 0($t1)
      addi $t1, $t1, 4
      bne  $t0, $t1, Loop
[Pipeline diagram: clock cycles 1 to 12 across the top; lw, addi, sw, addi and bne each pass through IF, ID, EX, M, WB, with stall cycles marked S.]
9 * 10 + 1 = 91 cycles
speedup = 131/91 = 1.44
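The totals on the two timing slides can be checked with a few lines of arithmetic. This is only a sketch: the per-iteration counts (13 and 9) and the loop count (10) are taken from the slides, while the single extra cycle for the `addi` that runs once before the loop is an assumption chosen to match the stated totals. The function name `total_cycles` is made up for illustration.

```python
def total_cycles(cycles_per_iteration: int, iterations: int, setup_cycles: int = 1) -> int:
    """Cycles for all loop iterations plus the code that runs once before the loop."""
    return cycles_per_iteration * iterations + setup_cycles


ITERATIONS = 10  # loop count stated on the "No forwarding" slide

no_forwarding = total_cycles(13, ITERATIONS)   # 13 cycles per iteration without forwarding
with_forwarding = total_cycles(9, ITERATIONS)  # 9 cycles per iteration with forwarding
speedup = no_forwarding / with_forwarding

print(no_forwarding, with_forwarding, round(speedup, 2))  # 131 91 1.44
```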

5 Avoiding Hazard by Reordering Code
How would you reorder the stretch of code after the first addi and before the bne instruction to make it run faster?
      addi $t0, $t1, 40
Loop: lw   $t2, 0($t1)
      addi $t2, $t2, 3
      sw   $t2, 0($t1)
      addi $t1, $t1, 4
      bne  $t0, $t1, Loop
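One possible answer (not necessarily the one intended on the slide) is to move the pointer increment `addi $t1, $t1, 4` into the slot right after the `lw` and compensate by storing to `-4($t1)`. The sketch below encodes both loop bodies as tuples, counts load-use pairs, and runs a tiny interpreter to check that both versions leave memory in the same state. The tuple layout, the helper names, the base address and the initial memory contents are all assumptions made for this illustration.

```python
# Operand layout used by this sketch:
#   ("lw", rt, offset, base)   ("sw", rt, offset, base)
#   ("addi", rt, rs, imm)      ("bne", rs, rt, label)
ORIGINAL = [
    ("lw", "$t2", 0, "$t1"),
    ("addi", "$t2", "$t2", 3),
    ("sw", "$t2", 0, "$t1"),
    ("addi", "$t1", "$t1", 4),
    ("bne", "$t0", "$t1", "Loop"),
]
REORDERED = [
    ("lw", "$t2", 0, "$t1"),
    ("addi", "$t1", "$t1", 4),   # moved up: does not read $t2
    ("addi", "$t2", "$t2", 3),
    ("sw", "$t2", -4, "$t1"),    # offset compensates for the earlier increment
    ("bne", "$t0", "$t1", "Loop"),
]


def sources(instr):
    """Registers read by an instruction."""
    op, a, b, c = instr
    if op == "lw":
        return {c}
    if op == "sw":
        return {a, c}
    if op == "addi":
        return {b}
    if op == "bne":
        return {a, b}
    raise ValueError(f"unsupported opcode: {op}")


def load_use_pairs(body):
    """Count lw instructions whose result is read by the very next instruction."""
    return sum(
        1
        for first, second in zip(body, body[1:])
        if first[0] == "lw" and first[1] in sources(second)
    )


def run(body, base=0x1000):
    """Execute the loop on ten words of memory and return the final memory."""
    regs = {"$t0": 0, "$t1": base, "$t2": 0}
    mem = {base + 4 * i: i for i in range(10)}
    regs["$t0"] = regs["$t1"] + 40           # addi $t0, $t1, 40
    while True:
        for op, a, b, c in body:
            if op == "lw":
                regs[a] = mem[regs[c] + b]
            elif op == "sw":
                mem[regs[c] + b] = regs[a]
            elif op == "addi":
                regs[a] = regs[b] + c
        # bne $t0, $t1, Loop is the last instruction of the body
        if regs["$t0"] == regs["$t1"]:
            return mem


print(load_use_pairs(ORIGINAL), load_use_pairs(REORDERED))
print(run(ORIGINAL) == run(REORDERED))
```

With forwarding in place, removing the load-use pair removes the one stall that forwarding alone cannot hide.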

6 MIPS Pipeline Datapath Modifications
State registers between each pipeline stage to isolate them
[Figure: pipelined datapath (PC, instruction memory, register file, sign extend, shift left 2, adders, ALU, data memory) driven by the system clock, with state registers IFetch/Dec, Dec/Exec, Exec/Mem and Mem/WB separating the stages IF: IFetch, ID: Dec, EX: Execute, MEM: MemAccess, WB: WriteBack.]

7 Corrected Datapath to Save RegWrite Addr
Need to preserve the destination register address in the pipeline state registers
[Figure: pipelined datapath with the write register address carried through the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers back to the register file.]

8 MIPS Pipeline Control Path Modifications
All control signals can be determined during Decode and held in the state registers between pipeline stages
[Figure: pipelined datapath with a Control unit in the decode stage feeding the ID/EX, EX/MEM and MEM/WB pipeline registers.]

9 Control Settings
[Table: control settings for R-format, lw, sw and beq instructions (X = don't care), with columns grouped by stage. EX stage: RegDst, ALUOp1, ALUSrc. MEM stage: Brch, MemRead, MemWrite. WB stage: RegWrite, MemtoReg.]
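To make the stage grouping concrete, the sketch below stores the control bits per instruction class and shows which groups are still held in each pipeline register as the instruction moves along. The cell values and the `ALUOp0` column are assumptions: they follow the standard textbook MIPS control table rather than anything readable on this slide. The names `CONTROL` and `propagate` are invented for this example.

```python
# Control bits grouped by the stage that consumes them; "X" is a don't-care.
# Values are assumed from the standard textbook MIPS control table.
CONTROL = {
    "R-format": {
        "EX": {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 0},
    },
    "lw": {
        "EX": {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 1},
    },
    "sw": {
        "EX": {"RegDst": "X", "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
        "WB": {"RegWrite": 0, "MemtoReg": "X"},
    },
    "beq": {
        "EX": {"RegDst": "X", "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
        "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 0, "MemtoReg": "X"},
    },
}


def propagate(instr_class):
    """Return the control groups held in each pipeline register for one instruction.

    All bits are produced in Decode; each stage consumes its own group and
    passes the remaining groups on to the next pipeline register.
    """
    ctrl = CONTROL[instr_class]
    return {
        "ID/EX": {stage: ctrl[stage] for stage in ("EX", "MEM", "WB")},
        "EX/MEM": {stage: ctrl[stage] for stage in ("MEM", "WB")},
        "MEM/WB": {stage: ctrl[stage] for stage in ("WB",)},
    }


for register, groups in propagate("lw").items():
    print(register, groups)
```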

10 Control Signals’ propagation

11 Pipeline Stages' Registers

12 Forwarding
add $1,…
sub $4,$1,$5
and $6,$7,$1
or  $8,$1,$1
sw  $4,4($1)
[Pipeline diagram: the five instructions listed in instruction order, each drawn with its IM, Reg, ALU and DM stages.]

13 Data Forwarding (aka Bypassing)
Take the result from the point that it exists in any of the pipeline state registers and forward it to the functional unit (e.g., the ALU) that needs it that cycle.
For the ALU functional unit, the inputs can come from any pipeline register rather than just from ID/EX:
- add multiplexors to the inputs of the ALU
- connect the result data in EX/MEM or MEM/WB to both the Rs and Rt ALU mux inputs of the EX stage
- add the proper control hardware to control the new muxes
Other functional units may need similar forwarding logic (e.g., the DMem).
With forwarding, a CPI of almost 1 can be achieved even in the presence of data dependencies.

14 Datapath with Forwarding Hardware
[Figure: pipelined datapath with forwarding hardware. Components shown: PC, instruction memory, register file, sign extend, shift left 2, adders, ALU and ALU control, data memory, the Control unit with the Branch and PCSrc signals, the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers, and the Forward Unit.]
That is, any computer, no matter how primitive or advanced, can be divided into five parts:
1. The input devices bring the data from the outside world into the computer.
2. These data are kept in the computer’s memory until ...
3. The datapath requests and processes them.
4. The operation of the datapath is controlled by the computer’s controller. All the work done by the computer will NOT do us any good unless we can get the data back to the outside world.
5. Getting the data back to the outside world is the job of the output devices.
The most COMMON way to connect these 5 components together is to use a network of busses.

15 Data Forwarding Control Conditions
EX/MEM hazard:
if (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10
if (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10
Forwards the result from the previous instr. to either input of the ALU
MEM/WB hazard:
if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01
Forwards the result from the second previous instr. to either input of the ALU
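The two hazard conditions can be written as a small function per ALU input. This is a minimal sketch, not the course's reference design: the `WriterStage` class and the function names are hypothetical, and checking EX/MEM before MEM/WB (so the most recent result wins when both match) is an assumption of this sketch, since the slide lists the two conditions independently.

```python
from dataclasses import dataclass


@dataclass
class WriterStage:
    """Fields of EX/MEM or MEM/WB that the forwarding unit inspects."""
    reg_write: bool
    rd: int


def forward_select(src: int, ex_mem: WriterStage, mem_wb: WriterStage) -> int:
    """Return the 2-bit mux select for one ALU input that reads register `src`."""
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return 0b10  # EX/MEM hazard: take the previous instruction's result
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return 0b01  # MEM/WB hazard: take the second previous instruction's result
    return 0b00      # no hazard: use the value read from the register file


def forwarding_unit(rs: int, rt: int, ex_mem: WriterStage, mem_wb: WriterStage):
    """Return (ForwardA, ForwardB) for the instruction currently in EX."""
    return forward_select(rs, ex_mem, mem_wb), forward_select(rt, ex_mem, mem_wb)


# Sequence from the "Forwarding" slide: add $1,... ; sub $4,$1,$5 ; and $6,$7,$1
add = WriterStage(reg_write=True, rd=1)
sub = WriterStage(reg_write=True, rd=4)
idle = WriterStage(reg_write=False, rd=0)

sub_select = forwarding_unit(rs=1, rt=5, ex_mem=add, mem_wb=idle)  # sub in EX
and_select = forwarding_unit(rs=7, rt=1, ex_mem=sub, mem_wb=add)   # and in EX

print([format(s, "02b") for s in sub_select])  # ['10', '00']
print([format(s, "02b") for s in and_select])  # ['00', '01']
```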

16 Datapath with Forwarding Hardware - 1
[Figure: pipelined datapath with the Forward Unit. The Forward Unit takes EX/MEM.RegisterRd, MEM/WB.RegisterRd, ID/EX.RegisterRt and ID/EX.RegisterRs as inputs.]

17 Datapath with Forwarding Hardware - 2
[Figure: pipelined datapath with the Forward Unit. The Forward Unit takes EX/MEM.RegisterRd, MEM/WB.RegisterRd, ID/EX.RegisterRt and ID/EX.RegisterRs as inputs.]

18 Summary
- All modern day processors use pipelining
- Pipelining doesn’t help latency of single task, it helps throughput of entire workload
- Potential speedup: a CPI of 1
- Pipeline clock cycle determined/limited by slowest pipeline stage
- Unbalanced pipe stages cause inefficiencies
- The time to “fill” pipeline and time to “drain” it can impact speedup for deep pipelines and short code runs
- Must detect and resolve hazards
- Stalling negatively affects CPI (makes CPI greater than the ideal of 1)
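The points about the slowest stage, unbalanced stages, and fill/drain time can be illustrated numerically. The stage delays below are hypothetical, and the model is deliberately simple: it assumes no hazards, an unpipelined machine that spends the sum of all stage delays on each instruction, and a pipelined clock equal to the slowest stage.

```python
# Hypothetical stage delays in picoseconds (not taken from the slides).
STAGE_DELAYS_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def unpipelined_time(n_instructions: int, delays=STAGE_DELAYS_PS) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * sum(delays.values())


def pipelined_time(n_instructions: int, delays=STAGE_DELAYS_PS) -> int:
    """Clock is set by the slowest stage; the first result needs a full fill."""
    clock = max(delays.values())
    return (len(delays) + n_instructions - 1) * clock


def speedup(n_instructions: int, delays=STAGE_DELAYS_PS) -> float:
    return unpipelined_time(n_instructions, delays) / pipelined_time(n_instructions, delays)


for n in (1, 3, 1000, 1_000_000):
    print(n, round(speedup(n), 3))
```

In this model a single instruction is slower when pipelined (1000 ps against 800 ps), short runs are dominated by fill time, and long runs approach a speedup of 4 rather than 5 because the stages are unbalanced.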

19 Review: Pipeline Hazards
Structural hazards
- Design pipeline to eliminate structural hazards
Data hazards: read after write (RAW)
- Use data forwarding inside the pipeline
- For those cases that forwarding won’t solve (e.g., load-use), include hazard hardware to insert stalls/bubbles
Control hazards: beq, bne, j, jr, jal
- Stall: hurts performance
- Move decision point as early in the pipeline as possible: reduces number of stalls at the cost of additional hardware
- Delay decision (requires compiler support): “Delayed Branch”
- Predict outcome of branch
  - Static prediction, e.g., always not-taken
  - Dynamic prediction: prediction per branch in program
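For the load-use case that forwarding cannot solve, the stall decision is usually a comparison between the load in EX and the instruction in decode. The sketch below uses the common textbook form of that check; the function and parameter names are hypothetical, and the register numbers assume the usual MIPS convention ($t1 = 9, $t2 = 10).

```python
def must_stall(id_ex_mem_read: bool, id_ex_rt: int, if_id_rs: int, if_id_rt: int) -> bool:
    """True when the instruction in ID reads the register a load in EX will write.

    id_ex_mem_read: the instruction in EX is a load
    id_ex_rt:       destination register of that load
    if_id_rs/rt:    source register fields of the instruction in ID
    """
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)


T1, T2 = 9, 10  # $t1 and $t2 under the usual MIPS register numbering

# lw $t2, 0($t1) followed by addi $t2, $t2, 3: the addi must wait one cycle.
print(must_stall(True, T2, if_id_rs=T2, if_id_rt=T2))

# lw $t2, 0($t1) followed by addi $t1, $t1, 4: no stall needed.
print(must_stall(True, T2, if_id_rs=T1, if_id_rt=T1))
```

Because the check compares the `rt` field of every instruction, it can stall conservatively for I-type instructions whose `rt` is a destination rather than a source.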

20 Extracting Yet More Performance
Two options:
- Increase the depth of the pipeline to increase the clock rate (superpipelining)
- Fetch (and execute) more than one instruction at one time, expanding every pipeline stage to accommodate multiple instructions (multiple-issue)
Launching multiple instructions per stage allows the instruction execution rate, CPI, to be less than 1, so instead we use IPC: instructions per clock cycle.
E.g., a 3 GHz, four-way multiple-issue processor can execute at a peak rate of 12 billion instructions per second, with a best-case CPI of 0.25 or a best-case IPC of 4.
If the datapath has a five-stage pipeline, how many instructions are active in the pipeline at any given time?
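The peak-rate figures follow directly from the clock rate and issue width, and the closing question has a simple upper bound. A short sketch, assuming every stage of every issue slot holds an instruction (the best case, with no stalls or bubbles); the variable names are chosen for this example.

```python
clock_hz = 3_000_000_000   # 3 GHz
issue_width = 4            # four-way multiple issue
pipeline_depth = 5         # five-stage pipeline

peak_instructions_per_second = clock_hz * issue_width
best_case_ipc = issue_width
best_case_cpi = 1 / issue_width

# Upper bound on instructions in flight: one per stage per issue slot.
max_active_instructions = pipeline_depth * issue_width

print(peak_instructions_per_second)  # 12000000000
print(best_case_ipc, best_case_cpi)  # 4 0.25
print(max_active_instructions)       # 20
```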

