Published by Arron Stephens. Modified over 9 years ago.
Oct. 18, 2000 Machine Organization
Machine Organization (CS 570)
Lecture 4: Pipelining*
Jeremy R. Johnson
Wed. Oct. 18, 2000
*This lecture was derived from material in the text (Chap. 3). All figures from Computer Architecture: A Quantitative Approach, Second Edition, by John Hennessy and David Patterson, are copyrighted material (COPYRIGHT 1996 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED).

Introduction
Objective: To understand pipelining and the performance improvement it provides.
Pipelining is an implementation technique in which the execution of multiple instructions is overlapped. Each instruction is broken into stages, and while one instruction is in one stage, another instruction can simultaneously be in a different stage.
Topics
–Review DLX
–Simple Implementation of DLX
–Basic Pipeline for DLX
–Pipeline hazards
–Floating point pipeline

Instruction Format
R-type Instruction (register format - add, sub, …): op | rs | rt | rd | func
I-type Instruction (immediate format - load, store, branch, immediate): op | rs | rt | Immediate
J-type Instruction (jump, jal): op | offset added to PC

Implementation Stages
Instruction Fetch Cycle (IF)
–IR ← Mem[PC]
–NPC ← PC + 4
Instruction Decode/Register Fetch Cycle (ID)
–A ← Regs[IR_{6..10}]
–B ← Regs[IR_{11..15}]
–Imm ← ((IR_{16})^{16} ## IR_{16..31})
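
The register fetch and sign extension done in ID can be sketched in Python. This is only an illustration, assuming the DLX convention that bit 0 is the most significant bit of the 32-bit instruction word; the function name `decode_fields`, the field names `rs1`/`rs2`, and the sample word are hypothetical and not taken from the slides.

```python
def decode_fields(ir: int) -> dict:
    """Split a 32-bit instruction word into the fields read during ID.

    Assumes bits are numbered 0 (most significant) to 31 (least significant),
    so IR 6..10 sits 21 bits above the least significant end.
    """
    ir &= 0xFFFFFFFF
    rs1 = (ir >> 21) & 0x1F      # IR 6..10: index of the register read into A
    rs2 = (ir >> 16) & 0x1F      # IR 11..15: index of the register read into B
    imm16 = ir & 0xFFFF          # IR 16..31: raw 16-bit immediate
    # (IR16)^16 ## IR16..31: replicate the immediate's sign bit 16 times
    imm = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    return {"opcode": ir >> 26, "rs1": rs1, "rs2": rs2, "imm": imm}


# Hypothetical word: opcode 0x23, rs1 = 2, rs2 = 1, immediate = -4
word = (0x23 << 26) | (2 << 21) | (1 << 16) | 0xFFFC
fields = decode_fields(word)
print(fields)
```

Both register reads happen before the opcode is fully interpreted, which is possible only because the source fields sit at fixed bit positions.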

Implementation Stages
Execution/Effective Address Cycle (EX)
–Memory Reference: ALUOutput ← A + Imm;
–Register-Register ALU Instruction: ALUOutput ← A func B;
–Register-Immediate ALU Instruction: ALUOutput ← A op Imm;
–Branch: ALUOutput ← NPC + Imm; Cond ← (A op 0);

Implementation Stages
Memory Access/Branch Completion Cycle (MEM)
–Memory Reference: LMD ← Mem[ALUOutput]; or Mem[ALUOutput] ← B;
–Branch: if (Cond) PC ← ALUOutput;
Write-back Cycle (WB)
–Register-Register ALU Instruction: Regs[IR_{16..20}] ← ALUOutput;
–Register-Immediate ALU Instruction: Regs[IR_{11..15}] ← ALUOutput;
–Load Instruction: Regs[IR_{11..15}] ← LMD;

[Figure: DLX Datapath]

Simple DLX Pipeline
Each stage (clock cycle) becomes a pipeline stage
Overlap execution of instructions
Add registers between stages
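
The overlap can be visualized with a small Python sketch that lists which stage each instruction occupies in each clock cycle. It assumes an ideal five-stage pipeline with no stalls; the names `STAGES` and `pipeline_diagram` are made up for this illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(n_instructions: int) -> list[list[str]]:
    """Return one row per instruction; column c holds the stage occupied in cycle c + 1."""
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage   # instruction i enters IF in cycle i + 1
        rows.append(row)
    return rows


diagram = pipeline_diagram(4)
for i, row in enumerate(diagram):
    print(f"i+{i}: " + " ".join(f"{cell:>3}" for cell in row))
```

Four instructions finish in 8 cycles here instead of the 20 that running each one through all five stages in sequence would take.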

[Figure: Overlap of Functional Units]

[Figure: Pipelined Datapath]

Pipeline Performance
Expect speedup equal to the number of pipe stages
–assumes equal sized tasks
–no additional overhead due to pipelining
Speedup from pipelining (reduced CPI or shorter clock cycle) = Avg. inst. ex. time unpipelined / Avg. inst. ex. time pipelined
Example: 10 ns clock without pipelining, 11 ns with pipelining (to account for overhead). ALU (40%) and Branch (20%) instructions take 4 cycles, Memory (40%) instructions take 5.
$$\text{Speedup} = \frac{10\,\text{ns} \times ((0.4 + 0.2) \times 4 + 0.4 \times 5)}{11\,\text{ns}} = \frac{44\,\text{ns}}{11\,\text{ns}} = 4$$
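
The arithmetic in the example can be checked with a few lines of Python. The sketch takes the memory-instruction frequency as 40%, so that the mix sums to 100% and the unpipelined time comes out to the 44 ns used in the example; the names `mix` and `average_cycles` are hypothetical.

```python
def average_cycles(mix: dict[str, tuple[float, int]]) -> float:
    """Weighted average cycles per instruction for a {class: (frequency, cycles)} mix."""
    return sum(freq * cycles for freq, cycles in mix.values())


# Instruction mix: (frequency, cycles on the unpipelined machine)
mix = {"alu": (0.4, 4), "branch": (0.2, 4), "memory": (0.4, 5)}

unpipelined_ns = 10 * average_cycles(mix)   # 10 ns clock, multi-cycle instructions
pipelined_ns = 11 * 1                       # 11 ns clock, one instruction per cycle
speedup = unpipelined_ns / pipelined_ns

print(f"unpipelined: {unpipelined_ns:.1f} ns, speedup: {speedup:.2f}")
```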

Pipeline Hazards
Situations in pipelining when the next instruction cannot execute in the following clock cycle
Structural hazards
–hardware cannot support the combination of instructions that we want to execute in the same cycle
Control hazards
–need to make a decision based on the results of one instruction while others are executing
Data hazards
–an instruction depends on the result of a previous instruction still in the pipeline
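
Pipeline Performance II
Must account for hazards
–Hazards introduce stall cycles in the pipeline
$$\text{Speedup} = \frac{\text{Avg. inst. ex. time unpipelined}}{\text{Avg. inst. ex. time pipelined}} = \frac{\text{CPI unpipelined} \times \text{Clock cycle unpipelined}}{\text{CPI pipelined} \times \text{Clock cycle pipelined}}$$
With an ideal pipelined CPI of 1, CPI pipelined = 1 + Pipeline stall cycles per inst., so
$$\text{Speedup} = \frac{\text{CPI unpipelined}}{1 + \text{Pipeline stall cycles per inst.}} \times \frac{\text{Clock cycle unpipelined}}{\text{Clock cycle pipelined}}$$
If the clock cycles are equal and CPI unpipelined equals the pipeline depth, this reduces to
$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles per inst.}}$$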
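
The final form of the speedup expression is easy to evaluate directly. The Python sketch below assumes equal clock cycles and an unpipelined CPI equal to the pipeline depth; the function name and the sample stall rate of 0.25 cycles per instruction are illustrative only.

```python
def pipeline_speedup(depth: int, stall_cycles_per_instr: float) -> float:
    """Speedup = pipeline depth / (1 + pipeline stall cycles per instruction)."""
    if depth < 1 or stall_cycles_per_instr < 0:
        raise ValueError("depth must be >= 1 and stall cycles must be non-negative")
    return depth / (1.0 + stall_cycles_per_instr)


ideal = pipeline_speedup(5, 0.0)          # no hazards: speedup equals the depth
with_stalls = pipeline_speedup(5, 0.25)   # hypothetical 0.25 stall cycles per instruction
print(ideal, with_stalls)
```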

Structural Hazards
Problem: conflict in resources
Example: Suppose that a single memory is shared between instructions and data in the pipeline. A data access then conflicts with an instruction fetch in the same cycle.
Solution: remove the conflicting stages, redesign the hardware so that the resources are separate, or replicate resources

[Figure: Structural Hazard]

Data Hazards
Problem: Instruction depends on the result of a previous instruction still in the pipeline
Example:
–add R1, R2, R3
–sub R5, R1, R4
Solutions:
–forwarding or bypassing
–instruction reordering to remove dependencies

Data Hazard
Example
–add R1, R2, R3
–sub R4, R1, R5
–and R6, R1, R7
–or R8, R1, R9
–xor R10, R1, R11
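
The dependences in this sequence can be found mechanically by tracking the last writer of each register. The Python sketch below is only an illustration: the tuple encoding of instructions is made up, and the rule that consumers one or two instructions behind the producer need forwarding assumes a five-stage pipeline whose register file is written in the first half of a cycle and read in the second half.

```python
def raw_dependences(program):
    """Return (producer index, consumer index, register, distance) for each read-after-write pair."""
    deps = []
    last_writer = {}
    for i, (_op, dest, *srcs) in enumerate(program):
        for reg in srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg, i - last_writer[reg]))
        last_writer[dest] = i
    return deps


program = [
    ("add", "R1", "R2", "R3"),
    ("sub", "R4", "R1", "R5"),
    ("and", "R6", "R1", "R7"),
    ("or", "R8", "R1", "R9"),
    ("xor", "R10", "R1", "R11"),
]

deps = raw_dependences(program)
# Assumption: only consumers 1 or 2 instructions behind the producer need forwarding
needs_forwarding = [program[consumer][0] for _, consumer, _, dist in deps if dist <= 2]
print(deps)
print(needs_forwarding)
```

Under that assumption, sub and and need the forwarded value, or is covered by the split-cycle register file, and xor reads R1 after it has been written back.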

[Figure: Data Dependencies]

[Figure: Data Forwarding]

Implementing Forwarding
Detection
–e.g. EX/MEM.IR_{16..20} = ID/EX.IR_{6..10}
Use a multiplexor to select forwarded results
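
The detection test and the multiplexor can be sketched in Python. This is a simplified illustration: it assumes bit 0 is the most significant bit of the instruction word, it uses a hypothetical R-type encoder, and it leaves out the opcode checks a real forwarding unit would need to confirm that the older instruction actually writes a register.

```python
def bits(word: int, first: int, last: int) -> int:
    """Return bits first..last of a 32-bit word (bit 0 is the most significant)."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)


def encode_rtype(rs1: int, rs2: int, rd: int, func: int = 0) -> int:
    """Hypothetical R-type encoder: sources at bits 6..10 and 11..15, destination at 16..20."""
    return (rs1 << 21) | (rs2 << 16) | (rd << 11) | func


def forward_to_alu_input_a(ex_mem_ir: int, id_ex_ir: int) -> bool:
    """The detection test: EX/MEM.IR 16..20 == ID/EX.IR 6..10."""
    return bits(ex_mem_ir, 16, 20) == bits(id_ex_ir, 6, 10)


def alu_input_a(id_ex_a: int, ex_mem_alu_output: int, forward: bool) -> int:
    """Multiplexor: take the forwarded ALU result when the comparison matches."""
    return ex_mem_alu_output if forward else id_ex_a


add_ir = encode_rtype(rs1=2, rs2=3, rd=1)   # add R1, R2, R3 (now in EX/MEM)
sub_ir = encode_rtype(rs1=1, rs2=4, rd=5)   # sub R5, R1, R4 (now in ID/EX)
forward = forward_to_alu_input_a(add_ir, sub_ir)
print(forward, alu_input_a(id_ex_a=0, ex_mem_alu_output=42, forward=forward))
```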

Data Hazard with Stall
–lw R1, 0(R2)
–sub R4, R1, R5
–and R6, R1, R7
–or R8, R1, R9
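
A load delivers its data only at the end of MEM, so forwarding cannot help the instruction immediately after it. A minimal Python sketch of the interlock check follows, assuming full forwarding for every other case; the `(op, dest, sources)` tuple encoding and the function name are made up for this illustration.

```python
def load_use_stall(prev, curr) -> bool:
    """True if curr must stall one cycle because it reads the register loaded by prev."""
    prev_op, prev_dest, _ = prev
    _, _, curr_srcs = curr
    return prev_op == "lw" and prev_dest in curr_srcs


program = [
    ("lw", "R1", ("R2",)),        # lw  R1, 0(R2)
    ("sub", "R4", ("R1", "R5")),  # sub R4, R1, R5
    ("and", "R6", ("R1", "R7")),  # and R6, R1, R7
    ("or", "R8", ("R1", "R9")),   # or  R8, R1, R9
]

stalls = [load_use_stall(a, b) for a, b in zip(program, program[1:])]
print(stalls)
```

Only the sub directly behind the load triggers the stall; by the time and and or need R1, the loaded value can be forwarded or read from the register file.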

Compiler Scheduling for Data Hazards
Data hazards are naturally generated
–C = A + B
    lw R1, A
    lw R2, B
    add R3, R1, R2
    sw C, R3
Compiler can reorder instructions to remove dependencies
–a = b + c; d = e - f;
    lw R1, b
    lw R2, c
    lw R3, e
    add R5, R1, R2
    lw R4, f
    sw a, R5
    sub R6, R3, R4
    sw d, R6
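
The benefit of the reordering can be estimated by counting load-use stalls before and after scheduling. In the Python sketch below, the scheduled sequence is the one listed for a = b + c; d = e - f, while the unscheduled sequence is an assumed straightforward translation that does not appear on the slide. The model assumes full forwarding, so the only stall is one cycle when an instruction reads a register loaded by the instruction directly before it.

```python
def count_load_use_stalls(program) -> int:
    """Count one-cycle stalls caused by using a loaded register in the very next instruction."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_srcs = curr
        if prev_op == "lw" and prev_dest in curr_srcs:
            stalls += 1
    return stalls


# Each instruction is (op, destination register or None, source registers).
# Assumed naive order (not from the slide): each statement translated on its own.
unscheduled = [
    ("lw", "R1", ()), ("lw", "R2", ()), ("add", "R5", ("R1", "R2")), ("sw", None, ("R5",)),
    ("lw", "R3", ()), ("lw", "R4", ()), ("sub", "R6", ("R3", "R4")), ("sw", None, ("R6",)),
]
# Scheduled order as listed above.
scheduled = [
    ("lw", "R1", ()), ("lw", "R2", ()), ("lw", "R3", ()), ("add", "R5", ("R1", "R2")),
    ("lw", "R4", ()), ("sw", None, ("R5",)), ("sub", "R6", ("R3", "R4")), ("sw", None, ("R6",)),
]

print(count_load_use_stalls(unscheduled), count_load_use_stalls(scheduled))
```

Moving an independent instruction between each load and its first use removes both stalls without changing the result of either statement.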

[Figure: Effectiveness of Scheduling]

Control Hazards
Problem: The next instruction to enter the pipe may depend on an instruction that is still executing, so we may have to wait until one of its stages completes before we know what to fetch next
Example: branch instruction
Solutions:
–Stall - operate sequentially until the decision can be made (wastes time)
–Predict - guess what to do next. If the guess is correct, operate normally; if the guess is wrong, clear the pipe and begin again
–Compute the address of the branch target earlier

Pipeline Stall for Branch
Stall the pipeline until the MEM stage, which determines the new PC
The stall does not begin until the branch is detected (ID)
3 cycles lost per branch is significant
–with a 30% branch frequency and an ideal CPI of 1, the machine with branch stalls has CPI = 1 + 0.3 × 3 = 1.9 and achieves only about 1/2 of the ideal speedup
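
The cost of branch stalls follows from adding the expected stall cycles to the ideal CPI. The Python sketch below reproduces the 30% branch frequency, 3-cycle case and, for comparison, a 1-cycle penalty; the function name is hypothetical and branches are assumed to be the only source of stalls.

```python
def cpi_with_branch_stalls(branch_freq: float, penalty_cycles: int, ideal_cpi: float = 1.0) -> float:
    """CPI = ideal CPI + branch frequency x stall cycles per branch."""
    return ideal_cpi + branch_freq * penalty_cycles


cpi_three = cpi_with_branch_stalls(0.30, 3)   # stall until MEM resolves the branch
cpi_one = cpi_with_branch_stalls(0.30, 1)     # stall cut to a single cycle

# Fraction of the ideal (CPI = 1) speedup that is actually achieved
print(f"3-cycle penalty: CPI {cpi_three:.2f}, {1 / cpi_three:.0%} of ideal")
print(f"1-cycle penalty: CPI {cpi_one:.2f}, {1 / cpi_one:.0%} of ideal")
```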

Computing the Taken PC Earlier
Can detect the branch condition (BEQZ, BNEZ) during ID
Need an extra adder to compute the branch target during ID
This reduces the stall to one cycle

Compile Time Branch Prediction
Assume either that the branch is taken or that it is not taken
Proceed under this assumption; if it turns out to be wrong, "back out" and start over

Delayed Branch
The instruction after the branch (in the branch delay slot) is executed no matter what the outcome of the branch is
Requires that the instruction in the branch delay slot is safe to execute regardless of the branch outcome
Effectiveness depends on the compiler

Designing Instruction Sets (MIPS) for Pipelining
Want to break down instruction execution into a reasonable number of stages of roughly equal complexity
All instructions the same length
–easier to fetch and decode
Few instruction formats (source register fields are located in the same place)
–can begin reading registers at the same time the instruction is decoded
Memory operands appear only in loads and stores
–calculate the address during the execute stage and access memory in the following stage; otherwise the pipeline would have to expand to an address stage, a memory stage, and an execute stage
Operands must be aligned in memory
–a single data transfer instruction never requires two data memory accesses, so the transfer fits in a single pipeline stage