Download presentation
Presentation is loading. Please wait.
1
Spring 2005331 W11.1 14:332:331 Computer Architecture and Assembly Language Spring 2005 Week 11 Introduction to Pipelined Datapath [Adapted from Dave Patterson’s UCB CS152 slides and Mary Jane Irwin’s PSU CSE331 slides]
2
Spring 2005331 W11.2 Head’s Up Reminders l Pipelined datapath and control l HW#6 will be handed out soon
3
Spring 2005331 W11.3 Review: Multicycle Data and Control Path Address Read Data (Instr. or Data) Memory PC Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 ALU Write Data IR MDR A B ALUout Sign Extend Shift left 2 ALU control Shift left 2 ALUOp Control FSM IRWrite MemtoReg MemWrite MemRead IorD PCWrite PCWriteCond RegDst RegWrite ALUSrcA ALUSrcB zero PCSource 1 1 1 1 1 1 0 0 0 0 0 0 2 2 3 4 Instr[5-0] Instr[25-0] PC[31-28] Instr[15-0] Instr[31-26] 32 28
4
Spring 2005331 W11.4 Review: RTL Summary StepR-typeMem RefBranchJump Instr fetch IR = Memory[PC]; PC = PC + 4; Decode A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC +(sign-extend(IR[15-0])<< 2); Execute ALUOut = A op B; ALUOut = A + sign-extend (IR[15-0]); if (A==B) PC = ALUOut; PC = PC[31-28] ||(IR[25-0] << 2); Memory access Reg[IR[15- 11]] = ALUOut; MDR = Memory[ALUOut]; or Memory[ALUOut] = B; Write- back Reg[IR[20-16]] = MDR;
5
Spring 2005331 W11.5 Review: Multicycle Datapath FSM Start Instr Fetch Decode Write Back Memory Access Execute (Op = R-type) (Op = beq) (Op = lw or sw) (Op = j) (Op = lw) (Op = sw) 01 2 3 4 5 6 7 8 9 Unless otherwise assigned PCWrite,IRWrite, MemWrite,RegWrite=0 others=X IorD=0 MemRead;IRWrite ALUSrcA=0 ALUsrcB=01 PCSource,ALUOp=00 PCWrite ALUSrcA=0 ALUSrcB=11 ALUOp=00 PCWriteCond=0 ALUSrcA=1 ALUSrcB=10 ALUOp=00 PCWriteCond=0 ALUSrcA=1 ALUSrcB=00 ALUOp=10 PCWriteCond=0 ALUSrcA=1 ALUSrcB=00 ALUOp=01 PCSource=01 PCWriteCond PCSource=10 PCWrite MemRead IorD=1 PCWriteCond=0 MemWrite IorD=1 PCWriteCond=0 RegDst=1 RegWrite MemtoReg=0 PCWriteCond=0 RegDst=0 RegWrite MemtoReg=1 PCWriteCond=0
6
Spring 2005331 W11.6 Review: FSM Implementation Combinational control logic State Reg Inst[31-26] Next State Inputs Outputs Op0Op1Op2Op3Op4Op5 PCWrite PCWriteCond IorD MemRead MemWrite IRWrite MemtoReg PCSource ALUOp ALUSourceB ALUSourceA RegWrite RegDst System Clock
7
Spring 2005331 W11.7 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently – the clock cycle must be timed to accommodate the slowest instruction Is wasteful of area since some functional units must (e.g., adders) be duplicated since they can not be shared during a clock cycle but Is simple and easy to understand Clk Single Cycle Implementation: lwswWaste Cycle 1Cycle 2
8
Spring 2005331 W11.8 Multicycle Advantages & Disadvantages Uses the clock cycle efficiently – the clock cycle is timed to accommodate the slowest instruction step l balance the amount of work to be done in each step l restrict each step to use only one major functional unit Multicycle implementations allow l functional units to be used more than once per instruction as long as they are used on different clock cycles l faster clock rates l different instructions to take a different number of clock cycles but Requires additional internal state registers, muxes, and more complicated (FSM) control
9
Spring 2005331 W11.9 The Five Stages of Load Instruction IFetch: Instruction Fetch and Update PC Dec: Registers Fetch and Instruction Decode Exec: Execute R-type; calculate memory address Mem: Read/write the data from/to the Data Memory WB: Write the data back to the register file Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5 IFetchDecExecMemWB lw
10
Spring 2005331 W11.10 Single Cycle vs. Multiple Cycle Timing Clk Cycle 1 Multiple Cycle Implementation: IFetchDecExecMemWB Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9Cycle 10 IFetchDecExecMem lwsw Clk Single Cycle Implementation: lwswWaste IFetch R-type Cycle 1Cycle 2 multicycle clock slower than 1/5 th of single cycle clock due to stage flipflop overhead
11
Spring 2005331 W11.11 Pipelined MIPS Processor Start the next instruction while still working on the current one l improves throughput - total amount of work done in a given time l instruction latency (execution time, delay time, response time) is not reduced - time from the start of an instruction to its completion Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5 IFetchDecExecMemWB lw Cycle 7Cycle 6Cycle 8 sw IFetchDecExecMemWB R-type IFetchDecExecMemWB
12
Spring 2005331 W11.12 Single Cycle, Multiple Cycle, vs. Pipeline Clk Cycle 1 Multiple Cycle Implementation: IFetchDecExecMemWB Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9Cycle 10 lw IFetchDecExecMemWB IFetchDecExecMem lwsw Pipeline Implementation: IFetchDecExecMemWB sw Clk Single Cycle Implementation: LoadStoreWaste IFetch R-type IFetchDecExecMemWB R-type Cycle 1Cycle 2 wasted cycle
13
Spring 2005331 W11.13 Pipelining the MIPS ISA What makes it easy l all instructions are the same length (32 bits) l few instruction formats (three) with symmetry across formats l memory operations can occur only in loads and stores l operands must be aligned in memory so a single data transfer requires only one memory access What makes it hard l structural hazards: what if we had only one memory l control hazards: what about branches l data hazards: what if an instruction’s input operands depend on the output of a previous instruction
14
Spring 2005331 W11.14 MIPS Pipeline Datapath Modifications Read Address Instruction Memory Add PC 4 0 1 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 1632 ALU 1 0 Shift left 2 Add Data Memory Address Write Data Read Data 1 0 What do we need to add/modify in our MIPS datapath? l State registers between pipeline stages to isolate them IFetch/Dec Dec/Exec Exec/Mem Mem/WB IFetch DecExecMemWB System Clock Sign Extend
15
Spring 2005331 W11.15 MIPS Pipeline Control Path Modifications Read Address Instruction Memory Add PC 4 0 1 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 1632 ALU 1 0 Shift left 2 Add Data Memory Address Write Data Read Data 1 0 All control signals are determined during Decode l and held in the state registers between pipeline stages IFetch/Dec Dec/Exec Exec/Mem Mem/WB IFetchDecExecMemWB System Clock Control Sign Extend
16
Spring 2005331 W11.16 Graphically Representing MIPS Pipeline Can help with answering questions like: l how many cycles does it take to execute this code? l what is the ALU doing during cycle 4? l is there a hazard, why does it occur, and how can it be fixed? ALU IM Reg DMReg
17
Spring 2005331 W11.17 Why Pipeline? For Throughput! I n s t r. O r d e r Time (clock cycles) Inst 0 Inst 1 Inst 2 Inst 4 Inst 3 ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg Once the pipeline is full, one instruction is completed every cycle Time to fill the pipeline
18
Spring 2005331 W11.18 Can pipelining get us into trouble? Yes: Pipeline Hazards l structural hazards: attempt to use the same resource by two different instructions at the same time l data hazards: attempt to use item before it is ready -instruction depends on result of prior instruction still in the pipeline l control hazards: attempt to make a decision before condition is evaulated -branch instructions Can always resolve hazards by waiting l pipeline control must detect the hazard l take action (or delay action) to resolve hazards
19
Spring 2005331 W11.19 I n s t r. O r d e r Time (clock cycles) lw Inst 1 Inst 2 Inst 4 Inst 3 ALU Mem Reg MemReg ALU Mem Reg MemReg ALU Mem Reg MemReg ALU Mem Reg MemReg ALU Mem Reg MemReg A Unified Memory Would Be a Structural Hazard Reading data from memory Reading instruction from memory
20
Spring 2005331 W11.20 How About Register File Access? I n s t r. O r d e r Time (clock cycles) add Inst 1 Inst 2 Inst 4 add ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg Can fix register file access hazard by doing reads in the second half of the cycle and writes in the first half.
21
Spring 2005331 W11.21 Branch Instructions Cause Control Hazards I n s t r. O r d e r add beq lw Inst 4 Inst 3 ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg Dependencies backward in time cause hazards
22
Spring 2005331 W11.22 One Way to “Fix” a Control Hazard I n s t r. O r d e r add beq ALU IM Reg DMReg ALU IM Reg DMReg Inst 3 lw ALU IM Reg DMReg ALU IM Reg DMReg Can fix branch hazard by waiting – stall – but affects throughput stall
23
Spring 2005331 W11.23 Register Usage Can Cause Data Hazards I n s t r. O r d e r add r1,r2,r3 sub r4,r1,r5 and r6,r1,r7 xor r4,r1,r5 or r8, r1, r9 ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg Dependencies backward in time cause hazards
24
Spring 2005331 W11.24 One Way to “Fix” a Data Hazard I n s t r. O r d e r add r1,r2,r3 ALU IM Reg DMReg sub r4,r1,r5 and r6,r1,r7 ALU IM Reg DMReg ALU IM Reg DMReg stall Can fix data hazard by waiting – stall – but affects throughput
25
Spring 2005331 W11.25 Loads Can Cause Data Hazards I n s t r. O r d e r lw r1,100(r2) sub r4,r1,r5 and r6,r1,r7 xor r4,r1,r5 or r8, r1, r9 ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg Dependencies backward in time cause hazards
26
Spring 2005331 W11.26 Stores Can Cause Data Hazards I n s t r. O r d e r add r1,r2,r3 sw r1,100(r5) and r6,r1,r7 xor r4,r1,r5 or r8, r1, r9 ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg Dependencies backward in time cause hazards
27
Spring 2005331 W11.27 Other Pipeline Structures Are Possible What about (slow) multiply operation? l let it take two cycles What if the data memory access is twice as slow as the instruction memory? l make the clock twice as slow or … l let data memory access take two cycles (and keep the same clock rate) ALU IM Reg DMReg MUL ALU IM Reg DM1Reg DM2
28
Spring 2005331 W11.28 Sample Pipeline Alternatives ARM7 StrongARM-1 XScale ALU IM1 IM2 DM1 Reg DM2 IM Reg EX PC update IM access decode reg access ALU op DM access shift/rotate commit result (write back) ALU IM Reg DMReg SHFT PC update BTB access start IM access IM access decode reg 1 access shift/rotate reg 2 access ALU op start DM access exception DM write reg write
29
Spring 2005331 W11.29 Summary All modern day processors use pipelining Pipelining doesn’t help latency of single task, it helps throughput of entire workload l Multiple tasks operating simultaneously using different resources Potential speedup = Number of pipe stages Pipeline rate limited by slowest pipeline stage l Unbalanced lengths of pipe stages reduces speedup l Time to “fill” pipeline and time to “drain” it reduces speedup Must detect and resolve hazards l Stalling negatively affects throughput
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.