Published by Piers Cox. Modified over 8 years ago.
1
Csci 136 Computer Architecture II – Superscalar and Dynamic Pipelining Xiuzhen Cheng cheng@gwu.edu
2
Announcement
Homework assignment #11, due by April 8
  Reading: Section 6.8
  Problems: 6.30 – 6.31
Project #3 is due on April 10, 2004
Final: Tuesday, May 4th, 11:00-1:00PM
Note: you must pass the final to pass this course!
3
SW Is in EX Stage
Forward the store data from EX/MEM when:
  ID/EX.MemWrite and EX/MEM.RegWrite and EX/MEM.RegisterRd = ID/EX.RegisterRt and EX/MEM.RegisterRd != 0
Forward the store data from MEM/WB when:
  ID/EX.MemWrite and MEM/WB.RegWrite and MEM/WB.RegisterRd = ID/EX.RegisterRt and EX/MEM.RegisterRd != ID/EX.RegisterRt and MEM/WB.RegisterRd != 0
[Figure: pipelined datapath; labels: Sign-Ext, R-Type or lw, sw, R-Type]
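The two conditions on this slide can be read as a small priority selector for the value that sw will store. The following is a minimal sketch, not the course's reference solution: the field names, the `PipelineRegs` class, and the returned labels are hypothetical, and the MEM/WB test mirrors the slide's condition term by term.

```python
from dataclasses import dataclass

@dataclass
class PipelineRegs:
    """Only the pipeline-register fields that the two conditions read (names are illustrative)."""
    id_ex_mem_write: bool   # ID/EX.MemWrite: a sw is in the EX stage
    id_ex_rt: int           # ID/EX.RegisterRt: register whose value sw stores
    ex_mem_reg_write: bool  # EX/MEM.RegWrite
    ex_mem_rd: int          # EX/MEM.RegisterRd
    mem_wb_reg_write: bool  # MEM/WB.RegWrite
    mem_wb_rd: int          # MEM/WB.RegisterRd

def sw_store_data_source(r: PipelineRegs) -> str:
    """Return where the store data for a sw in EX should come from."""
    if not r.id_ex_mem_write:
        return "REGFILE"  # no sw in EX, nothing to forward
    # Newest producer first: the instruction just ahead of sw (now in MEM).
    if r.ex_mem_reg_write and r.ex_mem_rd != 0 and r.ex_mem_rd == r.id_ex_rt:
        return "EX/MEM"
    # Otherwise the instruction two ahead of sw (now in WB), as on the slide.
    if (r.mem_wb_reg_write and r.mem_wb_rd != 0
            and r.mem_wb_rd == r.id_ex_rt
            and r.ex_mem_rd != r.id_ex_rt):
        return "MEM/WB"
    return "REGFILE"
```

For example, an R-type instruction writing $2 immediately followed by `sw $2, ...` selects "EX/MEM", while a producer two instructions ahead selects "MEM/WB".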
4
The Big Picture: Where Are We Now?
The Five Classic Components of a Computer: Control and Datapath (together, the Processor), Memory, Input, Output
Current Topics: Superscalar and Dynamic Pipelining
5
Is a Faster Processor Possible?
Pipelining can potentially provide CPI = 1. Is it possible to design a faster processor? Yes:
Superpipelining: longer pipelines
  Divide the washer into 3 machines: wash, rinse, spin
Superscalar: replicate the internal components of the computer so that it can launch multiple instructions per clock cycle (CC)
  Buy 3 washers, 3 dryers, etc.
Dynamic pipelining: use hardware to avoid pipeline hazards
  Out-of-order execution is possible
  More complicated pipeline control and instruction execution model
6
Issuing Multiple Instructions/Cycle
Two main variations: Superscalar and VLIW
Superscalar: varying number of instructions per cycle (1 to 6)
  Parallelism and dependencies determined/resolved by HW
  IBM PowerPC 604, Sun UltraSparc, DEC Alpha 21164, HP 7100
Very Long Instruction Words (VLIW): fixed number of instructions (16)
  Parallelism determined by compiler
  Pipeline is exposed; compiler must schedule delays to get the right result
Explicit Parallel Instruction Computer (EPIC) / Intel
  128-bit packets containing 3 instructions (can execute sequentially)
  Can link 128-bit packets together to allow more parallelism
  Compiler determines parallelism; HW checks dependencies and forwards/stalls
7
Superscalar MIPS
Assume two instructions are issued per clock cycle:
  One ALU operation or branch
  One memory access instruction

Instruction type            Pipe stages
ALU or branch instruction   IF  ID  EX  MEM WB
Load or store instruction   IF  ID  EX  MEM WB
ALU or branch instruction       IF  ID  EX  MEM WB
Load or store instruction       IF  ID  EX  MEM WB
ALU or branch instruction           IF  ID  EX  MEM WB
Load or store instruction           IF  ID  EX  MEM WB
ALU or branch instruction               IF  ID  EX  MEM WB
Load or store instruction               IF  ID  EX  MEM WB
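The timing behind this two-issue diagram can be sketched in a few lines. This is an idealized model under stated assumptions: every issue slot holds a useful instruction, there are no stalls, and a new pair enters IF each cycle. The function names are made up for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_cycles(pair_index):
    """Clock cycle (1-based) in which the pair at pair_index (0-based) is in each stage."""
    return {stage: pair_index + 1 + k for k, stage in enumerate(STAGES)}

def total_cycles(num_pairs):
    """Cycles until the last pair leaves WB, with no stalls."""
    return num_pairs + len(STAGES) - 1 if num_pairs > 0 else 0

def ideal_cpi(num_pairs):
    """Cycles per instruction when both slots of every pair are filled."""
    return total_cycles(num_pairs) / (2 * num_pairs)

# The four pairs drawn on the slide finish in 8 cycles; for long runs the
# pipeline fill time is amortized and CPI approaches 0.5.
print(stage_cycles(3))
print(total_cycles(4), ideal_cpi(1000))
```

Real code rarely fills both slots every cycle, so 0.5 is a floor for this two-issue design rather than an expected value.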
8
Additional Hardware Requirements
Instructions must be paired and aligned
Extra ports in the register file to serve 2 instructions per cycle
Separate adder for lw/sw address computation
What will happen for load-use instructions?
9
Simple Superscalar Example
How would this loop be scheduled on a superscalar pipeline for MIPS?
Loop: lw   $t0, 0($s1)
      addu $t0, $t0, $s2
      sw   $t0, 0($s1)
      addi $s1, $s1, -4
      bne  $s1, $zero, Loop
Reorder the instructions to avoid as many pipeline stalls as possible.
Solution hints:
  Identify instructions with data dependencies; these cannot be reordered relative to each other!
  Identify load-use pairs that require pipeline stalls
Is there any performance improvement (in CPI)?
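One schedule consistent with the hints can be written down as a list of issue pairs and its CPI computed directly. This is a sketch of one possible answer, assuming the two-issue MIPS described earlier (one ALU/branch slot, one load/store slot) and full forwarding. Note that moving addi ahead of sw means the store offset becomes 4 instead of 0.

```python
NOP = "nop"

# Each tuple is one clock cycle: (ALU or branch slot, load or store slot).
schedule = [
    (NOP,                    "lw $t0, 0($s1)"),
    ("addi $s1, $s1, -4",    NOP),               # fills the load-use delay
    ("addu $t0, $t0, $s2",   NOP),
    ("bne $s1, $zero, Loop", "sw $t0, 4($s1)"),  # offset adjusted for the earlier addi
]

def cpi(schedule):
    """Cycles per useful (non-nop) instruction for one loop iteration."""
    useful = sum(1 for pair in schedule for slot in pair if slot != NOP)
    return len(schedule) / useful

print(cpi(schedule))  # 4 cycles / 5 instructions
```

Under these assumptions the loop takes 4 cycles for 5 instructions, a CPI of 0.8, which is only a modest gain over the single-issue ideal of 1 and well short of the two-issue ideal of 0.5.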
10
Loop Unrolling
Purpose: to get more performance out of loops
Idea: schedule multiple copies of the loop body together
The previous example: assume the loop index is a multiple of 4
What is the performance improvement?
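A rough way to estimate the gain is to count instructions per slot after unrolling. The sketch below is only a resource bound under stated assumptions: the two-issue MIPS with one ALU/branch slot and one load/store slot, each copy of the body contributing one lw, one addu, and one sw, and a single addi and bne shared by all copies. It ignores data dependencies and load-use delays, so a real schedule may need more cycles. The function names are illustrative.

```python
def unrolled_counts(factor):
    """(ALU/branch instructions, load/store instructions) after unrolling `factor` times."""
    alu_or_branch = factor + 2   # one addu per copy, plus one addi and one bne
    load_or_store = 2 * factor   # one lw and one sw per copy
    return alu_or_branch, load_or_store

def min_cycles(factor):
    """Lower bound on cycles: the busier slot issues one instruction per cycle."""
    return max(unrolled_counts(factor))

def best_case_cpi(factor):
    alu, mem = unrolled_counts(factor)
    return min_cycles(factor) / (alu + mem)

print(unrolled_counts(4), min_cycles(4), round(best_case_cpi(4), 2))
```

For four copies this gives 14 instructions in at least 8 cycles, a CPI of about 0.57. Getting close to that bound also requires giving each copy its own temporary register so the copies do not depend on each other.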
11
Dynamic Pipeline Scheduling
The hardware performs the “scheduling”
  Hardware tries to find instructions to execute
  Out-of-order execution is possible
  Speculative execution and dynamic branch prediction
Basic idea: dynamic pipeline scheduling (DPS) tries to find later instructions to execute while waiting for a stall to be resolved
Pipeline is divided into 3 major units:
  Instruction fetch and issue unit: IF, ID
  Execute unit: 5 to 10 independent functional units
  Commit unit: determines when to put the result back to register or memory
In-order completion vs. out-of-order completion
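The interplay of out-of-order execution and in-order commit can be illustrated with a toy timing model. This is a deliberately simplified sketch, not a model of any real processor: it assumes unlimited functional units, no limit on fetch or commit bandwidth, register renaming (so only true dependencies matter), and made-up latencies. `Instr` and `simulate` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    name: str
    dest: str
    srcs: tuple
    latency: int  # hypothetical execute latency in cycles

def simulate(program):
    """Return (name, start, finish, commit) cycles for each instruction, in program order."""
    ready_at = {}    # register -> cycle in which its newest value is produced
    last_commit = 0
    timeline = []
    for ins in program:
        # Execute unit: start as soon as all source operands are available.
        operands_ready = max((ready_at.get(s, 0) for s in ins.srcs), default=0)
        start = operands_ready + 1
        finish = start + ins.latency - 1
        # Commit unit: results are written back in program order.
        commit = max(finish + 1, last_commit)
        ready_at[ins.dest] = finish
        last_commit = commit
        timeline.append((ins.name, start, finish, commit))
    return timeline

program = [
    Instr("lw",   "$t0", ("$s1",),       3),
    Instr("addu", "$t1", ("$t0", "$s2"), 1),  # must wait for the load
    Instr("addi", "$s3", ("$s3",),       1),  # independent of the load
]
for row in simulate(program):
    print(row)
```

In this run addi executes in cycle 1, ahead of the addu that precedes it in program order, yet it commits no earlier than addu does.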
12
Basic Idea
13
Summary
All modern processors are very complicated
  DEC Alpha 21264: 9-stage pipeline, 6 instructions in parallel, 4 instructions per CC
  PowerPC and Pentium/Itanium: branch history table, dynamic pipelining
  Compiler technology is important
Combining dynamic pipelining with branch prediction is very challenging
  The commit unit should know how to “roll back”, that is, discard instructions when a prediction is wrong
Dynamic execution is based on prediction:
  Hide memory latency
  Avoid stalls
  Execute instructions while waiting for hazards to be resolved
14
Exercise 6.20
lw $2, 100($5)
sw $2, 200($6)
In which stage should forwarding be done? How about hazard detection?
15
Forwarding Unit in EX Stage
[Figure: datapath with forwarding unit; labels: Mux, 0, 1]
Conditions?
16
Forwarding Unit in MEM Stage
Is it possible? YES
Steps:
  Change the control unit so that RegDst is valid and selects ID/EX.RegisterRt for the sw instruction, even though sw does not require it
  Add a multiplexer to the write port of data memory
Conditions for the forwarding unit to generate the selector signal?
[Figure: datapath; labels: RegDst, RegisterRt, Mux]
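One plausible answer to the question on this slide is sketched below. It assumes the control change described in the steps, so that EX/MEM.RegisterRd carries sw's Rt, and that the new multiplexer selects the MEM/WB value when the function returns True. The function and parameter names are hypothetical.

```python
def forward_in_mem_stage(ex_mem_mem_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd):
    """True when the data-memory write port should take its data from MEM/WB.

    ex_mem_rd is assumed to hold sw's Rt (RegDst selects Rt for sw).
    """
    return bool(
        ex_mem_mem_write            # a sw is in the MEM stage
        and mem_wb_reg_write        # the instruction ahead of it writes a register
        and mem_wb_rd != 0          # $0 is never forwarded
        and mem_wb_rd == ex_mem_rd  # and it writes the register sw stores
    )

# lw $2, 100($5) is in WB while sw $2, 200($6) is in MEM:
print(forward_in_mem_stage(True, 2, True, 2))
```

With this path in place, the value loaded by lw reaches the memory write port in the same cycle that sw needs it.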
17
Hazard Detection Conditions?
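A possible answer, offered as a sketch rather than the official solution: start from the usual load-use check (stall when the load in EX writes a register that the instruction in ID reads) and relax it for the one case that MEM-stage forwarding covers, a store whose only dependence on the load is its store data (Rt). A store that uses the loaded register as its base address (Rs) still needs the stall. All names are hypothetical.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt, if_id_is_store):
    """True when the instruction in ID must wait one cycle behind a load in EX."""
    if not id_ex_mem_read:
        return False              # no load in EX, so no load-use hazard
    if id_ex_rt == if_id_rs:
        return True               # loaded value is needed in EX (operand or base address)
    if id_ex_rt == if_id_rt:
        # A store needs Rt only in MEM, where the loaded value can be forwarded.
        return not if_id_is_store
    return False

# lw $2, 100($5) followed by sw $2, 200($6): no stall under this assumption.
print(must_stall(True, 2, 6, 2, True))
```

As in the standard check, comparing against Rt for instructions that do not actually read Rt can cause an unnecessary stall; the sketch keeps that simplification.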
18
Questions?