1
Computer Organization
Pipelined Processor Design
3 Feb 2005
Prof. John Nestor, ECE Department, Lafayette College, Easton, Pennsylvania 18042
nestorj@lafayette.edu
Reading: 6.7, 6.9-6.12, 6.13*
Homework Due 6/8: 6.1, 6.2, 6.3, 6.4, 6.7, 6.8, 6.9, 6.15
Portions of these slides are derived from:
  Textbook figures © 1998 Morgan Kaufmann Publishers, all rights reserved
  Tod Amon's COD2e Slides © 1998 Morgan Kaufmann Publishers, all rights reserved
  Dave Patterson's CS 152 Slides - Fall 1997 © UCB
  Rob Rutenbar's 18-347 Slides - Fall 1999 CMU
  other sources as noted
2
Feb 2005, Pipelining 3
Project 4 - Pipelined MIPS
3
Project 4 - What to Do
  Download & simulate basic model
  Extend processor to do either
    Data forwarding + load/use stall (see Fig. 6.36/6.33)
    OR
    Branch implementation in ID including IF.Flush (see Fig. 6.38)
  Simulate extended processor to show it works
  You may work in groups of two
4
Review - Pipelined Processor with Hazard Detection (Fig. 6.36)
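The load/use stall and forwarding checks reviewed here can be written out as plain conditions. The following is a minimal Python sketch, not the project's model: the `PipeRegs` class and its field names are hypothetical stand-ins for the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline-register fields, and only the forwarding mux for the first ALU operand is shown.

```python
from dataclasses import dataclass


@dataclass
class PipeRegs:
    """Hypothetical snapshot of the pipeline-register fields the checks read."""
    if_id_rs: int            # source registers of the instruction in ID
    if_id_rt: int
    id_ex_mem_read: bool     # instruction in EX is a load
    id_ex_rs: int            # first source register of the instruction in EX
    id_ex_rt: int            # register a load in EX will write
    ex_mem_reg_write: bool
    ex_mem_rd: int
    mem_wb_reg_write: bool
    mem_wb_rd: int


def load_use_stall(r: PipeRegs) -> bool:
    """Stall when a load in EX writes a register the instruction in ID reads."""
    return r.id_ex_mem_read and r.id_ex_rt in (r.if_id_rs, r.if_id_rt)


def forward_a(r: PipeRegs) -> int:
    """Select the first ALU operand: 0b10 = EX/MEM, 0b01 = MEM/WB, 0b00 = register file."""
    if r.ex_mem_reg_write and r.ex_mem_rd != 0 and r.ex_mem_rd == r.id_ex_rs:
        return 0b10          # most recent result wins
    if r.mem_wb_reg_write and r.mem_wb_rd != 0 and r.mem_wb_rd == r.id_ex_rs:
        return 0b01
    return 0b00
```

Checking EX/MEM before MEM/WB makes the newest value win when both stages hold a result for the same register. The test against register 0 keeps a write to $zero from being forwarded.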
5
Pipelined Processor - Branch Hardware in ID (Old Fig. 6.51)
6
Pipelining Outline
  Introduction
  Pipelined Processor Design
  Advanced Pipelining
    Overview - Instruction Level Parallelism
    Superpipelining
    Static Multiple Issue
    Dynamic Multiple Issue
    Speculation
    Simultaneous Multithreading (HyperThreading)
7
Instruction Level Parallelism (ILP)
  Parallel execution of instructions is known as Instruction Level Parallelism (ILP)
  Pipelining exploits ILP by overlapping execution
  ILP limited by
    Data hazards
    Control hazards
8
Techniques to Increase ILP
  Forwarding
  Branch Prediction
  Superpipelining
  Static Multiple Issue
  Dynamic Multiple Issue
  Speculation
  Simultaneous Multithreading (SMT)
9
Superpipelining
  Key idea: increase the number of stages
    MIPS R2000 - 5 Stages
    MIPS R4000 - 8 Stages
    Pentium 3 - 10 Stages
    Pentium 4 - 20 Stages
  Tradeoffs
    + Less logic in each stage -> faster clock
    - Longer pipeline -> higher penalty for stalls, flushes
  Used in conjunction with other techniques (e.g. branch prediction) to overcome disadvantages
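A rough way to see the tradeoff is a back-of-the-envelope model. The Python sketch below is illustrative only: the function name, the delay figures, and the assumption that a flush costs `stages - 1` cycles are all hypothetical, chosen to show the shape of the tradeoff rather than to describe any real processor.

```python
def time_per_instruction(stages, logic_ns=10.0, latch_ns=0.1, flush_rate=0.05):
    """Average time per instruction for a pipeline with the given depth.

    Assumptions (hypothetical): the total logic delay is split evenly across
    stages, each stage adds a fixed latch overhead, and a fraction
    `flush_rate` of instructions cause a flush costing (stages - 1) cycles.
    """
    cycle_ns = logic_ns / stages + latch_ns       # less logic per stage -> faster clock
    cpi = 1.0 + flush_rate * (stages - 1)         # longer pipeline -> bigger penalty
    return cycle_ns * cpi


if __name__ == "__main__":
    for depth in (5, 8, 10, 20):                  # stage counts listed on the slide
        print(depth, round(time_per_instruction(depth), 3))
```

With a low flush rate the deeper pipeline wins on clock speed. As `flush_rate` rises, the penalty term grows with depth and cancels the gain, which is why deep pipelines depend on techniques such as branch prediction.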
10
Techniques to Increase ILP
  Forwarding
  Branch Prediction
  Superpipelining
  Static Multiple Issue
  Dynamic Multiple Issue
  Speculation
  Simultaneous Multithreading (SMT)
11
Static Multiple Issue
  Key idea: issue (decode & execute) multiple instructions in each clock cycle
  Example: Issue load/store and ALU/branch in MIPS (Fig. 6.44)

  Instruction type   Pipe stages
  ALU or branch      IF  ID  EX  MEM WB
  Load/Store         IF  ID  EX  MEM WB
  ALU or branch          IF  ID  EX  MEM WB
  Load/Store             IF  ID  EX  MEM WB
  ALU or branch              IF  ID  EX  MEM WB
  Load/Store                 IF  ID  EX  MEM WB
  ALU or branch                  IF  ID  EX  MEM WB
  Load/Store                     IF  ID  EX  MEM WB
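The pairing rule in this example (one ALU/branch slot plus one load/store slot per cycle) can be sketched as a small in-order packer. This is an illustrative Python sketch under stated assumptions: the `Instr` tuple and `pack` function are hypothetical names, only adjacent instructions are considered for pairing, and load-use latency between packets is ignored.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    kind: str                 # "alu" (ALU or branch) or "mem" (load/store)
    dest: Optional[str]       # register written, if any
    srcs: Tuple[str, ...]     # registers read


def pack(instrs):
    """Group instructions, in program order, into (alu_slot, mem_slot) packets."""
    packets = []
    i = 0
    while i < len(instrs):
        first = instrs[i]
        second = instrs[i + 1] if i + 1 < len(instrs) else None
        pair_ok = (
            second is not None
            and second.kind != first.kind                  # one instruction per slot
            and (first.dest is None
                 or (first.dest not in second.srcs          # no RAW inside a packet
                     and first.dest != second.dest))        # no WAW inside a packet
        )
        group = [first, second] if pair_ok else [first]
        slots = {"alu": None, "mem": None}
        for ins in group:
            slots[ins.kind] = ins.text
        packets.append((slots["alu"], slots["mem"]))
        i += len(group)
    return packets


# A five-instruction array-update loop, written out as Instr records.
loop = [
    Instr("lw $t0, 0($s1)", "mem", "$t0", ("$s1",)),
    Instr("addu $t0, $t0, $s2", "alu", "$t0", ("$t0", "$s2")),
    Instr("sw $t0, 0($s1)", "mem", None, ("$t0", "$s1")),
    Instr("addi $s1, $s1, -4", "alu", "$s1", ("$s1",)),
    Instr("bne $s1, $zero, Loop", "alu", None, ("$s1", "$zero")),
]

if __name__ == "__main__":
    for packet in pack(loop):
        print(packet)
```

Without reordering, only the `sw`/`addi` pair shares a packet. The other slots stay empty because each instruction depends on the one before it.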
12
Example - A Static Multiple Issue MIPS (Fig. 6.45)
  [Figure labels: Executes ALU/Branch Instructions; Executes Load/Store Instructions]
13
Static Multiple Issue Tradeoffs
  Advantage: increased performance
    Real processors issue up to 6 instructions / cycle
  Several challenges:
    Building a register file with lots of ports
    Dealing with data dependencies
    Stalls due to control dependencies (branch prediction helps!)
    Building a memory system that can "keep up" (caches help!)
    Finding opportunities to fully utilize the functional units
14
VLIW / EPIC Processors
  VLIW - Very Long Instruction Word
    Functional units exposed in instruction word
    Static scheduling by compiler
    Pipeline is exposed; compiler must schedule delays to get the right result
    Examples: Philips TriMedia, Transmeta Crusoe
  EPIC - Explicitly Parallel Instruction Computing
    Three 41-bit instructions in each instruction packet
    Compiler determines parallelism
    Hardware checks dependencies and forwards/stalls
    Examples: Intel Itanium, Itanium 2
15
Itanium Block Diagram
  Source: Extreme Tech, www.extremetech.com
16
Software Manipulation to Increase ILP
  Software transformations can increase ILP
    Code reordering to reduce stalls
    Loop unrolling
  Example (p. 438)
    Loop: lw   $t0, 0($s1)        # $t0=array element
          addu $t0, $t0, $s2      # add scalar in $s2
          sw   $t0, 0($s1)        # store result
          addi $s1, $s1, -4       # decrement ptr
          bne  $s1, $zero, Loop
  Goal: reorder to speed superscalar execution
17
Software Manipulation - Reordering Code

        ALU or branch instruction   Data transfer instruction   Clock
  Loop:                             lw $t0, 0($s1)              1
        addi $s1, $s1, -4                                       2
        addu $t0, $t0, $s2                                      3
        bne $s1, $zero, Loop        sw $t0, 4($s1)              4

  Note sparse utilization of superscalar pipeline!
  End result: 5 instructions in 4 clocks, CPI = 0.8
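The CPI figure and the "sparse utilization" remark both follow from counting filled issue slots in the schedule above. A small Python sketch of that arithmetic (the `schedule` list and helper names are illustrative, not part of the slides):

```python
# Each entry is one clock: (ALU or branch slot, data transfer slot).
schedule = [
    (None, "lw $t0, 0($s1)"),
    ("addi $s1, $s1, -4", None),
    ("addu $t0, $t0, $s2", None),
    ("bne $s1, $zero, Loop", "sw $t0, 4($s1)"),
]


def count_instructions(schedule):
    """Number of issue slots that actually hold an instruction."""
    return sum(slot is not None for packet in schedule for slot in packet)


def cpi(schedule):
    """Clocks per instruction: clocks used divided by instructions issued."""
    return len(schedule) / count_instructions(schedule)


def utilization(schedule):
    """Fraction of the two issue slots per clock that are filled."""
    return count_instructions(schedule) / (2 * len(schedule))


if __name__ == "__main__":
    print(cpi(schedule), utilization(schedule))
```

Four clocks for five instructions gives the CPI of 0.8 quoted on the slide, while only five of the eight available slots are used.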
18
Software Manipulation - Loop Unrolling
  Assume loop count a multiple of 4 & unroll

        ALU or branch instruction   Data transfer instruction   Clock
  Loop: addi $s1, $s1, -16          lw $t0, 0($s1)              1
                                    lw $t1, 12($s1)             2
        addu $t0, $t0, $s2          lw $t2, 8($s1)              3
        addu $t1, $t1, $s2          lw $t3, 4($s1)              4
        addu $t2, $t2, $s2          sw $t0, 16($s1)             5
        addu $t3, $t3, $s2          sw $t1, 12($s1)             6
                                    sw $t2, 8($s1)              7
        bne $s1, $zero, Loop        sw $t3, 4($s1)              8

  The first lw issues with the addi and still sees the old $s1, so its matching sw uses offset 16.
  End result: 4 loop iterations in 8 clocks, 2 clocks / iteration!
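At the source level, the unrolling transformation looks like the following Python sketch. The function names are hypothetical, and Python is used only to make the transformation easy to check; the slide's loop is MIPS assembly that walks the array from the top down, which the index `i` mimics here.

```python
def add_scalar(a, s):
    """Rolled loop: one element per iteration, walking down from the end."""
    i = len(a)
    while i != 0:
        i -= 1
        a[i] += s


def add_scalar_unrolled4(a, s):
    """Unrolled by 4: one pointer update and one branch per four elements."""
    if len(a) % 4 != 0:
        raise ValueError("loop count must be a multiple of 4")
    i = len(a)
    while i != 0:
        i -= 4                      # single decrement, like addi $s1, $s1, -16
        # Four independent load/add/store chains; separate temporaries
        # play the role of $t0..$t3 so the work can overlap.
        t0, t1, t2, t3 = a[i + 3], a[i + 2], a[i + 1], a[i]
        t0, t1, t2, t3 = t0 + s, t1 + s, t2 + s, t3 + s
        a[i + 3], a[i + 2], a[i + 1], a[i] = t0, t1, t2, t3
```

Both functions compute the same result. The unrolled version spreads the loop overhead over four elements and exposes four independent chains that a two-issue pipeline can interleave.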
19
Techniques to Increase ILP
  Forwarding
  Branch Prediction
  Superpipelining
  Static Multiple Issue
  Dynamic Multiple Issue
  Speculation
  Simultaneous Multithreading (SMT)
20
Dynamic Multiple Issue
  Key ideas:
    "Look past" stalls for instructions that can execute
        lw   $t0, 20($t2)
        addu $t1, $t0, $t2      # addu stalls until $t0 available
        sub  $s4, $s4, $s3      # sub is ready to execute but blocked by stall!
        slti $t5, $s4, 20
    Execute instructions out of order
    Use multiple functional units for parallel execution
    Forward results between functional units when necessary
    Update registers (in original order of execution)
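The effect of "looking past" a stall can be seen in a toy issue model applied to the four-instruction example. This Python sketch makes several simplifying assumptions that are not from the slides: one instruction issues per cycle, the load takes a hypothetical 3 cycles before its result is usable, other instructions take 1, and there is no register renaming, so a younger instruction also waits if it conflicts with an older waiting one.

```python
def issue_cycles(program, out_of_order):
    """Return the cycle in which each instruction issues, in program order.

    `program` is a list of (text, dest, srcs, latency) tuples. In-order issue
    stops at the first instruction whose operands are not ready; out-of-order
    issue keeps scanning for a younger instruction that can go.
    """
    ready_at = {}                          # register -> first cycle its value is usable
    issued = [None] * len(program)
    cycle = 0
    while None in issued:
        cycle += 1
        waiting_reads, waiting_writes = set(), set()
        for i, (_text, dest, srcs, latency) in enumerate(program):
            if issued[i] is not None:
                continue
            operands_ready = all(ready_at.get(r, 0) <= cycle for r in srcs)
            no_conflict = (dest not in waiting_reads
                           and dest not in waiting_writes
                           and not waiting_writes.intersection(srcs))
            if operands_ready and no_conflict:
                issued[i] = cycle
                ready_at[dest] = cycle + latency
                break                      # one issue per cycle in this sketch
            if not out_of_order:
                break                      # a stalled instruction blocks everything behind it
            waiting_reads.update(srcs)
            waiting_writes.add(dest)
    return issued


program = [
    ("lw $t0, 20($t2)", "$t0", ("$t2",), 3),          # latency 3 is an assumption
    ("addu $t1, $t0, $t2", "$t1", ("$t0", "$t2"), 1),
    ("sub $s4, $s4, $s3", "$s4", ("$s4", "$s3"), 1),
    ("slti $t5, $s4, 20", "$t5", ("$s4",), 1),
]

if __name__ == "__main__":
    print("in order:    ", issue_cycles(program, out_of_order=False))
    print("out of order:", issue_cycles(program, out_of_order=True))
```

Under these assumptions the in-order pipeline holds `sub` and `slti` behind the stalled `addu`, while the out-of-order version issues them during the load delay and finishes issuing two cycles earlier.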
21
Speculation
  Guess about the outcome of an instruction (e.g., branch or load)
  Based on guess, start executing instructions
  Cancel started instructions if guess is incorrect
  Complicating factors
    Must buffer instruction results until outcome known
    Exceptions in speculated instructions - how can you have an exception in an instruction that didn't execute?
22
Dynamic Pipelining (Fig. 6.49)
  [Figure: Instruction fetch and decode unit (in-order issue) -> reservation stations -> functional units: Integer, Floating point, Load/Store (out-of-order execute) -> Commit unit (in-order commit)]
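The commit unit's job, and the result buffering that speculation requires, can be illustrated with a small reorder-buffer sketch in Python. The class and method names are hypothetical, and this is a simplified model rather than the structure of any particular processor: results may complete in any order, registers are updated only at in-order commit, and a wrong guess discards the younger entries before they reach the registers.

```python
from collections import deque


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()     # program order, oldest on the left
        self.registers = {}        # architectural state, written only at commit

    def issue(self, tag, dest):
        """Allocate an entry in program order (in-order issue)."""
        self.entries.append({"tag": tag, "dest": dest, "value": None, "done": False})

    def complete(self, tag, value):
        """Record a result from a functional unit (may arrive out of order)."""
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["value"], entry["done"] = value, True
                return
        raise KeyError(tag)

    def commit(self):
        """Retire finished entries from the head only (in-order commit)."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            if entry["dest"] is not None:
                self.registers[entry["dest"]] = entry["value"]
            retired.append(entry["tag"])
        return retired

    def squash_after(self, tag):
        """Discard every entry younger than `tag`, e.g. after a wrong guess."""
        tags = [entry["tag"] for entry in self.entries]
        keep = tags.index(tag) + 1         # raises ValueError for an unknown tag
        while len(self.entries) > keep:
            self.entries.pop()
```

For example, if `sub` finishes before an older `lw`, `commit()` retires nothing until the `lw` completes, and then retires both in program order. Because squashed entries never touch `registers`, cancelling speculated work needs no undo step.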
23
Dynamic Pipelining in the Pentium 4 (Fig. 6.50)
24
Dynamic Pipelining in the Pentium 4
  Source: "The Microarchitecture of the Pentium® 4 Processor", Intel Technology Journal, First Quarter 2001, http://developer.intel.com/technology/itj/q12001/articles/art_2.htm
25
Pentium 3 & 4 Pipeline Stages
  Drive stages - waiting for signal propagation
  Source: "The Microarchitecture of the Pentium® 4 Processor", Intel Technology Journal, First Quarter 2001, http://developer.intel.com/technology/itj/q12001/articles/art_2.htm
26
Techniques to Increase ILP
  Forwarding
  Branch Prediction
  Superpipelining
  Multiple Issue - Superscalar, VLIW/EPIC
  Software manipulation
  Dynamic Pipeline Scheduling
  Speculative Execution
  Simultaneous Multithreading (SMT)
27
Simultaneous Multithreading (SMT)
  Key idea: extend processor to multiple threads of execution that execute concurrently
    Each thread has its own PC and register state
    Scheduling hardware shares functional units
    Appears to software as two "separate" processors
  Advantage: when one thread stalls, another may be ready
  Proposed for servers, where multiple threads are common
  [Figure: issue slots of the functional units over time, filled from the state of Thread A and Thread B]
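The advantage claimed here, that another thread can fill an issue slot while one thread stalls, shows up even in a toy model. The Python sketch below rests on assumptions not in the slides: a single shared issue slot per cycle, fixed priority for the lower-numbered thread, and each thread described only by a list of hypothetical latencies (cycles until that thread's next instruction can issue).

```python
def run(threads):
    """Simulate one shared issue slot.

    `threads` is a list of latency lists, one per thread. Returns a log with
    one entry per cycle: the index of the thread that issued, or None if the
    slot went unused.
    """
    next_index = [0] * len(threads)        # next instruction of each thread
    ready_at = [1] * len(threads)          # first cycle each thread may issue again
    log = []
    cycle = 0
    while any(next_index[t] < len(threads[t]) for t in range(len(threads))):
        cycle += 1
        slot = None
        for t in range(len(threads)):      # fixed priority: thread 0 first
            if next_index[t] < len(threads[t]) and ready_at[t] <= cycle:
                ready_at[t] = cycle + threads[t][next_index[t]]
                next_index[t] += 1
                slot = t
                break
        log.append(slot)
    return log


thread_a = [3, 1, 3, 1]    # hypothetical: long-latency loads alternating with ALU ops
thread_b = [1, 1, 3, 1]

if __name__ == "__main__":
    print(len(run([thread_a])), len(run([thread_b])), len(run([thread_a, thread_b])))
```

Run one after the other, the two threads need 8 + 6 = 14 cycles in this model. Sharing the issue slot takes 9, because thread B issues during most of thread A's stall cycles.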
28
Roadmap for the term: major topics
  Overview / Abstractions and Technology
  Performance
  Instruction sets
  Logic & arithmetic
  Processor Implementation
  Memory systems
  Input/Output