Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania 18042 ECE 313 - Computer Organization Lecture 19 - Pipelined.

Prof. John Nestor
ECE Department, Lafayette College
Easton, Pennsylvania 18042
nestorj@lafayette.edu

ECE 313 - Computer Organization
Lecture 19 - Pipelined Processor Design 3
Superscalar CPU
Fall 2004

Reading: 6.7, 6.9-6.12, 6.13*
Homework Due 12/8: 6.1, 6.2, 6.3, 6.4, 6.7, 6.8, 6.9, 6.15
Assignment: Project 4

Portions of these slides are derived from:
- Textbook figures © 1998 Morgan Kaufmann Publishers, all rights reserved
- Tod Amon's COD2e Slides © 1998 Morgan Kaufmann Publishers, all rights reserved
- Dave Patterson’s CS 152 Slides - Fall 1997 © UCB
- Rob Rutenbar’s 18-347 Slides - Fall 1999 CMU
- other sources as noted

ECE 313 Fall 2004, Lecture 19 - Pipelining 3

Slide 2: Project 4 – Basic Pipelined MIPS

Slide 3: Project 4 - What to Do
- Download & simulate basic model
- Extend processor to do either
  - Data forwarding + load/use stall (see Fig. 6-36/6.33), OR
  - Branch implementation in ID including IF.Flush (see Fig. 6-38)
- Simulate extended processor to show it works
- You may work in groups of two
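The first Project 4 option (forwarding plus a load/use stall) can be pictured with a small behavioral sketch. This is not the project's simulation model. It assumes the textbook's standard forwarding and load-use conditions, and the function and parameter names (forward_select, load_use_stall, ex_mem_rd, and so on) are hypothetical, chosen only for illustration.

```python
def forward_select(id_ex_src, ex_mem_regwrite, ex_mem_rd,
                   mem_wb_regwrite, mem_wb_rd):
    """Return the 2-bit forwarding mux select for one ALU operand.

    id_ex_src is the source register number (rs or rt) of the instruction
    currently in EX. Register 0 is never forwarded because $zero is constant.
    """
    # EX hazard: the instruction one ahead produces the value (most recent wins)
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_src:
        return 0b10
    # MEM hazard: the instruction two ahead produces the value
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_src:
        return 0b01
    # No hazard: use the value read from the register file
    return 0b00


def load_use_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID needs the result of a load still in EX."""
    return bool(id_ex_memread and id_ex_rt in (if_id_rs, if_id_rt))


# Hypothetical case: the instruction in MEM writes register 8 and the
# instruction in EX reads register 8, so the operand comes from EX/MEM.
print(forward_select(8, True, 8, False, 0))
# Hypothetical case: a load into register 8 is in EX and the next
# instruction reads register 8, so the pipeline must stall one cycle.
print(load_use_stall(True, 8, 8, 9))
```

Checking the EX/MEM condition before the MEM/WB condition is what makes the most recent result win when both stages write the same register.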

Slide 4: Pipelined Processor Design with Hazard Detection (Fig. 6.36, old 6.46)

Slide 5: Pipelined Processor - Design with Branch Hardware in ID (Old Fig. 6.51)

Slide 6: Pipelining Outline
- Introduction
- Pipelined Processor Design
- Advanced Pipelining
  - Overview - Instruction Level Parallelism
  - Superpipelining
  - Static Multiple Issue
  - Dynamic Multiple Issue
  - Speculation
  - Simultaneous Multithreading (HyperThreading)

Slide 7: Instruction Level Parallelism (ILP)
- Parallel execution of instructions is known as Instruction Level Parallelism (ILP)
- Pipelining exploits ILP by overlapping execution
- ILP limited by
  - Data hazards
  - Control hazards

Slide 8: Techniques to Increase ILP
- Forwarding
- Branch Prediction
- Superpipelining
- Static Multiple Issue
- Dynamic Multiple Issue
- Speculation
- Simultaneous Multithreading (SMT)

Slide 9: Superpipelining
- Key idea: increase the number of stages
  - MIPS R2000 - 5 Stages
  - MIPS R4000 - 8 Stages
  - Pentium 3 - 10 Stages
  - Pentium 4 - 20 Stages
- Tradeoffs
  - (+) Less logic in each stage -> faster clock
  - (-) Longer pipeline -> higher penalty for stalls, flushes
- Used in conjunction with other techniques (e.g. branch prediction) to overcome disadvantages
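The tradeoff on this slide can be made concrete with a toy model. Everything numeric here is an assumption for illustration and does not come from the lecture or from measured processors: total logic delay, per-stage latch overhead, how often a flush occurs, and the idea that a flush costs a fixed fraction of the pipeline depth. The function name time_per_instruction is also hypothetical.

```python
def time_per_instruction(stages, logic_delay_ns=10.0, latch_overhead_ns=0.1,
                         flush_rate=0.05, flush_fraction=0.5):
    """Average time per instruction (ns) for a pipeline of the given depth.

    Assumed model: the logic is split evenly across stages, each stage adds
    a fixed latch overhead, and each flush wastes flush_fraction * stages
    cycles. None of these parameters are taken from the slides.
    """
    clock_period = logic_delay_ns / stages + latch_overhead_ns
    flush_penalty_cycles = flush_fraction * stages
    cpi = 1.0 + flush_rate * flush_penalty_cycles
    return clock_period * cpi


# Stage counts from the slide: R2000, R4000, Pentium 3, Pentium 4
for stages in (5, 8, 10, 20):
    print(stages, "stages:", round(time_per_instruction(stages), 4), "ns/instruction")
```

Under these assumed numbers, going from 5 to 20 stages shortens the clock period by 3.5x (2.1 ns to 0.6 ns) but improves time per instruction by only about 2.6x, because the longer pipeline pays more cycles per flush. That gap is why deeper pipelines lean on branch prediction.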

Slide 10: Techniques to Increase ILP
- Forwarding
- Branch Prediction
- Superpipelining
- Superscalar with Static Multiple Issue / VLIW
- Superscalar with Dynamic Multiple Issue
- Superscalar with Speculation
- Superscalar with Simultaneous Multithreading (SMT)

Slide 11: Static Multiple Issue
- Key idea: issue (decode & execute) multiple instructions in each clock cycle
- Example: Issue load/store and ALU/branch in MIPS (Fig. 6.44, old 6.57)

Instruction type   Pipe stages
ALU or branch      IF  ID  EX  MEM WB
Load/Store         IF  ID  EX  MEM WB
ALU or branch          IF  ID  EX  MEM WB
Load/Store             IF  ID  EX  MEM WB
ALU or branch              IF  ID  EX  MEM WB
Load/Store                 IF  ID  EX  MEM WB
ALU or branch                  IF  ID  EX  MEM WB
Load/Store                     IF  ID  EX  MEM WB

Slide 12: Example - A Static Multiple Issue MIPS (Fig. 6.45, old 6.58)
Figure labels: Executes ALU/Branch Instructions; Executes Load/Store Instructions

Slide 13: Static Multiple Issue Tradeoffs
- Advantage: increased performance
  - Real processors issue up to 6 instructions / cycle
- Several challenges:
  - Building a register file with lots of ports
  - Dealing with data dependencies
  - Stalls due to control dependencies (branch prediction helps!)
  - Building a memory system that can “keep up” (caches help!)
  - Finding opportunities to fully utilize the functional units

Slide 14: VLIW / EPIC Processors
- VLIW - Very Long Instruction Words
  - Functional units exposed in instruction word
  - Static scheduling by compiler
  - Pipeline is exposed; compiler must schedule delays to get right result
  - Examples: Philips Trimedia, Texas Instruments C6000
- Explicit Parallel Instruction Computer (EPIC)
  - Three 41-bit instructions in each instruction packet
  - Compiler determines parallelism
  - Hardware checks dependencies and forwards/stalls
  - Examples: Intel Itanium, Itanium 2

Slide 15: Itanium Block Diagram
Source: Extreme Tech www.extremetech.com

Slide 16: Software Manipulation to Increase ILP
- Software Transformations can increase ILP
  - Code reordering to reduce stalls
  - Loop unrolling
- Example (p. 438)

    Loop: lw   $t0, 0($s1)        # $t0=array element
          addu $t0, $t0, $s2      # add scalar in $s2
          sw   $t0, 0($s1)        # store result
          addi $s1, $s1, -4       # decrement ptr
          bne  $s1, $zero, Loop

- Goal: reorder to speed superscalar execution

Slide 17: Software Manipulation - Reordering Code
- Note sparse utilization of superscalar pipeline!
- End result:
  - 5 instructions in 4 clocks
  - CPI = 0.8

      ALU or branch instruction   Data transfer instruction   Clock
Loop:                             lw $t0, 0($s1)              1
      addi $s1, $s1, -4                                       2
      addu $t0, $t0, $s2                                      3
      bne $s1, $zero, Loop        sw $t0, 4($s1)              4
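The CPI figure and the "sparse utilization" remark both follow from counting filled issue slots in the schedule. A minimal sketch: the schedule rows are taken from the slide, while the helper name schedule_stats and the tuple layout are hypothetical.

```python
# One row per clock cycle: (ALU/branch slot, data transfer slot).
# None marks an issue slot that is left empty.
reordered = [
    (None,                   "lw $t0, 0($s1)"),
    ("addi $s1, $s1, -4",    None),
    ("addu $t0, $t0, $s2",   None),
    ("bne $s1, $zero, Loop", "sw $t0, 4($s1)"),
]


def schedule_stats(rows):
    """Return (instructions, clocks, CPI, slot utilization) for a 2-issue schedule."""
    clocks = len(rows)
    instructions = sum(slot is not None for row in rows for slot in row)
    cpi = clocks / instructions
    utilization = instructions / (2 * clocks)
    return instructions, clocks, cpi, utilization


instructions, clocks, cpi, utilization = schedule_stats(reordered)
print(instructions, "instructions in", clocks, "clocks")
print("CPI =", cpi)
print("issue slots used =", utilization)
```

Only 5 of the 8 available issue slots hold an instruction, which is the motivation for loop unrolling.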

Slide 18: Software Manipulation - Loop Unrolling
- Assume loop count a multiple of 4 & unroll
- End result:
  - 4 loop iterations in 8 clocks
  - 2 clocks / iteration!

      ALU or branch instruction   Data transfer instruction   Clock
Loop: addi $s1, $s1, -16          lw $t0, 0($s1)              1
                                  lw $t1, 12($s1)             2
      addu $t0, $t0, $s2          lw $t2, 8($s1)              3
      addu $t1, $t1, $s2          lw $t3, 4($s1)              4
      addu $t2, $t2, $s2          sw $t0, 16($s1)             5
      addu $t3, $t3, $s2          sw $t1, 12($s1)             6
                                  sw $t2, 8($s1)              7
      bne $s1, $zero, Loop        sw $t3, 4($s1)              8
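Unrolling is only legal if it leaves the result unchanged. The sketch below checks that on a small array. It is a sequential model, so it says nothing about timing or issue slots: it applies the pointer decrement first and then addresses all four elements relative to the new pointer. The memory layout (a dict keyed by byte address) and the function names are assumptions made for illustration.

```python
def original_loop(mem, s1, s2):
    """One element per iteration, as in the five-instruction loop."""
    while True:
        mem[s1] = mem[s1] + s2      # lw / addu / sw
        s1 -= 4                     # addi $s1, $s1, -4
        if s1 == 0:                 # bne $s1, $zero, Loop falls through
            return mem


def unrolled_loop(mem, s1, s2):
    """Four elements per iteration; assumes the element count is a multiple of 4."""
    while True:
        s1 -= 16                    # addi $s1, $s1, -16
        for offset in (16, 12, 8, 4):
            # one lw / addu / sw group per element, relative to the new pointer
            mem[s1 + offset] = mem[s1 + offset] + s2
        if s1 == 0:
            return mem


# Hypothetical test data: 8 words at byte addresses 4, 8, ..., 32
array = {addr: addr // 4 for addr in range(4, 36, 4)}
print(original_loop(dict(array), 32, 10) == unrolled_loop(dict(array), 32, 10))
```

The unrolled version executes one addi and one bne per four elements instead of per element, which frees issue slots for useful work.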

Slide 19: Techniques to Increase ILP
- Forwarding
- Branch Prediction
- Superpipelining
- Superscalar with Static Multiple Issue / VLIW
- Superscalar with Dynamic Multiple Issue
- Superscalar with Speculation
- Superscalar with Simultaneous Multithreading (SMT)

Slide 20: Dynamic Multiple Issue
- Key ideas:
  - "Look past" stalls for instructions that can execute

        lw   $t0, 20($t2)
        addu $t1, $t0, $s2      # addu stalls until $t0 available
        sub  $s4, $s4, $s3      # sub is ready to execute but blocked by stall!
        slti $t5, $s4, 20

  - Execute instructions out of order
  - Use multiple functional units for parallel execution
  - Forward results between functional units when necessary
  - Update registers (in original order of execution)
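The benefit of looking past the stalled addu can be seen by computing when each of the four instructions could start executing. This is an idealized sketch, not a model of any real scheduler: it assumes unlimited functional units and issue width, a load result that is usable 2 cycles after the load starts, 1-cycle latency for everything else, and register renaming so that only true (read-after-write) dependences matter. The name start_cycles and the tuple encoding are hypothetical.

```python
def start_cycles(program, latency, in_order):
    """Earliest execution start cycle for each instruction.

    program: list of (opcode, destination register, source registers).
    in_order=True models a pipeline where a stalled instruction also
    blocks every younger instruction behind it.
    """
    ready = {}          # register -> first cycle its newest value is available
    starts = []
    previous_start = 0
    for opcode, dest, sources in program:
        start = max([ready.get(reg, 0) for reg in sources], default=0)
        if in_order:
            start = max(start, previous_start)   # cannot pass an older instruction
        starts.append(start)
        ready[dest] = start + latency.get(opcode, 1)
        previous_start = start
    return starts


program = [
    ("lw",   "$t0", ["$t2"]),
    ("addu", "$t1", ["$t0", "$s2"]),
    ("sub",  "$s4", ["$s4", "$s3"]),
    ("slti", "$t5", ["$s4"]),
]
latency = {"lw": 2}     # assumed load-use latency

print("in order:    ", start_cycles(program, latency, in_order=True))
print("out of order:", start_cycles(program, latency, in_order=False))
```

In the in-order case sub and slti wait behind addu even though their operands are ready. In the out-of-order case they start while addu is still waiting for $t0.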

Slide 21: Speculation
- Guess about the outcome of an instruction (e.g., branch or load)
- Based on guess, start executing instructions
- Cancel started instructions if guess is incorrect
- Complicating factors
  - Must buffer instruction results until outcome known
  - Exceptions in speculated instructions - how can you have an exception in an instruction that didn’t execute?
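The "buffer results until outcome known" requirement can be sketched in a few lines. This is a deliberately simplified illustration, not a description of real commit hardware: registers are a dict, the speculative results sit in a separate dict, and resolve_speculation is a hypothetical name.

```python
def resolve_speculation(registers, buffered_results, guess, outcome):
    """Commit buffered results if the guess was right, otherwise discard them."""
    if guess == outcome:
        registers.update(buffered_results)   # results become architecturally visible
        return True
    buffered_results.clear()                 # squash: registers were never modified
    return False


# Hypothetical example: a branch is guessed taken and one instruction
# past it speculatively computes a new value for $t0.
registers = {"$t0": 1}
committed = resolve_speculation(registers, {"$t0": 5}, guess=True, outcome=False)
print(committed, registers)
```

Because speculative values never touch the register dict before the outcome is known, cancelling a wrong guess needs no undo step.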

Slide 22: Superscalar Dynamic Pipelining (Fig. 6.49, old 6.61)
Figure labels: Instruction fetch and decode unit (in-order issue); reservation stations; functional units: Integer, Floating point, Load/Store (out-of-order execute); Commit unit (in-order commit)

Slide 23: Superscalar Dynamic Pipelining in the Pentium 4 (Fig. 6.50, mod old 6.62)

Slide 24: Superscalar Dynamic Pipelining in the Pentium 4
Source: “The Microarchitecture of the Pentium® 4 Processor”, Intel Technology Journal, First Quarter 2001 http://developer.intel.com/technology/itj/q12001/articles/art_2.htm.

Slide 25: Pentium 3 & 4 Pipeline Stages
Source: “The Microarchitecture of the Pentium® 4 Processor”, Intel Technology Journal, First Quarter 2001 http://developer.intel.com/technology/itj/q12001/articles/art_2.htm.
Note: Drive stages - waiting for signal propagation

Slide 26: Techniques to Increase ILP
- Forwarding
- Branch Prediction
- Superpipelining
- Multiple Issue - Superscalar, VLIW/EPIC
- Software manipulation
- Dynamic Pipeline Scheduling
- Speculative Execution
- Simultaneous Multithreading (SMT)

Slide 27: Simultaneous Multithreading (SMT)
- Key idea: extend a superscalar processor to multiple threads of execution that execute concurrently
  - Each thread has its own PC and register state
  - Scheduling hardware shares functional units
- Appears to software as two “separate” processors
- Advantage: when one thread stalls, another may be ready
- Proposed for servers, where multiple threads are common
Figure labels: State Thread A; State Thread B; Functional Units; Issue Slots; Time
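The advantage claimed above (another thread fills slots while one is stalled) can be illustrated by counting used issue slots. The sketch is a rough model under stated assumptions: a 4-wide issue stage, a made-up number of ready instructions per thread per cycle, a fixed thread priority order, and no carry-over of instructions that fail to issue. The name slots_used is hypothetical.

```python
def slots_used(ready_per_cycle, width):
    """Count issue slots filled when threads share a width-wide issue stage.

    ready_per_cycle: one tuple per clock cycle holding the number of
    ready instructions for each thread (0 means the thread is stalled).
    """
    used = 0
    for ready in ready_per_cycle:
        free = width
        for count in ready:            # earlier threads get priority
            taken = min(count, free)
            used += taken
            free -= taken
    return used


# Hypothetical workload: thread A stalls in cycles 2 and 3, thread B in cycle 4
thread_a = [2, 0, 0, 3]
thread_b = [1, 2, 2, 0]
width = 4
total_slots = width * len(thread_a)

alone = slots_used([(a,) for a in thread_a], width)
shared = slots_used(list(zip(thread_a, thread_b)), width)
print("thread A alone:", alone, "of", total_slots, "slots")
print("threads A + B: ", shared, "of", total_slots, "slots")
```

With these assumed numbers the single thread fills 5 of 16 slots and the pair fills 10 of 16, because thread B issues during the cycles in which thread A is stalled.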

Slide 28: Roadmap for the term: major topics
- Overview / Abstractions and Technology
- Performance
- Instruction sets
- Logic & arithmetic
- Processor Implementation
- Memory systems
- Input/Output
