Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania 18042 ECE 313 - Computer Organization Lecture 19 - Pipelined.

Prof. John Nestor
ECE Department, Lafayette College, Easton, Pennsylvania
ECE 313 Computer Organization
Lecture 19 - Pipelined Processor Design 3: Superscalar CPU
Fall 2004

Reading: 6.7, , 6.13*
Homework Due 12/8: 6.1, 6.2, 6.3, 6.4, 6.7, 6.8, 6.9, 6.15
Assignment: Project 4

Portions of these slides are derived from:
- Textbook figures © 1998 Morgan Kaufmann Publishers, all rights reserved
- Tod Amon's COD2e Slides © 1998 Morgan Kaufmann Publishers, all rights reserved
- Dave Patterson's CS 152 Slides, Fall 1997 © UCB
- Rob Rutenbar's Slides, Fall 1999, CMU
- Other sources as noted

ECE 313 Fall 2004, Lecture 19 - Pipelining 3

Project 4 - Basic Pipelined MIPS

Project 4 - What to Do
- Download and simulate the basic model
- Extend the processor to implement either:
  - Data forwarding plus the load/use stall (see Fig. 6.36/6.33), OR
  - Branch implementation in ID, including IF.Flush (see Fig. 6.38)
- Simulate the extended processor to show that it works
- You may work in groups of two

Pipelined Processor Design with Hazard Detection (Fig. 6.36, old 6.46)
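Project 4 asks for forwarding plus the load/use stall. Both can be expressed as a few Boolean tests on pipeline-register fields. The Python below is a minimal sketch that assumes the standard textbook conditions. `PipeReg`, `forward_select`, and `load_use_stall` are hypothetical names whose fields stand in for the RegWrite, MemRead, and register-number signals; they are not part of the course-provided model.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """Hypothetical snapshot of the pipeline-register fields the hazard logic reads."""
    reg_write: bool = False  # RegWrite control bit
    mem_read: bool = False   # MemRead control bit (set for lw)
    rs: int = 0              # source register numbers
    rt: int = 0
    rd: int = 0              # destination register chosen by the RegDst mux


def forward_select(src: int, ex_mem: PipeReg, mem_wb: PipeReg) -> str:
    """ALU operand mux select for source register `src` (ForwardA uses rs, ForwardB uses rt)."""
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return "10"  # most recent result: forward from EX/MEM (checked first, so it wins)
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return "01"  # older result: forward from MEM/WB
    return "00"      # no hazard: use the value read from the register file


def load_use_stall(if_id: PipeReg, id_ex: PipeReg) -> bool:
    """True when the instruction in ID reads the register a load in EX is still fetching."""
    # For lw, the destination register sits in the rt field of ID/EX.
    return id_ex.mem_read and id_ex.rt in (if_id.rs, if_id.rt)


# sub $2, $1, $3 is in EX/MEM while and $12, $2, $5 is in ID/EX: forward $2 from EX/MEM.
print(forward_select(2, PipeReg(reg_write=True, rd=2), PipeReg()))
# lw $2, 20($1) is in ID/EX while and $4, $2, $5 is in IF/ID: insert one stall cycle.
print(load_use_stall(PipeReg(rs=2, rt=5), PipeReg(mem_read=True, rt=2)))
```

Because the EX/MEM test runs first, the newer result takes priority when both stages match. This is the same effect as the textbook's extra "not EX/MEM" term in the MEM/WB condition.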

Pipelined Processor Design with Branch Hardware in ID (old Fig. 6.51)
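Moving the branch into ID means the equality comparison and the target adder both work during decode, and a taken branch squashes the single instruction fetched behind it. The sketch below models a `beq` under a predict-not-taken assumption. The function names are hypothetical, and 32-bit address wraparound is ignored.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of `imm` as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def resolve_beq_in_id(pc: int, rs_val: int, rt_val: int, imm16: int) -> tuple[int, bool]:
    """Return (next_pc, flush) for a beq at address `pc` that is resolved in ID.

    Assumes predict-not-taken: while the branch is in ID, the instruction at
    pc + 4 is already being fetched, so a taken branch must squash it (IF.Flush).
    """
    pc_plus_4 = pc + 4
    target = pc_plus_4 + (sign_extend16(imm16) << 2)  # word offset -> byte offset
    if rs_val == rt_val:       # equality comparator moved into the ID stage
        return target, True    # taken: redirect fetch and flush one instruction
    return pc_plus_4, False    # not taken: keep the instruction already fetched


print(resolve_beq_in_id(0x100, 1, 1, 0xFFFE))  # taken backward branch
```

Only one instruction is lost on a taken branch, because the branch resolves one stage after fetch.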

Pipelining Outline
- Introduction
- Pipelined Processor Design
- Advanced Pipelining
  - Overview - Instruction Level Parallelism
  - Superpipelining
  - Static Multiple Issue
  - Dynamic Multiple Issue
  - Speculation
  - Simultaneous Multithreading (HyperThreading)

Instruction Level Parallelism (ILP)
- Parallel execution of instructions is known as Instruction Level Parallelism (ILP)
- Pipelining exploits ILP by overlapping the execution of successive instructions
- ILP is limited by:
  - Data hazards
  - Control hazards
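The cycle count for a k-stage pipeline shows how hazards cap the ILP that plain pipelining extracts. A minimal sketch, assuming one instruction enters per cycle and each stall cycle delays every later instruction by one cycle. The function names are illustrative.

```python
def pipeline_cycles(n_instr: int, n_stages: int, stall_cycles: int = 0) -> int:
    """Clock cycles for n_instr instructions on an n_stages pipeline, plus stalls."""
    if n_instr <= 0:
        return 0
    # The first instruction needs n_stages cycles to fill the pipeline; each later
    # instruction finishes one cycle after its predecessor unless a hazard stalls it.
    return n_stages + (n_instr - 1) + stall_cycles


def effective_cpi(n_instr: int, n_stages: int, stall_cycles: int = 0) -> float:
    """Average clock cycles per instruction over the whole run."""
    return pipeline_cycles(n_instr, n_stages, stall_cycles) / n_instr


print(pipeline_cycles(1000, 5), effective_cpi(1000, 5, stall_cycles=200))
```

Without stalls, the CPI approaches 1. Every data or control hazard stall adds directly to it, which is why the techniques that follow try either to remove stalls or to start more than one instruction per cycle.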

Techniques to Increase ILP
- Forwarding
- Branch Prediction
- Superpipelining
- Static Multiple Issue
- Dynamic Multiple Issue
- Speculation
- Simultaneous Multithreading (SMT)

Superpipelining
- Key idea: increase the number of pipeline stages
  - MIPS R…: … stages
  - MIPS R…: … stages
  - Pentium …: … stages
  - Pentium …: … stages
- Tradeoffs:
  - (+) Less logic in each stage -> faster clock
  - (-) Longer pipeline -> higher penalty for stalls and flushes
- Used in conjunction with other techniques (e.g., branch prediction) to overcome these disadvantages
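The tradeoff can be made concrete with a toy model of time per instruction versus pipeline depth. Every parameter here is an assumption chosen for illustration, not a measurement of any real processor:
- total logic delay split evenly across stages;
- fixed latch overhead per stage;
- each stall or flush event costing about half the pipeline depth in cycles.

```python
def time_per_instruction(depth: int, logic_ns: float = 10.0, latch_ns: float = 0.2,
                         events_per_instr: float = 0.05) -> float:
    """Average ns per instruction for a pipeline with `depth` stages (toy model).

    Assumptions (all hypothetical): the total logic delay logic_ns is split evenly
    across stages, each stage adds latch_ns of register overhead, and each
    stall/flush event costs depth // 2 cycles (deeper pipe -> bigger penalty).
    """
    clock_ns = logic_ns / depth + latch_ns        # less logic per stage -> faster clock
    cpi = 1 + events_per_instr * (depth // 2)     # longer pipe -> costlier stalls/flushes
    return clock_ns * cpi


for depth in (5, 10, 20, 40, 80):
    print(depth, round(time_per_instruction(depth), 3))
```

With these made-up parameters, the estimate falls from about 2.42 ns at 5 stages to 0.9 ns at 40 stages, then rises to about 0.98 ns at 80. Past some depth, the growing flush penalty outweighs the faster clock, which is why deep pipelines lean on branch prediction.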

Techniques to Increase ILP
- Forwarding
- Branch Prediction
- Superpipelining
- Superscalar with Static Multiple Issue / VLIW
- Superscalar with Dynamic Multiple Issue
- Superscalar with Speculation
- Superscalar with Simultaneous Multithreading (SMT)

Static Multiple Issue
- Key idea: issue (decode and execute) multiple instructions in each clock cycle
- Example: issue one load/store and one ALU/branch instruction per cycle in MIPS

Instruction type   Pipe stages
ALU or branch      IF  ID  EX  MEM WB
Load/Store         IF  ID  EX  MEM WB
ALU or branch          IF  ID  EX  MEM WB
Load/Store             IF  ID  EX  MEM WB
ALU or branch              IF  ID  EX  MEM WB
Load/Store                 IF  ID  EX  MEM WB
ALU or branch                  IF  ID  EX  MEM WB
Load/Store                     IF  ID  EX  MEM WB

(Fig. 6.44, old 6.57)
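A static dual-issue machine like this pairs instructions at issue time: one for the ALU/branch slot and one for the load/store slot. The sketch below forms such packets greedily under simplifying assumptions:
- Only adjacent instructions are considered, in program order.
- A packet is rejected only when its second instruction reads the first one's result.
- Within a packet, all reads happen before any write.
- Load-use latency and alignment rules are ignored.

`form_packets` and the tuple format are hypothetical.

```python
MEM_OPS = {"lw", "sw"}  # everything else goes in the ALU/branch slot


def slot_of(instr):
    return "mem" if instr[0] in MEM_OPS else "alu"


def form_packets(instrs):
    """Greedily pair adjacent instructions into (ALU/branch, load/store) issue packets.

    instrs: list of (op, dest, srcs); dest is None for sw and bne.
    Returns one (alu_instr or None, mem_instr or None) tuple per clock cycle.
    """
    packets, i = [], 0
    while i < len(instrs):
        first = instrs[i]
        second = instrs[i + 1] if i + 1 < len(instrs) else None
        can_pair = (
            second is not None
            and slot_of(first) != slot_of(second)                # one of each slot type
            and (first[1] is None or first[1] not in second[2])  # no RAW inside a packet
        )
        group = [first, second] if can_pair else [first]
        alu = next((x for x in group if slot_of(x) == "alu"), None)
        mem = next((x for x in group if slot_of(x) == "mem"), None)
        packets.append((alu, mem))
        i += len(group)
    return packets


loop = [
    ("lw", "$t0", ("$s1",)),
    ("addu", "$t0", ("$t0", "$s2")),
    ("sw", None, ("$t0", "$s1")),
    ("addi", "$s1", ("$s1",)),
    ("bne", None, ("$s1", "$zero")),
]
for alu, mem in form_packets(loop):
    print(alu and alu[0], "|", mem and mem[0])
```

Applied to the array loop used later in this lecture, this yields four packets for five instructions. Only addi and sw end up sharing a packet, because lw, addu, and sw form a dependence chain. That sparse pairing is what the reordering and unrolling slides attack.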

Example - A Static Multiple Issue MIPS (Fig. 6.45, old 6.58)
[Figure: datapath with one pipeline that executes ALU/branch instructions and a second that executes load/store instructions]

Static Multiple Issue Tradeoffs
- Advantage: increased performance
  - Real processors issue up to 6 instructions per cycle
- Several challenges:
  - Building a register file with many ports
  - Dealing with data dependencies
  - Stalls due to control dependencies (branch prediction helps!)
  - Building a memory system that can "keep up" (caches help!)
  - Finding enough independent instructions to fully utilize the functional units

VLIW / EPIC Processors
- VLIW - Very Long Instruction Word
  - Functional units are exposed in the instruction word
  - Static scheduling by the compiler
  - The pipeline is exposed; the compiler must account for pipeline delays when scheduling to get the right result
  - Examples: Philips TriMedia, Texas Instruments C6000
- EPIC - Explicitly Parallel Instruction Computing
  - Three 41-bit instructions in each 128-bit instruction bundle (the remaining 5 bits hold a template)
  - Compiler determines parallelism
  - Hardware checks dependencies and forwards/stalls
  - Examples: Intel Itanium, Itanium 2
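Three 41-bit slots account for 123 bits of each 128-bit IA-64 bundle. The remaining 5 bits are a template field that tells the hardware which kinds of functional units the slots target and where groups of independent instructions end. The sketch below only packs and unpacks the fields, assuming the template occupies the low 5 bits with slots 0, 1, and 2 above it. It does not decode real template values or instruction encodings, and the function names are hypothetical.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template: int, slots: tuple[int, int, int]) -> int:
    """Pack a 5-bit template and three 41-bit slots: bits 0-4, 5-45, 46-86, 87-127."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each slot must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle: int) -> tuple[int, tuple[int, ...]]:
    """Inverse of pack_bundle: return (template, (slot0, slot1, slot2))."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = tuple((bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK for i in range(3))
    return template, slots


b = pack_bundle(0x10, (1, 2, SLOT_MASK))
print(hex(b), unpack_bundle(b))
```

Since 5 + 3 × 41 = 128, a fully populated bundle uses every bit.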

Itanium Block Diagram (Source: ExtremeTech)

Software Manipulation to Increase ILP
- Software transformations can increase ILP:
  - Code reordering to reduce stalls
  - Loop unrolling
- Example (p. 438):

      Loop: lw   $t0, 0($s1)       # $t0 = array element
            addu $t0, $t0, $s2     # add scalar in $s2
            sw   $t0, 0($s1)       # store result
            addi $s1, $s1, -4      # decrement pointer
            bne  $s1, $zero, Loop  # branch if $s1 != 0

- Goal: reorder the code to speed up superscalar execution
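For reference, the loop adds the scalar in $s2 to each word of an array, walking a pointer downward. Here is a minimal Python model of what it computes. It assumes word addresses are the keys of a dict and that $s1 starts at a positive multiple of 4, and it ignores the 32-bit wraparound of addu. `add_scalar_loop` is a hypothetical name.

```python
def add_scalar_loop(mem: dict[int, int], s1: int, s2: int) -> None:
    """Python model of the MIPS loop; mem maps word addresses to values.

    Processes the word at s1, steps s1 down by 4, and stops once s1 reaches 0,
    so the word at address 0 is never touched (just like the bne test).
    """
    while True:
        t0 = mem[s1]      # lw   $t0, 0($s1)
        t0 = t0 + s2      # addu $t0, $t0, $s2
        mem[s1] = t0      # sw   $t0, 0($s1)
        s1 -= 4           # addi $s1, $s1, -4
        if s1 == 0:       # bne  $s1, $zero, Loop falls through when $s1 == 0
            break


mem = {0: 100, 4: 1, 8: 2, 12: 3}
add_scalar_loop(mem, 12, 10)
print(mem)
```

Each iteration is a chain of dependent instructions (lw -> addu -> sw), which is what limits how many can be issued together.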

Software Manipulation - Reordering Code

       ALU or branch instruction   Data transfer instruction   Clock
Loop:                              lw   $t0, 0($s1)            1
       addi $s1, $s1, -4                                       2
       addu $t0, $t0, $s2                                      3
       bne  $s1, $zero, Loop       sw   $t0, 4($s1)            4

- The store uses offset 4 because addi has already decremented $s1 by 4
- Note the sparse utilization of the superscalar pipeline!
- End result:
  - 5 instructions in 4 clocks
  - CPI = 4/5 = 0.8
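The "sparse utilization" can be quantified by counting filled issue slots as well as cycles. A minimal sketch, assuming one (ALU/branch, data transfer) pair of slots per clock, with None marking an empty slot. `schedule_stats` is a hypothetical helper.

```python
def schedule_stats(packets):
    """Return (instructions, clocks, CPI, slot utilization) for a 2-issue schedule.

    packets: one (alu_slot, data_transfer_slot) tuple per clock; None = empty slot.
    """
    clocks = len(packets)
    instrs = sum(s is not None for packet in packets for s in packet)
    return instrs, clocks, clocks / instrs, instrs / (2 * clocks)


reordered = [
    (None,                    "lw   $t0, 0($s1)"),
    ("addi $s1, $s1, -4",     None),
    ("addu $t0, $t0, $s2",    None),
    ("bne  $s1, $zero, Loop", "sw   $t0, 4($s1)"),
]
print(schedule_stats(reordered))
```

For the reordered loop this gives 5 instructions in 4 clocks (CPI 0.8), but only 5 of the 8 available issue slots are filled (62.5%).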

Software Manipulation - Loop Unrolling
- Assume the loop count is a multiple of 4, and unroll four iterations

       ALU or branch instruction   Data transfer instruction   Clock
Loop:  addi $s1, $s1, -16          lw   $t0, 0($s1)            1
                                   lw   $t1, 12($s1)           2
       addu $t0, $t0, $s2          lw   $t2, 8($s1)            3
       addu $t1, $t1, $s2          lw   $t3, 4($s1)            4
       addu $t2, $t2, $s2          sw   $t0, 16($s1)           5
       addu $t3, $t3, $s2          sw   $t1, 12($s1)           6
                                   sw   $t2, 8($s1)            7
       bne  $s1, $zero, Loop       sw   $t3, 4($s1)            8

- lw $t0 issues in the same clock as addi, so it uses the old $s1. All later loads and stores see the decremented $s1, which is why sw $t0 uses offset 16
- End result:
  - 4 loop iterations in 8 clocks (14 instructions, CPI = 8/14 ≈ 0.57)
  - 2 clocks per iteration!
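One way to check the unrolled schedule's addressing is to execute the packets under the rule that both instructions in a packet read their registers before either one writes. The sketch below is a toy interpreter built on that assumption. It runs a single pass of the loop body, omits the branch, ignores load latency, and uses hypothetical names (`run_packets`, `UNROLLED`).

```python
def run_packets(packets, regs, mem):
    """Execute issue packets: all instructions in a packet read their operands
    before any of them writes a result (load latency is not modeled)."""
    for packet in packets:
        writes = []                                    # deferred (kind, where, value)
        for instr in packet:
            if instr is None:                          # empty issue slot
                continue
            op, *args = instr
            if op == "addi":
                rd, rs, imm = args
                writes.append(("reg", rd, regs[rs] + imm))
            elif op == "addu":
                rd, rs, rt = args
                writes.append(("reg", rd, regs[rs] + regs[rt]))
            elif op == "lw":
                rt, offset, base = args
                writes.append(("reg", rt, mem[regs[base] + offset]))
            elif op == "sw":
                rt, offset, base = args
                writes.append(("mem", regs[base] + offset, regs[rt]))
            else:
                raise ValueError(f"unsupported op: {op}")
        for kind, where, value in writes:
            (regs if kind == "reg" else mem)[where] = value


# One pass of the unrolled loop body (the bne in the last packet is omitted).
UNROLLED = [
    (("addi", "$s1", "$s1", -16),   ("lw", "$t0", 0, "$s1")),
    (None,                          ("lw", "$t1", 12, "$s1")),
    (("addu", "$t0", "$t0", "$s2"), ("lw", "$t2", 8, "$s1")),
    (("addu", "$t1", "$t1", "$s2"), ("lw", "$t3", 4, "$s1")),
    (("addu", "$t2", "$t2", "$s2"), ("sw", "$t0", 16, "$s1")),
    (("addu", "$t3", "$t3", "$s2"), ("sw", "$t1", 12, "$s1")),
    (None,                          ("sw", "$t2", 8, "$s1")),
    (None,                          ("sw", "$t3", 4, "$s1")),
]

regs = {"$s1": 32, "$s2": 100}
mem = {16: 0, 20: 4, 24: 3, 28: 2, 32: 1}
run_packets(UNROLLED, regs, mem)
print(regs["$s1"], mem)
```

With $s1 = 32 on entry, the words at addresses 32, 28, 24, and 20 each gain 100, and $s1 ends at 16. Storing $t0 at 0($s1) instead would write to address 16 and leave address 32 unchanged, because lw $t0 computed its address before addi updated $s1.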


Dynamic Multiple Issue
- Key ideas:
  - "Look past" stalls for instructions that can execute:

        lw   $t0, 20($t2)
        addu $t1, $t0, $s2     # addu stalls until $t0 is available
        sub  $s4, $s4, $s3     # sub is ready to execute but blocked by the stall!
        slti $t5, $s4, 20

  - Execute instructions out of order
  - Use multiple functional units for parallel execution
  - Forward results between functional units when necessary
  - Update registers in the original program order
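The benefit of looking past the stall shows up even on a machine that issues one instruction per cycle. The toy scheduler below compares in-order issue with "issue the oldest ready instruction." Its assumptions, all hypothetical:
- a load result is usable 2 cycles after issue, and other results 1 cycle after issue;
- there is no register renaming, so an instruction also waits for any older unissued instruction it could overwrite or be overwritten by;
- in-order commit is not modeled.

The names are illustrative.

```python
def schedule(instrs, out_of_order):
    """Issue cycle of each instruction on a toy 1-issue-per-cycle machine.

    instrs: list of (name, dest, srcs, latency). A result is usable `latency`
    cycles after issue. In-order mode only considers the oldest unissued
    instruction; out-of-order mode issues the oldest one whose operands are ready.
    """
    ready = {}                      # register -> first cycle its new value is usable
    issued = [None] * len(instrs)
    cycle = 0
    while None in issued:
        pending = [i for i, c in enumerate(issued) if c is None]
        for i in (pending if out_of_order else pending[:1]):
            _, dest, srcs, latency = instrs[i]
            # Conservatively wait for older unissued instructions this one depends on
            # or could overwrite (RAW, WAW, WAR); real hardware renames registers instead.
            blocked = any(
                instrs[j][1] in srcs or instrs[j][1] == dest or dest in instrs[j][2]
                for j in range(i) if issued[j] is None
            )
            operands_ready = all(ready.get(r, 0) <= cycle for r in srcs)
            if not blocked and operands_ready and ready.get(dest, 0) <= cycle:
                issued[i] = cycle
                ready[dest] = cycle + latency
                break               # at most one instruction issues per cycle
        cycle += 1
    return issued


program = [
    ("lw",   "$t0", ("$t2",),       2),   # load result usable 2 cycles after issue
    ("addu", "$t1", ("$t0", "$s2"), 1),
    ("sub",  "$s4", ("$s4", "$s3"), 1),
    ("slti", "$t5", ("$s4",),       1),
]
print("in-order:    ", schedule(program, out_of_order=False))
print("out-of-order:", schedule(program, out_of_order=True))
```

In order, sub waits behind the stalled addu, and the last instruction issues in cycle 4. Out of order, sub issues in cycle 1, during addu's stall, and the sequence finishes issuing a cycle earlier.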

Speculation
- Guess the outcome of an instruction (e.g., a branch or load)
- Based on the guess, start executing instructions
- Cancel the started instructions if the guess is incorrect
- Complicating factors:
  - Instruction results must be buffered until the outcome is known
  - Exceptions in speculated instructions: how should the processor handle an exception raised by an instruction that, if the guess was wrong, should never have executed?
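The slides do not say how the guess is made. For branches, a common choice is a 2-bit saturating counter per branch, which changes its prediction only after two consecutive wrong guesses. The sketch below shows that scheme, not necessarily the one any specific processor on these slides uses. Each misprediction marks a point where speculatively started work would be cancelled. The class and function names are hypothetical.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state: int = 0) -> None:
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def count_mispredictions(outcomes, predictor) -> int:
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1   # work started down the wrong path would be cancelled here
        predictor.update(taken)
    return misses


# A loop branch taken 9 times then not taken, executed twice.
outcomes = ([True] * 9 + [False]) * 2
print(count_mispredictions(outcomes, TwoBitPredictor()))
```

After warming up, the counter mispredicts only the loop exit. A single not-taken outcome moves it from "strongly" to "weakly" taken, so the next run of the loop is still predicted correctly.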

Superscalar Dynamic Pipelining (Fig. 6.49, old 6.61)
[Figure: an instruction fetch and decode unit (in-order issue) feeds reservation stations in front of the integer, floating-point, and load/store functional units (out-of-order execute); results pass to a commit unit (in-order commit)]
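The commit unit makes out-of-order execution look in-order to software. A common way to build it is a reorder buffer: instructions enter in order, finish in any order, and update registers only from the head of the buffer. The Python below is a toy sketch of that idea. It assumes unique instruction tags and ignores capacity limits, exceptions, and forwarding from the buffer. `RobEntry`, `ReorderBuffer`, and its methods are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    tag: str                        # unique name for the in-flight instruction
    dest: str                       # architectural register it will write
    value: Optional[int] = None
    done: bool = False


class ReorderBuffer:
    """Toy reorder buffer: instructions finish in any order but update registers in order."""

    def __init__(self) -> None:
        self.entries = deque()      # oldest entry on the left

    def issue(self, tag: str, dest: str) -> None:
        self.entries.append(RobEntry(tag, dest))       # in-order issue: add at the tail

    def complete(self, tag: str, value: int) -> None:
        for entry in self.entries:                     # out-of-order execute: any entry may finish
            if entry.tag == tag:
                entry.value, entry.done = value, True
                return
        raise KeyError(tag)

    def commit(self, regs: dict) -> list:
        committed = []
        while self.entries and self.entries[0].done:   # in-order commit: only from the head
            entry = self.entries.popleft()
            regs[entry.dest] = entry.value
            committed.append(entry.tag)
        return committed

    def squash_after(self, tag: str) -> None:
        """Discard every entry younger than `tag`, e.g. after a mispredicted branch."""
        if all(entry.tag != tag for entry in self.entries):
            raise KeyError(tag)
        while self.entries[-1].tag != tag:
            self.entries.pop()


regs = {}
rob = ReorderBuffer()
for tag, dest in [("lw", "$t0"), ("addu", "$t1"), ("sub", "$s4")]:
    rob.issue(tag, dest)
rob.complete("sub", 7)              # sub finishes first...
print(rob.commit(regs))             # ...but cannot commit past the unfinished lw at the head
rob.complete("lw", 5)
rob.complete("addu", 6)
print(rob.commit(regs), regs)
```

Because results sit in the buffer until they reach the head, `squash_after` can throw away speculative work without ever having touched the registers. This is the buffering that the Speculation slide calls for.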

Superscalar Dynamic Pipelining in the Pentium 4 (Fig. 6.50, modified from old Fig. 6.62)

Superscalar Dynamic Pipelining in the Pentium 4
Source: "The Microarchitecture of the Pentium® 4 Processor", Intel Technology Journal, First Quarter

Pentium 3 & 4 Pipeline Stages
Source: "The Microarchitecture of the Pentium® 4 Processor", Intel Technology Journal, First Quarter
- Drive stages: stages spent waiting for signal propagation

Techniques to Increase ILP
- Forwarding
- Branch Prediction
- Superpipelining
- Multiple Issue - Superscalar, VLIW/EPIC
- Software manipulation
- Dynamic Pipeline Scheduling
- Speculative Execution
- Simultaneous Multithreading (SMT)

Simultaneous Multithreading (SMT)
- Key idea: extend a superscalar processor so that multiple threads of execution run concurrently
- Each thread has its own PC and register state
- Scheduling hardware shares the functional units among threads
- Appears to software as two "separate" processors
- Advantage: when one thread stalls, another may be ready to use its issue slots
- Proposed for servers, where multiple threads are common

[Figure: state for thread A and thread B feeding shared functional units; issue slots plotted against time]
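The issue-slot picture can be mimicked by counting how many slots get filled when a second thread can use the slots the first one leaves empty. A toy sketch:
- Per-cycle counts of ready instructions for each thread are made-up inputs.
- Threads are served in a fixed priority order (thread 0 first), which is a simplifying assumption.

`slot_utilization` is a hypothetical name.

```python
def slot_utilization(ready_counts, width):
    """Fraction of issue slots filled over a run of cycles (toy model).

    ready_counts: one tuple per cycle giving how many instructions each thread
    could issue that cycle (0 = that thread is stalled). Threads are served in a
    fixed priority order, thread 0 first.
    """
    used = 0
    for per_thread in ready_counts:
        free = width
        for n in per_thread:
            take = min(n, free)     # a thread can only use slots still free this cycle
            used += take
            free -= take
    return used / (width * len(ready_counts))


thread_a = [2, 0, 1, 0]   # hypothetical: thread A stalls in cycles 1 and 3
thread_b = [1, 3, 2, 2]
alone = slot_utilization([(a,) for a in thread_a], width=2)
shared = slot_utilization(list(zip(thread_a, thread_b)), width=2)
print(alone, shared)
```

With these inputs, thread A alone fills 3 of 8 slots. Sharing the 2-wide machine with thread B fills all 8, because B's instructions issue in the cycles where A is stalled.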

Roadmap for the term: major topics
- Overview / Abstractions and Technology
- Performance
- Instruction sets
- Logic & arithmetic
- Processor Implementation
- Memory systems
- Input/Output