Part IV Data Path and Control

Slides:



Advertisements
Similar presentations
Instruction Level Parallelism and Superscalar Processors
Advertisements

EECC551 - Shaaban #1 Fall 2003 lec# Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining increases performance by overlapping.
Lecture Objectives: 1)Define pipelining 2)Calculate the speedup achieved by pipelining for a given number of instructions. 3)Define how pipelining improves.
Chapter 3 Pipelining. 3.1 Pipeline Model n Terminology –task –subtask –stage –staging register n Total processing time for each task. –T pl =, where t.
Pipelining Hwanmo Sung CS147 Presentation Professor Sin-Min Lee.
S. Barua – CPSC 440 CHAPTER 6 ENHANCING PERFORMANCE WITH PIPELINING This chapter presents pipelining.
CIS429.S00: Lec3 - 1 CPU Time Analysis Terminology IC = instruction count = number of instructions in the program CPI = cycles per instruction (varies.
1 Lecture 17: Basic Pipelining Today’s topics:  5-stage pipeline  Hazards and instruction scheduling Mid-term exam stats:  Highest: 90, Mean: 58.
Pipelining II Andreas Klappenecker CPSC321 Computer Architecture.
Mary Jane Irwin ( ) [Adapted from Computer Organization and Design,
The Optimum Pipeline Depth for a Microprocessor Fang Pang Oct/01/02.
EECC551 - Shaaban #1 Spring 2006 lec# Pipelining and Instruction-Level Parallelism. Definition of basic instruction block Increasing Instruction-Level.
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania Computer Organization Pipelined Processor Design 1.
EECC551 - Shaaban #1 Fall 2005 lec# Pipelining and Instruction-Level Parallelism. Definition of basic instruction block Increasing Instruction-Level.
Pipelining Andreas Klappenecker CPSC321 Computer Architecture.
Computer ArchitectureFall 2007 © October 24nd, 2007 Majd F. Sakr CS-447– Computer Architecture.
Computer Architecture Lecture 2 Instruction Set Principles.
Pipelined Processor II CPSC 321 Andreas Klappenecker.
DLX Instruction Format
EECC551 - Shaaban #1 Spring 2004 lec# Definition of basic instruction blocks Increasing Instruction-Level Parallelism & Size of Basic Blocks.
Appendix A Pipelining: Basic and Intermediate Concepts
Pipelining and Exploiting Instruction-Level Parallelism (ILP)
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Lecture 17 - Pipelined.
Lecture 24: CPU Design Today’s topic –Multi-Cycle ALU –Introduction to Pipelining 1.
10-1 Chapter 10 - Trends in Computer Architecture Principles of Computer Architecture by M. Murdocca and V. Heuring © 1999 M. Murdocca and V. Heuring Principles.
Chapter One Introduction to Pipelined Processors.
CPU Performance Assessment As-Bahiya Abu-Samra *Moore’s Law *Clock Speed *Instruction Execution Rate - MIPS - MFLOPS *SPEC Speed Metric *Amdahl’s.
Chapter 2 Summary Classification of architectures Features that are relatively independent of instruction sets “Different” Processors –DSP and media processors.
July 2005Computer Architecture, Data Path and ControlSlide 1 Part IV Data Path and Control.
Comp Sci pipelining 1 Ch. 13 Pipelining. Comp Sci pipelining 2 Pipelining.
Chapter 4 The Processor CprE 381 Computer Organization and Assembly Level Programming, Fall 2012 Revised from original slides provided by MKP.
CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.
CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.
1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.
CS 1104 Help Session IV Five Issues in Pipelining Colin Tan, S
CSIE30300 Computer Architecture Unit 04: Basic MIPS Pipelining Hsin-Chou Chi [Adapted from material by and
Oct. 18, 2000Machine Organization1 Machine Organization (CS 570) Lecture 4: Pipelining * Jeremy R. Johnson Wed. Oct. 18, 2000 *This lecture was derived.
10-1 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Principles of.
CPU Design and Pipelining – Page 1CSCI 4717 – Computer Architecture CSCI 4717/5717 Computer Architecture Topic: CPU Operations and Pipelining Reading:
Principles of Linear Pipelining
Chapter One Introduction to Pipelined Processors.
Advanced Computer Architecture CS 704 Advanced Computer Architecture Lecture 10 Computer Hardware Design (Pipeline Datapath and Control Design) Prof. Dr.
Chapter Six.
Computer Architecture
Morgan Kaufmann Publishers
Lecture: Pipelining Basics
Part IV Data Path and Control
Chapter 4 The Processor Part 2
Part IV Data Path and Control
Lecture 5: Pipelining Basics
Serial versus Pipelined Execution
Chapter Six.
Chapter Six.
Pipelining Chapter 6.
Pipelining and Exploiting Instruction-Level Parallelism (ILP)
Part IV Data Path and Control
Pipelining and Exploiting Instruction-Level Parallelism (ILP)
Pipeline Performance.
Pipelining: Basic Concepts
Pipelining and Exploiting Instruction-Level Parallelism (ILP)
ARM ORGANISATION.
Pipelining Chapter 6.
Pipelining and Exploiting Instruction-Level Parallelism (ILP)
Morgan Kaufmann Publishers The Processor
Pipelining and Exploiting Instruction-Level Parallelism (ILP)
Pipelining and Exploiting Instruction-Level Parallelism (ILP)
Systems Architecture II
Pipelining and Exploiting Instruction-Level Parallelism (ILP)
Chapter 2: Performance CS 447 Jason Bakos Fall 2001 CS 447.
Pipelining.
Presentation transcript:

Part IV Data Path and Control July 2005 Computer Architecture, Data Path and Control

IV Data Path and Control Design a simple computer (MicroMIPS) to learn about: Data path – part of the CPU where data signals flow Control unit – guides data signals through data path Pipelining – a way of achieving greater performance Topics in This Part Chapter 13 Instruction Execution Steps Chapter 14 Control Unit Synthesis Chapter 15 Pipelined Data Paths Chapter 16 Pipeline Performance Limits July 2005 Computer Architecture, Data Path and Control

Computer Architecture, Data Path and Control 15 Pipelined Data Paths Pipelining is now used in even the simplest of processors Same principles as assembly lines in manufacturing Unlike in assembly lines, instructions not independent Topics in This Chapter 15.1 Pipelining Concepts 15.2 Pipeline Stalls or Bubbles 15.3 Pipeline Timing and Performance 15.4 Pipelined Data Path Design 15.5 Pipelined Control 15.6 Optimal Pipelining July 2005 Computer Architecture, Data Path and Control

Single-Cycle Data Path of Chapter 13 Clock rate = 125 MHz CPI = 1 (125 MIPS) Fig. 13.3 Key elements of the single-cycle MicroMIPS data path. July 2005 Computer Architecture, Data Path and Control

Multicycle Data Path of Chapter 14 Clock rate = 500 MHz CPI  4 ( 125 MIPS) Fig. 14.3 Key elements of the multicycle MicroMIPS data path. July 2005 Computer Architecture, Data Path and Control

Getting the Best of Both Worlds Pipelined: Clock rate = 500 MHz CPI  1 Single-cycle: Clock rate = 125 MHz CPI = 1 Multicycle: Clock rate = 500 MHz CPI  4 July 2005 Computer Architecture, Data Path and Control

Computer Architecture, Data Path and Control 15.1 Pipelining Concepts Strategies for improving performance 1 – Use multiple independent data paths accepting several instructions that are read out at once: multiple-instruction-issue or superscalar   2 – Overlap execution of several instructions, starting the next instruction before the previous one has run to completion: (super)pipelined Fig. 15.1 Pipelining in the student registration process. July 2005 Computer Architecture, Data Path and Control

Pipelined Instruction Execution Fig. 15.2 Pipelining in the MicroMIPS instruction execution process. July 2005 Computer Architecture, Data Path and Control

Alternate Representations of a Pipeline Except for start-up and drainage overheads, a pipeline can execute one instruction per clock tick; IPS is dictated by the clock frequency Fig. 15.3 Two abstract graphical representations of a 5-stage pipeline executing 7 tasks (instructions). July 2005 Computer Architecture, Data Path and Control

Pipelining Example in a Photocopier A photocopier with an x-sheet document feeder copies the first sheet in 4 s and each subsequent sheet in 1 s. The copier’s paper path is a 4-stage pipeline with each stage having a latency of 1s. The first sheet goes through all 4 pipeline stages and emerges after 4 s. Each subsequent sheet emerges 1s after the previous sheet. How does the throughput of this photocopier varies with x, assuming that loading the document feeder and removing the copies takes 15 s. Solution Each batch of x sheets is copied in 15 + 4 + (x – 1) = 18 + x seconds. A nonpipelined copier would require 4x seconds to copy x sheets. For x > 6, the pipelined version has a performance edge. When x = 50, the pipelining speedup is (4  50) / (18 + 50) = 2.94.   July 2005 Computer Architecture, Data Path and Control

15.2 Pipeline Stalls or Bubbles First type of data dependency Fig. 15.4 Read-after-write data dependency and its possible resolution through data forwarding . July 2005 Computer Architecture, Data Path and Control

Second Type of Data Dependency Fig. 15.5 Read-after-load data dependency and its possible resolution through bubble insertion and data forwarding. July 2005 Computer Architecture, Data Path and Control

Control Dependency in a Pipeline Fig. 15.6 Control dependency due to conditional branch. July 2005 Computer Architecture, Data Path and Control

15.3 Pipeline Timing and Performance Fig. 15.7 Pipelined form of a function unit with latching overhead. July 2005 Computer Architecture, Data Path and Control

Throughput Increase in a q-Stage Pipeline t / q +  or q 1 + q / t Fig. 15.8 Throughput improvement due to pipelining as a function of the number of pipeline stages for different pipelining overheads.   July 2005 Computer Architecture, Data Path and Control

Pipeline Throughput with Dependencies Assume that one bubble must be inserted due to read-after-load dependency and after a branch when its delay slot cannot be filled. Let  be the fraction of all instructions that are followed by a bubble. q (1 + q / t)(1 + b) Pipeline speedup = R-type 44% Load 24% Store 12% Branch 18% Jump 2% Example 15.3 Calculate the effective CPI for MicroMIPS, assuming that a quarter of branch and load instructions are followed by bubbles. Solution Fraction of bubbles b = 0.25(0.24 + 0.18) = 0.105 CPI = 1 + b = 1.105 (which is very close to the ideal value of 1)   July 2005 Computer Architecture, Data Path and Control

15.4 Pipelined Data Path Design Fig. 15.9 Key elements of the pipelined MicroMIPS data path. July 2005 Computer Architecture, Data Path and Control

Computer Architecture, Data Path and Control 15.5 Pipelined Control Fig. 15.10 Pipelined control signals. July 2005 Computer Architecture, Data Path and Control

Computer Architecture, Data Path and Control 15.6 Optimal Pipelining MicroMIPS pipeline with more than four-fold improvement Fig. 15.11 Higher-throughput pipelined data path for MicroMIPS and the execution of consecutive instructions in it . July 2005 Computer Architecture, Data Path and Control

Optimal Number of Pipeline Stages Assumptions: Pipeline sliced into q stages Stage overhead is t q/2 bubbles per branch (decision made midway) Fraction b of all instructions are taken branches Fig. 15.7 Pipelined form of a function unit with latching overhead. Derivation of q opt Average CPI = 1 + b q / 2 Throughput = Clock rate / CPI = [(t / q + t)(1 + b q / 2)] –1/2 Differentiate throughput expression with respect to q and equate with 0 q opt = [2(t / t) / b)] 1/2   July 2005 Computer Architecture, Data Path and Control