Chapter 4 The Processor

Introduction (§4.1)
- We will examine two MIPS implementations:
  - A simplified version
  - A more realistic pipelined version
- Simple subset, shows most aspects:
  - Memory reference: lw, sw
  - Arithmetic/logical: add, sub, and, or, slt
  - Control transfer: beq, j

Log in at Uoh.blackboard.com using your username and password.

Go to the "Courses" menu.

Select "201401_COE308_001_3646: Computer Architecture".

Select "Content".

Slides

First Task

Pipelining Analogy (§4.5 An Overview of Pipelining)
- Pipelined laundry: overlapping execution
  - Parallelism improves performance
- Four loads: Speedup = 8/3.5 = 2.3
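The 8/3.5 figure can be reproduced with a few lines of arithmetic. The sketch below assumes the usual form of the laundry analogy: four steps per load, each taking 0.5 hours. The step count and duration are assumptions chosen to match the slide's numbers; the slide does not state them. The function name `laundry_times` is hypothetical.

```python
def laundry_times(loads, stages=4, stage_hours=0.5):
    """Return (sequential_hours, pipelined_hours) for a number of loads.

    stages and stage_hours are assumed values (four 30-minute steps).
    """
    sequential = loads * stages * stage_hours
    # The first load passes through every stage; each later load
    # finishes one stage-time after the one before it.
    pipelined = (stages + loads - 1) * stage_hours
    return sequential, pipelined


seq, pipe = laundry_times(4)
print(f"sequential = {seq} h, pipelined = {pipe} h, speedup = {seq / pipe:.1f}")
```

With more loads the ratio moves toward the number of stages, since the start-up cost is spread over more work.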

MIPS Pipeline
Five stages, one step per stage
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register

Pipeline Performance
- Assume time for stages is
  - 100 ps for register read or write
  - 200 ps for other stages
- Compare pipelined datapath with single-cycle datapath

Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200 ps       100 ps         200 ps  200 ps         100 ps          800 ps
sw        200 ps       100 ps         200 ps  200 ps         -               700 ps
R-format  200 ps       100 ps         200 ps  -              100 ps          600 ps
beq       200 ps       100 ps         200 ps  -              -               500 ps

Pipeline Performance
- Single-cycle (Tc = 800 ps)
- Pipelined (Tc = 200 ps)
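The two clock periods follow from the stage latencies: the single-cycle clock must fit the slowest whole instruction, while the pipelined clock must fit the slowest single stage. The sketch below encodes the latencies from the table as a dictionary; the names `STAGE_PS`, `single_cycle_period`, `pipelined_period`, and `run_times` are illustrative, and the model assumes an ideal pipeline with no stalls.

```python
# Stage latencies in picoseconds, in the order
# [instr fetch, register read, ALU op, memory access, register write].
# 0 marks a stage the instruction class does not use.
STAGE_PS = {
    "lw":       [200, 100, 200, 200, 100],
    "sw":       [200, 100, 200, 200, 0],
    "R-format": [200, 100, 200, 0, 100],
    "beq":      [200, 100, 200, 0, 0],
}


def single_cycle_period(table):
    # The clock must fit the slowest instruction from fetch to write-back.
    return max(sum(row) for row in table.values())


def pipelined_period(table):
    # The clock must fit the slowest individual stage.
    return max(max(row) for row in table.values())


def run_times(n_instr, table=STAGE_PS, n_stages=5):
    """Return (single_cycle_ps, pipelined_ps) for n_instr instructions."""
    single = n_instr * single_cycle_period(table)
    # Ideal pipeline: fill time for the first instruction, then one per cycle.
    pipelined = (n_stages + n_instr - 1) * pipelined_period(table)
    return single, pipelined


for n in (3, 1_000_000):
    single, pipelined = run_times(n)
    print(f"{n} instructions: {single} ps vs {pipelined} ps, "
          f"speedup {single / pipelined:.2f}")
```

For a long instruction stream the speedup tends toward 800/200 = 4 rather than 5, because the stages are not perfectly balanced.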

Basic Idea
- Assembly line
- Divide the execution of a task among a number of stages
- A task is divided into subtasks to be executed in sequence
- Performance improvement compared to sequential execution

Pipeline
[Figure: a task divided into sub-tasks 1, 2, ..., n; a stream of tasks flowing through pipeline stages 1, 2, ..., n]

5 Tasks on a 4-Stage Pipeline
[Figure: Gantt chart of Tasks 1 to 5 against time]
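A chart of this kind can be generated programmatically. The following sketch assumes an ideal pipeline in which every stage takes one time slot and a new task enters every slot; `gantt` and `render` are hypothetical helper names.

```python
def gantt(num_tasks, num_stages):
    """Return one row per task for an ideal pipeline.

    Entry k of a row is the stage (1-based) the task occupies in
    time slot k, or 0 when the task is not in the pipeline.
    """
    total_slots = num_stages + num_tasks - 1
    rows = []
    for task in range(num_tasks):
        row = [0] * total_slots
        for stage in range(num_stages):
            # Task `task` enters stage `stage` in slot task + stage.
            row[task + stage] = stage + 1
        rows.append(row)
    return rows


def render(rows):
    """Format the rows as text, one line per task."""
    lines = []
    for i, row in enumerate(rows, start=1):
        cells = " ".join(f"S{s}" if s else " ." for s in row)
        lines.append(f"Task {i}: {cells}")
    return "\n".join(lines)


print(render(gantt(5, 4)))
```

Five tasks on four stages occupy 4 + 5 - 1 = 8 time slots in total.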

Speedup
[Figure: stream of m tasks through an n-stage pipeline, each stage taking time t]
- T(Seq) = n * m * t
- T(Pipe) = n * t + (m - 1) * t
- Speedup = T(Seq) / T(Pipe) = (n * m) / (n + m - 1)
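The speedup formula is easy to tabulate. This sketch works in units of t, so the stage time cancels; the function name `speedup` and the sample values are illustrative only.

```python
def speedup(n, m):
    """Speedup of an n-stage pipeline over sequential execution of m tasks."""
    t_seq = n * m           # T(Seq) in units of t
    t_pipe = n + (m - 1)    # T(Pipe) in units of t
    return t_seq / t_pipe


for m in (1, 5, 100, 10_000):
    print(f"n = 4, m = {m}: speedup = {speedup(4, m):.3f}")
```

A single task gains nothing, and as m grows the speedup approaches n, the number of stages.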

Efficiency
- T(Seq) = n * m * t
- T(Pipe) = n * t + (m - 1) * t
- Efficiency = Speedup / n = m / (n + m - 1)

Throughput
- T(Seq) = n * m * t
- T(Pipe) = n * t + (m - 1) * t
- Throughput = number of tasks executed per unit of time = m / ((n + m - 1) * t)
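Efficiency and throughput can be computed the same way. The sketch below is a direct transcription of the two formulas; the function names and the sample values (n = 4, m = 5, t = 1) are chosen only for illustration.

```python
def efficiency(n, m):
    """Fraction of stage-time slots doing useful work: m / (n + m - 1)."""
    return m / (n + m - 1)


def throughput(n, m, t):
    """Tasks completed per unit of time: m / ((n + m - 1) * t)."""
    return m / ((n + m - 1) * t)


n, m, t = 4, 5, 1
print(f"efficiency = {efficiency(n, m)}, throughput = {throughput(n, m, t)} tasks per time unit")
```

Efficiency tends toward 1 and throughput toward 1/t as the number of tasks m becomes large compared with n.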

Instruction Pipeline
- Pipeline stall
  - Some stages might need more time to perform their function.
  - E.g. I2 needs 3 time units to perform its function.
  - This is called a "bubble" or "pipeline hazard".
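The effect of one slow stage can be seen in a small timing model. This is a sketch under stated assumptions: instructions proceed in order, an instruction holds its stage until the next stage is free, and the 3-unit delay of I2 is placed in the third of four stages (the slide does not say which stage is slow). The names `schedule` and `finish_time` are hypothetical.

```python
def schedule(durations):
    """durations[i][s] is the time instruction i spends working in stage s.

    Returns enter[i][s], the time at which instruction i enters stage s.
    """
    n_stages = len(durations[0])
    enter = []
    prev_leave = [0] * n_stages   # when the previous instruction vacated each stage
    for dur in durations:
        row = []
        ready = 0                 # when this instruction finished its previous stage
        for s in range(n_stages):
            start = max(ready, prev_leave[s])
            row.append(start)
            ready = start + dur[s]
        # A stage is vacated when the instruction enters the next stage;
        # the last stage is vacated when its work is done.
        prev_leave = row[1:] + [row[-1] + dur[-1]]
        enter.append(row)
    return enter


def finish_time(durations):
    enter = schedule(durations)
    return enter[-1][-1] + durations[-1][-1]


ideal = [[1, 1, 1, 1] for _ in range(4)]
stalled = [row[:] for row in ideal]
stalled[1][2] = 3   # assumption: I2 needs 3 time units in the third stage

print("ideal:", finish_time(ideal), "with stall:", finish_time(stalled))
```

In this model the two extra time units of I2 delay every later instruction by two units, which is the bubble moving through the pipeline.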

Pipeline and Instruction Dependency
- Instruction dependency: the operation performed by a stage depends on the operation(s) performed by other stage(s).
- E.g. conditional branch
  - Instruction I4 cannot be executed until the branch condition in I3 is evaluated and stored.
  - The branch takes 3 units of time.

Group Activity
- Show a Gantt chart for 10 instructions that enter a four-stage pipeline (IF, ID, IE, and IS).
- Assume that fetching I5 depends on the results of evaluating I4.
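One way to check a hand-drawn chart for this activity is to compute the fetch cycle of each instruction. The sketch below assumes one cycle per stage and reads "results of the I4 evaluation" as "I4 has completed its IE stage"; that reading is an assumption, and the `wait_stage` parameter lets a different one be tried. The names `fetch_cycles` and `total_cycles` are hypothetical.

```python
STAGES = ["IF", "ID", "IE", "IS"]


def fetch_cycles(num_instr, dependent, wait_stage):
    """Return the 1-based cycle in which each instruction is fetched.

    Instruction number `dependent` (1-based) may not be fetched until the
    instruction before it has completed the stage named `wait_stage`.
    """
    wait_idx = STAGES.index(wait_stage)
    fetch = []
    for i in range(1, num_instr + 1):
        cycle = 1 if not fetch else fetch[-1] + 1
        if i == dependent and fetch:
            # The previous instruction completes wait_stage in cycle
            # fetch[-1] + wait_idx, so the fetch starts one cycle later.
            cycle = max(cycle, fetch[-1] + wait_idx + 1)
        fetch.append(cycle)
    return fetch


def total_cycles(fetch):
    # The last instruction needs the remaining stages after its fetch.
    return fetch[-1] + len(STAGES) - 1


fetch = fetch_cycles(10, dependent=5, wait_stage="IE")
print("fetch cycles:", fetch)
print("total cycles:", total_cycles(fetch))
```

Under these assumptions I5 is fetched two cycles late, so the ten instructions take 15 cycles instead of the ideal 13.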

Answer

Pipeline and Data Dependency
- Data dependency: a source operand of instruction I_i depends on the result of executing a preceding instruction I_j (i > j).
- E.g. I_i cannot fetch its operand until the result of I_j has been saved.

Group Activity
    ADD R1, R2, R3    ; R3 ← R1 + R2     (I_i)
    SL  R3            ; R3 ← SL(R3)      (I_i+1)
    SUB R5, R6, R4    ; R4 ← R5 - R6     (I_i+2)
- Assume that we have five stages in the pipeline:
  - IF (Instruction Fetch)
  - ID (Instruction Decode)
  - OF (Operand Fetch)
  - IE (Instruction Execute)
  - IS (Instruction Store)
- Show a Gantt chart for this code.

Answer
- R3 needs to be written by both I_i and I_i+1.
- Therefore, the problem is a write-after-write data dependency.

When do stalls occur in the pipeline?
- Write after write
- Read after write
- Write after read
- Read after read: does not cause a stall
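The four cases can be detected mechanically by comparing the registers each instruction reads and writes. The sketch below models an instruction as a pair of register sets, which is a simplification introduced here; `classify` is a hypothetical helper, and the operand roles follow the register-transfer comments of the ADD / SL / SUB group activity.

```python
def classify(earlier, later):
    """Each instruction is (destination_registers, source_registers) as sets.

    Returns the set of dependency types of `later` on `earlier`.
    """
    e_dst, e_src = earlier
    l_dst, l_src = later
    found = set()
    if e_dst & l_src:
        found.add("read after write")
    if e_src & l_dst:
        found.add("write after read")
    if e_dst & l_dst:
        found.add("write after write")
    if e_src & l_src:
        found.add("read after read")   # harmless: does not cause a stall
    return found


add = ({"R3"}, {"R1", "R2"})   # R3 <- R1 + R2
sl = ({"R3"}, {"R3"})          # R3 <- SL(R3)
sub = ({"R4"}, {"R5", "R6"})   # R4 <- R5 - R6

print(sorted(classify(add, sl)))
print(sorted(classify(sl, sub)))
```

Under this register-set model the ADD / SL pair reports write after write on R3, and also read after write, because SL reads R3 before shifting it; SUB shares no registers with the other two.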

Read after write

Group Activity
Consider the execution of the following sequence of instructions on a five-stage pipeline consisting of IF, ID, OF, IE, and IS. Show the succession of these instructions in the pipeline, identify all types of data dependency, and compute the speedup and efficiency.

Answer

No Operation Method
- Prevents fetching the wrong instruction / operand
- Equivalent to doing nothing

Group Activity
Consider the execution of ten instructions I1 to I10 on a pipeline consisting of four stages: IF, ID, IE, and IS. Assume that instruction I4 is a conditional branch and that, when it is executed, the branch is not taken; that is, the branch condition is not satisfied. Draw a Gantt chart showing the NOPs.

Answer
- Prevents fetching the wrong instruction

Group Activity
Consider the execution of the following piece of code on a five-stage pipeline (IF, ID, OF, IE, IS). Draw a Gantt chart with NOPs.

Answer
- Prevents fetching the wrong operands

Reducing the Stalls Due to Instruction Dependency

Unconditional Branch Instructions
- Reordering of instructions
- Use of dedicated hardware in the fetch unit
  - Speeds up instruction fetching
- Precomputing the branch and reordering the instructions
- Instruction prefetch
  - Instructions can be fetched and stored in the instruction queue.

Conditional Branching Instructions
- The target address of a conditional branch is not known until execution of the branch has completed.
- Delayed branch
  - Fill the pipeline with some instruction(s) until the branch instruction is executed
- Prediction of the next instruction to be fetched
  - Based on the assumption that the branch outcome is random
  - Assume that the branch is not taken
  - If the prediction is correct, we save the time
  - Otherwise, the instructions fetched after the branch are discarded and the correct ones must be fetched
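The cost of a wrong prediction can be put into the ideal timing formula as a fixed penalty per mispredicted branch. This is a rough sketch: the penalty value depends on the stage in which the branch is resolved, and the value 2 used here is an assumption for illustration, as is the function name `cycles_with_prediction`.

```python
def cycles_with_prediction(num_instr, num_stages, mispredicted, penalty):
    """Total cycles for num_instr instructions on a num_stages pipeline,
    charging `penalty` extra cycles for every mispredicted branch."""
    ideal = num_stages + num_instr - 1
    return ideal + mispredicted * penalty


# Ten instructions, four stages, one conditional branch.
print("prediction correct:", cycles_with_prediction(10, 4, mispredicted=0, penalty=2))
print("prediction wrong:  ", cycles_with_prediction(10, 4, mispredicted=1, penalty=2))
```

When the predict-not-taken guess is right, the branch costs nothing beyond its own slot; when it is wrong, the discarded fetches appear as the penalty cycles.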

Example
- Before delaying
- After delaying

Reducing Pipeline Stalls Due to Data Dependency

Hardware Operand Forwarding
- Allows the result of one ALU operation to be made available to another ALU operation.
- SUB cannot start until R3 is stored.
- If R3 can be forwarded to SUB at the same time as the store operation, a stall is saved.
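The saving from forwarding can be counted in a simple cycle model. This sketch assumes the five-stage pipeline IF, ID, OF, IE, IS used in the group activities, one cycle per stage, and a consumer that immediately follows its producer and needs the producer's result in its OF stage. It further assumes that, with forwarding, the value is usable in the same cycle as the producer's IS stage. All of these are modelling assumptions, and `stall_cycles` is a hypothetical name.

```python
STAGES = ["IF", "ID", "OF", "IE", "IS"]


def stall_cycles(forwarding):
    """Stall cycles seen by an instruction that immediately follows its producer."""
    producer_if = 1
    consumer_if = 2
    ideal_of = consumer_if + STAGES.index("OF")       # cycle 4 with no hazard
    producer_is = producer_if + STAGES.index("IS")    # producer stores in cycle 5
    if forwarding:
        earliest_of = producer_is        # operand forwarded during the store cycle
    else:
        earliest_of = producer_is + 1    # operand readable only after the store
    return max(0, earliest_of - ideal_of)


print("without forwarding:", stall_cycles(False), "stall cycle(s)")
print("with forwarding:   ", stall_cycles(True), "stall cycle(s)")
```

In this model forwarding removes one of the two stall cycles, which corresponds to the saved stall mentioned on the slide.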

Group Activity

Group activity
    int i, X = 3;
    for (i = 0; i < 10; i++) { X = X + 5; }
- Assume that we have five stages in the pipeline:
  - IF (Instruction Fetch)
  - ID (Instruction Decode)
  - OF (Operand Fetch)
  - IE (Instruction Execute)
  - IS (Instruction Store)
- Show a Gantt chart for this code.

Group activity
    int i, X = 3;
    for (i = 0; i < 10; i++) { X = X + 5; }

MIPS Code
            li   $t0, 10          # t0 is a constant 10
            li   $t1, 0           # t1 is our counter (i)
            li   $t2, 3           # t2 is our x
    loop:
            beq  $t1, $t0, end    # if t1 == 10 we are done
            addi $t2, $t2, 5      # add 5 to x
            addi $t1, $t1, 1      # add 1 to t1
            j    loop             # jump back to the top
    end:
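Before drawing the chart it helps to know how many instructions the loop actually executes. The sketch below mirrors the MIPS code in Python and counts executed instructions, treating each li as a single instruction (an assumption, since li is a pseudo-instruction). It then applies the ideal formula n + m - 1 for a five-stage pipeline, which ignores every stall from branches and data dependencies and is therefore only a lower bound. The name `run_loop` is hypothetical.

```python
def run_loop():
    """Mirror the MIPS loop; return (final x, number of executed instructions)."""
    executed = 3                 # the three li instructions
    t0, t1, t2 = 10, 0, 3
    while True:
        executed += 1            # beq $t1, $t0, end
        if t1 == t0:
            break
        t2 += 5                  # add 5 to x
        executed += 1
        t1 += 1                  # add 1 to the counter
        executed += 1
        executed += 1            # j loop
    return t2, executed


x, m = run_loop()
n = 5                            # pipeline stages: IF, ID, OF, IE, IS
print(f"x = {x}, instructions executed = {m}, ideal cycles = {n + m - 1}")
```

The loop body runs ten times at four instructions per pass, plus the three initialisations and the final taken beq; a real chart will be longer once the stalls are drawn in.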