

Presentation on theme: "Chapter 4 The Processor. Chapter 4 — The Processor — 2 Introduction We will examine two MIPS implementations A simplified version A more realistic pipelined."— Presentation transcript:

1 Chapter 4 The Processor

2 Chapter 4 — The Processor — 2
Introduction
 We will examine two MIPS implementations
 A simplified version
 A more realistic pipelined version
 Simple subset, shows most aspects
 Memory reference: lw, sw
 Arithmetic/logical: add, sub, and, or, slt
 Control transfer: beq, j
§4.1 Introduction

3 Log in at Uoh.blackboard.com with Username: your username and Password: your email password.

4 Go to “Courses” menu

5 Select “201401_COE308_001_3646: Computer Architecture”

6 Select “Content”

7 Slides

8 First Task

9

10 Chapter 4 — The Processor — 10
Pipelining Analogy
 Pipelined laundry: overlapping execution
 Parallelism improves performance
 Four loads: Speedup = 8/3.5 = 2.3
§4.5 An Overview of Pipelining

11 Chapter 4 — The Processor — 11
MIPS Pipeline
Five stages, one step per stage:
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register

12 Chapter 4 — The Processor — 12
Pipeline Performance
Assume the time for each stage is:
 100 ps for register read or write
 200 ps for other stages
Compare the pipelined datapath with the single-cycle datapath:

Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200 ps       100 ps         200 ps  200 ps         100 ps          800 ps
sw        200 ps       100 ps         200 ps  200 ps                         700 ps
R-format  200 ps       100 ps         200 ps                 100 ps          600 ps
beq       200 ps       100 ps         200 ps                                 500 ps

13 Chapter 4 — The Processor — 13
Pipeline Performance: single-cycle (Tc = 800 ps) vs. pipelined (Tc = 200 ps)
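The comparison on this slide can be checked with a short sketch. It uses the figures from the slides (800 ps single-cycle clock, 200 ps pipelined clock, 5 stages); the function names are my own.

```python
# Total execution time for m instructions, using the slide's numbers:
# single-cycle clock = 800 ps, pipelined clock = 200 ps, 5 stages.

def single_cycle_time(m, tc=800):
    """Single-cycle datapath: every instruction takes one 800 ps cycle."""
    return m * tc

def pipelined_time(m, n=5, tc=200):
    """n-stage pipeline: first instruction takes n cycles, then one
    instruction completes per cycle, so (n + m - 1) cycles total."""
    return (n + m - 1) * tc

for m in (1, 3, 1000):
    print(m, single_cycle_time(m), pipelined_time(m))
```

For a single instruction the pipeline is actually slower (1000 ps vs. 800 ps, because every stage gets the full 200 ps); the win appears as the instruction count grows and the speedup approaches the stage count.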

14 Basic Idea
 Assembly line
 Divide the execution of a task among a number of stages
 A task is divided into subtasks to be executed in sequence
 Performance improvement compared to sequential execution

15 Pipeline
[Figure: a stream of tasks, each divided into sub-tasks 1…n, flowing through an n-stage pipeline.]

16 5 Tasks on a 4-stage pipeline
[Figure: Gantt chart of tasks 1–5 overlapping across time slots 1–8 on the four stages.]
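The Gantt chart on this slide can be reproduced with a small sketch of an ideal pipeline schedule (my own code, not from the slides): stage s of task i occupies time slot i + s, so the last of m tasks on an n-stage pipeline finishes after n + m − 1 slots.

```python
# Ideal pipeline schedule: task i occupies stage s in time slot i + s
# (0-based), with no stalls.

def gantt(n_stages, n_tasks):
    """Map each task to the list of time slots, one per stage."""
    return {t: [t + s for s in range(n_stages)] for t in range(n_tasks)}

chart = gantt(4, 5)
for task, slots in chart.items():
    row = ["  "] * (4 + 5 - 1)           # 8 time slots in total
    for stage, slot in enumerate(slots):
        row[slot] = f"S{stage + 1}"
    print(f"T{task + 1}: " + " ".join(row))
```

Five tasks on four stages occupy 4 + 5 − 1 = 8 time slots, matching the slide.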

17 Speedup
A stream of m tasks on an n-stage pipeline with stage time t:
T(Seq) = n * m * t
T(Pipe) = n * t + (m - 1) * t = (n + m - 1) * t
Speedup = T(Seq) / T(Pipe) = n * m / (n + m - 1)

18 Efficiency
A stream of m tasks on an n-stage pipeline with stage time t:
T(Seq) = n * m * t
T(Pipe) = n * t + (m - 1) * t = (n + m - 1) * t
Efficiency = Speedup / n = m / (n + m - 1)

19 Throughput
A stream of m tasks on an n-stage pipeline with stage time t:
T(Seq) = n * m * t
T(Pipe) = n * t + (m - 1) * t = (n + m - 1) * t
Throughput = number of tasks executed per unit of time = m / ((n + m - 1) * t)
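The three metrics above can be bundled into one helper (a sketch; the function name is my own):

```python
# Pipeline metrics from the slides, for an n-stage pipeline processing
# m tasks with stage time t:
#   T_seq  = n * m * t
#   T_pipe = (n + m - 1) * t

def pipeline_metrics(n, m, t):
    t_seq = n * m * t
    t_pipe = (n + m - 1) * t
    speedup = t_seq / t_pipe          # = n*m / (n + m - 1)
    efficiency = speedup / n          # = m / (n + m - 1)
    throughput = m / t_pipe           # tasks per unit time
    return speedup, efficiency, throughput

# 5 tasks on a 4-stage pipeline, unit stage time:
print(pipeline_metrics(4, 5, 1))      # speedup 2.5, efficiency 0.625
```

Note that as m grows large, speedup approaches n and efficiency approaches 1: the pipeline fill time becomes negligible.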

20 Instruction Pipeline
 Pipeline stall
 Some stages may need more time than others to perform their function.
 E.g. I2 needs 3 time units to perform its function
 This is called a “bubble” or “pipeline hazard”

21 Pipeline and Instruction Dependency
Instruction Dependency: the operation performed by a stage depends on the operation(s) performed by other stage(s). E.g. a conditional branch:
 Instruction I4 cannot be executed until the branch condition in I3 is evaluated and stored.
 The branch takes 3 units of time

22 Group Activity
 Show a Gantt chart for 10 instructions that enter a four-stage pipeline (IF, ID, IE, and IS).
 Assume that the fetch of I5 depends on the result of evaluating I4.

23 Answer

24 Pipeline and Data Dependency
 Data Dependency:
 A source operand of instruction Ii depends on the result of executing a preceding instruction Ij, i > j.
 E.g. Ii cannot be executed unless the results of Ij are saved.

25 Group Activity
 ADD R1, R2, R3  ; R3 ← R1 + R2   (Ii)
 SL R3           ; R3 ← SL(R3)    (Ii+1)
 SUB R5, R6, R4  ; R4 ← R5 − R6   (Ii+2)
 Assume that we have five stages in the pipeline:
 IF (Instruction Fetch)
 ID (Instruction Decode)
 OF (Operand Fetch)
 IE (Instruction Execute)
 IS (Instruction Store)
Show a Gantt chart for this code.

26 Answer
 R3 in both Ii and Ii+1 needs to be written
 Therefore, the problem is a write-after-write data dependency

27 When do stalls occur in the pipeline?
 Write after write
 Read after write
 Write after read
 Read after read — does not cause a stall
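The four cases above can be sketched as a small classifier (my own illustrative code; instruction encoding and function names are assumptions, not from the slides). Each instruction is given as its destination register plus the set of source registers it reads.

```python
# Classify the dependency between an earlier instruction i and a later
# instruction j, each given as (destination register, set of sources).

def classify(i, j):
    """Return the hazard types between i (earlier) and j (later)."""
    dest_i, srcs_i = i
    dest_j, srcs_j = j
    hazards = []
    if dest_i in srcs_j:
        hazards.append("RAW")   # read after write: j reads what i writes
    if dest_i == dest_j:
        hazards.append("WAW")   # write after write: same destination
    if dest_j in srcs_i:
        hazards.append("WAR")   # write after read: j overwrites i's source
    return hazards              # plain read-after-read needs no stall

# ADD R1,R2,R3 (R3 <- R1+R2) followed by SL R3 (R3 <- SL(R3)):
print(classify(("R3", {"R1", "R2"}), ("R3", {"R3"})))
```

For the slide-25 pair this reports both RAW (SL reads the R3 that ADD writes) and WAW (both write R3), consistent with the answer on slide 26.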

28 Read after write

29 Group Activity
Consider the execution of the following sequence of instructions on a five-stage pipeline consisting of IF, ID, OF, IE, and IS. Show the succession of these instructions in the pipeline. Show all types of data dependency. Show the speedup and efficiency.

30 Answer

31 No Operation (Nop) Method
 Inserting a Nop prevents fetching the wrong instruction/operand
 A Nop is equivalent to doing nothing

32 Group Activity
Consider the execution of ten instructions I1–I10 on a pipeline consisting of four pipeline stages: IF, ID, IE, and IS. Assume that instruction I4 is a conditional branch instruction and that when it is executed, the branch is not taken; that is, the branch condition is not satisfied. Draw a Gantt chart showing the Nops.

33 Answer  Prevents Fetching Wrong Instruction

34 Group Activity
Consider the execution of the following piece of code on a five-stage pipeline (IF, ID, OF, IE, IS). Draw a Gantt chart with the Nops.

35 Answer  Prevents Fetching Wrong Operands

36  Reducing the Stalls Due to Instruction Dependency

37 Unconditional Branch Instructions
 Reordering of instructions
 Use of dedicated hardware in the fetch unit
 Speeds up instruction fetching
 Precomputing the branch and reordering the instructions
 Instruction prefetch
 Instructions can be fetched ahead of time and stored in the instruction queue.

38 Conditional Branch Instructions
 The target address of a conditional branch will not be known until execution of the branch has been completed.
 Delayed branch
 Fill the pipeline with other instructions until the branch instruction is executed
 Prediction of the next instruction to be fetched
 Based on the assumption that the branch outcome is random
 Assume that the branch is not taken
 If the prediction is correct, we save time
 Otherwise, we redo everything

39 Example  Before delaying  After Delaying

40  Reducing Pipeline Stalls due to Data Dependency

41 Hardware Operand Forwarding
 Allows the result of an ALU operation to be available to another ALU operation.
 SUB cannot start until R3 is stored.
 If we can forward R3 to the SUB at the same time as the store operation, we save a stall cycle.
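A rough sketch of the saving (my own simplified model, not from the slides): assume a 5-stage pipeline in which, without forwarding, a consumer must wait until the producer's write-back before it can execute, while EX-to-EX forwarding only requires the producer's EX to finish first.

```python
# Simplified stall model for a RAW dependency in a 5-stage pipeline
# (IF ID EX MEM WB). Issue slots are consecutive integers; the gap
# between producer and consumer must reach 3 without forwarding
# (result available after WB) but only 1 with EX->EX forwarding.

def stalls(producer_issue, consumer_issue, forwarding):
    """Stall cycles the consumer waits on the producer's result."""
    gap = consumer_issue - producer_issue
    needed = 1 if forwarding else 3
    return max(0, needed - gap)

print(stalls(0, 1, forwarding=False))  # back-to-back dependent pair
print(stalls(0, 1, forwarding=True))
```

Under these assumptions a back-to-back dependent pair costs 2 stall cycles without forwarding and none with it; with enough independent instructions between the pair, neither case stalls.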

42 Group Activity

43

44 Group activity
int i, X = 3;
for (i = 0; i < 10; i++) { X = X + 5; }
 Assume that we have five stages in the pipeline:
 IF (Instruction Fetch)
 ID (Instruction Decode)
 OF (Operand Fetch)
 IE (Instruction Execute)
 IS (Instruction Store)
Show a Gantt chart for this code.

45 Group activity
int i, X = 3;
for (i = 0; i < 10; i++) { X = X + 5; }
MIPS Code:
1. li $t0, 10          # t0 is a constant 10
2. li $t1, 0           # t1 is our counter (i)
3. li $t2, 3           # t2 is our x
4. loop:
5. beq $t1, $t0, end   # if t1 == 10 we are done
6. addi $t2, $t2, 5    # add 5 to x
7. addi $t1, $t1, 1    # add 1 to t1
8. j loop              # jump back to the top
9. end:
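As a sanity check on what the loop computes (my own sketch): starting from X = 3 and adding 5 on each of the ten iterations leaves X = 53 in $t2 when the branch falls through.

```python
# Direct translation of the C loop from the slide.
X = 3
for i in range(10):
    X = X + 5
print(X)  # 53
```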

