10/11: Lecture Topics Execution cycle Introduction to pipelining


Lecture Topics
Execution cycle
Introduction to pipelining
Data hazards

Office Hours
Changing Th 8:30-9:30 to Mo 2-3
New office hours are Tu 2:30-3:30

Execution Cycle
Five steps to executing an instruction (IF, ID, EX, MEM, WB):
1. Fetch (IF): get the next instruction to execute from memory onto the chip
2. Decode (ID): figure out what the instruction says to do; get values from registers
3. Execute (EX): do what the instruction says; for example,
   on a memory reference, add up base and offset
   on an arithmetic instruction, do the math

More Execution Cycle
4. Memory Access (MEM): if it's a load or store, access memory; if it's a branch, replace the PC with the destination address; otherwise do nothing
5. Write back (WB): place the result of the operation in the appropriate register
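To make the five steps concrete, here is a minimal Python sketch that walks a single instruction through IF, ID, EX, MEM and WB in order, with no overlap between instructions. It assumes a hypothetical representation: instruction memory is a dict from address to a tuple such as ("add", rd, rs, rt) or ("lw", rt, offset, base), registers are a dict of values, and data memory is a dict from byte address to word. Only add and lw are handled, and the function name `execute` is illustrative.

```python
def execute(inst_mem, pc, regs, data_mem):
    """Walk the instruction at address `pc` through the five steps; return the next PC."""
    # 1. Fetch (IF): get the instruction at the PC from instruction memory
    inst = inst_mem[pc]
    # 2. Decode (ID): figure out what it says to do and read its source registers
    op = inst[0]
    if op == "add":
        _, dest, rs, rt = inst
        a, b = regs[rs], regs[rt]
    elif op == "lw":
        _, dest, offset, base = inst
        a, b = regs[base], offset
    else:
        raise ValueError(f"unsupported instruction: {op}")
    # 3. Execute (EX): do the math, or add base and offset for a memory reference
    #    (Python ints do not wrap at 32 bits; fine for these small values)
    alu_result = a + b
    # 4. Memory access (MEM): only the load touches data memory in this sketch
    result = data_mem[alu_result] if op == "lw" else alu_result
    # 5. Write back (WB): place the result in the destination register
    regs[dest] = result
    return pc + 4


# Register values from the add walk-through: $s1 = 7, $s2 = 12
regs = {"$s0": 0, "$s1": 7, "$s2": 12}
execute({0: ("add", "$s0", "$s1", "$s2")}, 0, regs, {})
print(regs["$s0"])  # 19

# lw walk-through: $s0 = 0x200D1C00; the word stored at 0x200D1C10 is made up here
regs["$s0"] = 0x200D1C00
execute({4: ("lw", "$t2", 16, "$s0")}, 4, regs, {0x200D1C10: 42})
print(regs["$t2"])  # 42
```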

add $s0, $s1, $s2
IF: get instruction at PC from memory; it's 000000 10001 10010 10000 00000 100000
ID: determine what 000000 … 100000 is (it's add); get contents of $s1 and $s2 ($s1 = 7, $s2 = 12)
EX: add 7 and 12 = 19
MEM: do nothing
WB: store 19 in $s0

lw $t2, 16($s0)
IF: get instruction at PC from memory; it's 100011 10000 01010 0000000000010000
ID: determine what 100011 is (it's lw); get contents of $s0 and $t2 (at this point we don't yet know that we don't care about $t2): $s0 = 0x200D1C00, $t2 = 77763
EX: add 16 to 0x200D1C00 = 0x200D1C10
MEM: load the word stored at 0x200D1C10
WB: store loaded value in $t2
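The bit patterns in the two walk-throughs follow the standard MIPS R-type layout (op, rs, rt, rd, shamt, funct) and I-type layout (op, rs, rt, immediate). The Python sketch below packs both instructions as a check. It assumes the conventional register numbering ($t0-$t7 = 8-15, $s0-$s7 = 16-23), the add funct code 100000 and the lw opcode 0x23 (binary 100011); the helper names are illustrative, not part of any assembler API.

```python
# Conventional MIPS register numbers for the registers used on these slides
REG = {f"$t{i}": 8 + i for i in range(8)}
REG.update({f"$s{i}": 16 + i for i in range(8)})

ADD_FUNCT = 0b100000  # funct field of add (all R-type instructions have opcode 0)
LW_OPCODE = 0b100011  # opcode of lw (0x23)


def encode_r(funct, rd, rs, rt, shamt=0):
    """R-type: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6), with op = 0."""
    return (REG[rs] << 21) | (REG[rt] << 16) | (REG[rd] << 11) | (shamt << 6) | funct


def encode_i(opcode, rt, rs, imm):
    """I-type: op(6) rs(5) rt(5) immediate(16)."""
    return (opcode << 26) | (REG[rs] << 21) | (REG[rt] << 16) | (imm & 0xFFFF)


def split_fields(word, widths):
    """Render a 32-bit word as space-separated binary fields of the given widths."""
    bits = format(word, "032b")
    fields, pos = [], 0
    for w in widths:
        fields.append(bits[pos:pos + w])
        pos += w
    return " ".join(fields)


add_bits = split_fields(encode_r(ADD_FUNCT, "$s0", "$s1", "$s2"), [6, 5, 5, 5, 5, 6])
lw_bits = split_fields(encode_i(LW_OPCODE, "$t2", "$s0", 16), [6, 5, 5, 16])
print(add_bits)  # 000000 10001 10010 10000 00000 100000
print(lw_bits)   # 100011 10000 01010 0000000000010000
```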

Latency & Throughput
(Diagram: over cycles 1-10, inst 1 and then inst 2 each pass through IF ID EX MEM WB, one after the other)
Latency: the time it takes for an individual instruction to execute. What's the latency for this implementation?
Throughput: the number of instructions that execute per unit of time. What's the throughput of this implementation?

A Case for Pipelining
The functional units are being underutilized: the instruction fetcher is used once every five clock cycles. Why not have it fetch a new instruction every clock cycle?
Pipelining overlaps the stages of execution so every stage has something to do each cycle.
A pipeline with N stages could speed up execution by up to N times, but only if
each stage takes the same amount of time
each stage always has work to do
Also, latency for each instruction may go up, but why don't we care?
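The "same amount of time" condition matters because a pipeline's clock period is set by its slowest stage. The Python sketch below compares a datapath that finishes each instruction in one long step (the sum of all stage delays) with a pipelined one clocked at the slowest stage. The per-stage delays are hypothetical numbers chosen for illustration, and pipeline-register overhead is ignored.

```python
def pipeline_timing(stage_delays_ps):
    """Compare one-long-step execution with a pipeline (no pipeline-register overhead).

    Returns (unpipelined time per instruction, pipelined clock period,
             pipelined latency per instruction, ideal steady-state speedup).
    """
    unpipelined_time = sum(stage_delays_ps)          # one instruction finishes every sum(delays)
    clock = max(stage_delays_ps)                     # every stage gets one slowest-stage cycle
    pipelined_latency = clock * len(stage_delays_ps)
    ideal_speedup = unpipelined_time / clock         # ratio of instructions finished per unit time
    return unpipelined_time, clock, pipelined_latency, ideal_speedup


# Hypothetical IF, ID, EX, MEM, WB delays in picoseconds
balanced = [200, 200, 200, 200, 200]
unbalanced = [200, 100, 200, 200, 100]
print(pipeline_timing(balanced))    # (1000, 200, 1000, 5.0)
print(pipeline_timing(unbalanced))  # (800, 200, 1000, 4.0)
```

With balanced stages the speedup reaches N = 5. With unbalanced stages it drops to 4, and each instruction's latency rises from 800 ps to 1000 ps. The higher latency matters little because throughput, not the time of any single instruction, determines how long a long program takes.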

Unpipelined Assembly Line
What is the latency of this assembly line, i.e., for how many cycles is the plane on the assembly line?
What is the throughput of this assembly line, i.e., how many planes are manufactured each cycle?

Pipelined Assembly Line
The assembly line has 5 stages.
If a plane isn't ready to go to the next stage, the pipeline stalls: that stage and all stages before it freeze.
The gap in the assembly line is known as a bubble.

Pipelined Analysis
What is the latency?
What is the throughput?
What is the speedup? (Speedup = Old Time / New Time)
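One way to work the speedup question is to count cycles for n instructions in a k-stage machine. The Python sketch below assumes, as in the unpipelined diagram, that the clock cycle is the same length in both designs, and that the pipeline never stalls. Unpipelined, each instruction uses all k cycles before the next one starts. Pipelined, the first instruction takes k cycles and each later one finishes one cycle after the previous. The function names are illustrative.

```python
def cycles_unpipelined(n, k=5):
    """Each instruction uses all k stages before the next one starts."""
    return n * k


def cycles_pipelined(n, k=5):
    """The first instruction takes k cycles; each later one finishes one cycle later (no stalls)."""
    return k + (n - 1)


def speedup(n, k=5):
    """Speedup = Old Time / New Time, with cycles of the same length in both designs."""
    return cycles_unpipelined(n, k) / cycles_pipelined(n, k)


for n in (1, 5, 100, 10_000):
    # speedup climbs toward k = 5 as the number of instructions grows
    print(n, cycles_unpipelined(n), cycles_pipelined(n), round(speedup(n), 3))
```

For five instructions this gives 25 cycles unpipelined versus 9 pipelined, which matches the nine-cycle pipeline diagrams. The speedup approaches k only when n is large enough that filling the pipeline becomes negligible.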

Pipeline Example
Cycle               1     2     3     4     5     6     7     8     9     10
add $s0, $s1, $s2   IF    ID    EX    MEM   WB
sub $s3, $s2, $s3         IF    ID    EX    MEM   WB
lw $s2, 20($t0)                 IF    ID    EX    MEM   WB
sw $s0, 16($s1)                       IF    ID    EX    MEM   WB
and $t1, $t2, $t3                           IF    ID    EX    MEM   WB
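A diagram like the one above can be generated mechanically. In this Python sketch, one new instruction enters IF every cycle and nothing stalls, so instruction i occupies stage s in cycle i + s + 1. The function name and the 4-character column width are arbitrary choices for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instructions):
    """Return text rows showing the stage each instruction occupies in each cycle,
    assuming one instruction enters IF per cycle and there are no stalls."""
    n_cycles = len(STAGES) + len(instructions) - 1
    width = max(len(text) for text in instructions) + 2
    header = " " * width + "".join(f"{c:<4}" for c in range(1, n_cycles + 1))
    rows = [header.rstrip()]
    for start, text in enumerate(instructions):
        cells = ["    "] * n_cycles
        for offset, stage in enumerate(STAGES):
            cells[start + offset] = f"{stage:<4}"
        rows.append((text.ljust(width) + "".join(cells)).rstrip())
    return rows


program = ["add $s0, $s1, $s2", "sub $s3, $s2, $s3", "lw $s2, 20($t0)",
           "sw $s0, 16($s1)", "and $t1, $t2, $t3"]
for row in pipeline_diagram(program):
    print(row)
```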

Pipelined Throughput and Latency
(Diagram: inst 1 through inst 5 flow through IF ID EX MEM WB, each starting one cycle after the previous, over cycles 1-9)
What's the throughput of this implementation?
What's the latency of this implementation?

Data Hazards
What happens in the following code?
add $s0, $s1, $s2   IF    ID    EX    MEM   WB
add $s4, $s3, $s0         IF    ID    EX    MEM   WB
$s0 is written in the first instruction's WB stage (cycle 5), but it is read in the second instruction's ID stage (cycle 3).
This is called a data dependency. When it causes a pipeline stall, it is called a data hazard.
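Spotting data dependencies is a simple scan over the instruction stream. The Python sketch below uses a hypothetical (text, destination, sources) tuple per instruction and reports every read-after-write dependency whose reader is at most `max_distance` instructions behind the writer. The default of 3 is an assumption tied to this five-stage pipeline without forwarding: a reader up to three instructions later reaches ID before the writer's WB has finished.

```python
def find_dependences(program, max_distance=3):
    """Return (writer, reader, register) for each read-after-write dependency
    where the reader is at most `max_distance` instructions after the writer.
    Each instruction is a (text, dest_or_None, tuple_of_sources) tuple."""
    found = []
    for i, (w_text, w_dest, _) in enumerate(program):
        if w_dest is None:
            continue
        for j in range(i + 1, min(i + 1 + max_distance, len(program))):
            r_text, r_dest, r_srcs = program[j]
            if w_dest in r_srcs:
                found.append((w_text, r_text, w_dest))
            if r_dest == w_dest:
                break  # a newer write hides the old value from later instructions
    return found


program = [
    ("add $s0, $s1, $s2", "$s0", ("$s1", "$s2")),
    ("add $s4, $s3, $s0", "$s4", ("$s3", "$s0")),
]
print(find_dependences(program))
# [('add $s0, $s1, $s2', 'add $s4, $s3, $s0', '$s0')]
```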

Solution: Stall
Stall the pipeline until the result is available:
add s0,s1,s2   IF    ID    EX    MEM   WB
add s4,s3,s0         IF    stall stall stall ID    EX    MEM   WB

Solution: Read & Write in Same Cycle
Write the register in the first part of the clock cycle; read it in the second part of the clock cycle.
add s0,s1,s2   IF    ID    EX    MEM   WB
add s4,s3,s0         IF    stall stall ID    EX    MEM   WB
$s0 is written in the first half of cycle 5 (WB) and read in the second half of cycle 5 (ID).
A stall of two cycles is still required.

Solution: Forwarding
The value of $s0 is known after cycle 3 (after the first instruction's EX stage).
The value of $s0 isn't needed until cycle 4 (before the second instruction's EX stage).
If we forward the result, there isn't a stall.
add s0,s1,s2   IF    ID    EX    MEM   WB
add s4,s3,s0         IF    ID    EX    MEM   WB

Another Data Hazard
What if the first instruction is lw?
lw s0,0(s2)    IF    ID    EX    MEM   WB
add s4,s3,s0         IF    ID    EX    MEM   WB
s0 isn't known until after the MEM stage (end of cycle 4), but the add needs it at the start of its EX stage (also cycle 4).
We can't forward back into the past.
Either stall or reorder instructions.

Solutions to the lw Hazard
We can stall for one cycle, but we hate to stall:
lw s0,0(s2)    IF    ID    EX    MEM   WB
add s4,s3,s0         IF    stall ID    EX    MEM   WB
Instead, try to execute an unrelated instruction between the two instructions:
lw s0,0(s2)    IF    ID    EX    MEM   WB
sub t4,t2,t3         IF    ID    EX    MEM   WB
add s4,s3,s0               IF    ID    EX    MEM   WB
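The stall counts in the last few slides all follow from one comparison: the cycle in which the consumer needs the value versus the cycle in which the producer makes it available. The Python sketch below encodes the three schemes under these assumptions: a five-stage pipeline with the register written in WB; without forwarding the consumer reads registers in ID; with forwarding the value goes straight into the consumer's EX stage once the producer's EX (ALU result) or MEM (loaded value) has finished. The policy names are hypothetical labels.

```python
# Stage offsets, in cycles, from the cycle an instruction is fetched
IF, ID, EX, MEM, WB = 0, 1, 2, 3, 4


def stall_cycles(distance, producer_is_load, policy):
    """Stall cycles the consumer needs when it is fetched `distance` cycles
    after the instruction producing its source register (1 = back-to-back).

    `ready` is the earliest cycle (relative to the producer's fetch) in which
    the consumer's `needed` stage may run."""
    if policy == "stall":      # read in ID only after the producer's WB has finished
        ready, needed = WB + 1, ID
    elif policy == "split":    # write in the first half of WB, read in the second half of ID
        ready, needed = WB, ID
    elif policy == "forward":  # bypass into EX right after the producer's EX (ALU) or MEM (load)
        ready = (MEM if producer_is_load else EX) + 1
        needed = EX
    else:
        raise ValueError(f"unknown policy: {policy}")
    # Without stalls the consumer's `needed` stage would run at distance + needed
    return max(0, ready - (distance + needed))


for policy in ("stall", "split", "forward"):
    # back-to-back consumer after an ALU producer, then after a load producer
    print(policy, stall_cycles(1, False, policy), stall_cycles(1, True, policy))

# lw, then one unrelated instruction, then the consumer: no stall with forwarding
print(stall_cycles(2, True, "forward"))
```

The back-to-back results (3, 2 and 0 stalls for add; 1 stall for lw with forwarding) match the diagrams. Placing one unrelated instruction after the lw raises the distance to 2 and removes the stall.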

Reordering Instructions
Reordering instructions is a common technique for avoiding pipeline stalls.
Sometimes the compiler does the reordering statically.
Most modern high-performance processors do this reordering dynamically: they can see several instructions and execute any one that has no pending dependency. This is known as out-of-order execution, and it is very complicated to implement.
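Static reordering of the kind a compiler might do can be sketched as a single greedy pass. This Python version is a toy, not any real compiler's scheduler. Whenever a lw is immediately followed by an instruction that uses the loaded register, it tries to pull the next instruction up into the gap, but only if that instruction shares no register with either of the two it moves past. The dict-based instruction format, the `ins` helper and the other names are hypothetical. The sketch ignores memory dependencies, so it refuses to move loads and stores.

```python
def independent(a, b):
    """True if a and b share no register in a way that forbids swapping them
    (no read-after-write, write-after-read, or write-after-write)."""
    return (a["dst"] not in b["srcs"]
            and b["dst"] not in a["srcs"]
            and (a["dst"] is None or a["dst"] != b["dst"]))


def load_use_stalls(seq):
    """Count one-cycle load-use stalls, assuming full forwarding."""
    return sum(1 for prev, cur in zip(seq, seq[1:])
               if prev["op"] == "lw" and prev["dst"] in cur["srcs"])


def fill_load_delay(seq):
    """Greedy pass: move the instruction after a load's user up between them when safe."""
    seq = list(seq)
    for i in range(len(seq) - 2):
        load, use, nxt = seq[i], seq[i + 1], seq[i + 2]
        if (load["op"] == "lw" and load["dst"] in use["srcs"]
                and nxt["op"] not in ("lw", "sw")   # memory dependencies are not tracked
                and independent(use, nxt) and independent(load, nxt)):
            seq[i + 1], seq[i + 2] = nxt, use
    return seq


def ins(op, dst, *srcs):
    """Hypothetical helper building one instruction record."""
    return {"op": op, "dst": dst, "srcs": srcs}


before = [ins("lw", "$s0", "$s2"),            # lw  s0,0(s2)
          ins("add", "$s4", "$s3", "$s0"),    # add s4,s3,s0
          ins("sub", "$t4", "$t2", "$t3")]    # sub t4,t2,t3
after = fill_load_delay(before)
print([i["op"] for i in after], load_use_stalls(before), load_use_stalls(after))
# ['lw', 'sub', 'add'] 1 0
```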

Control Hazards
Branch instructions cause control hazards because we don't know which instruction to execute next.
bne $s0, $s1, next   IF    ID    EX    MEM   WB
add $s4, $s3, $s0
...
next: sub $s4, $s3, $s0
In cycle 2, do we fetch add or sub? We don't know until the branch is resolved (in this five-step cycle, the branch replaces the PC in its MEM stage, cycle 4).
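If the pipeline simply stalls on every branch until the next PC is known, each branch costs a fixed number of bubbles. A common back-of-the-envelope model is CPI = base CPI + (fraction of instructions that are branches) × (stall cycles per branch). The Python sketch assumes the branch is resolved at the end of a given stage (MEM by default, as in the execution cycle above) and that the correct instruction is fetched in the following cycle. The 20% branch frequency is a made-up example value, and the function names are illustrative.

```python
STAGE_CYCLE = {"IF": 1, "ID": 2, "EX": 3, "MEM": 4, "WB": 5}  # cycle of each stage for a branch fetched in cycle 1


def branch_stall_cycles(resolve_stage="MEM"):
    """Bubbles when the next fetch must wait until the cycle after the branch
    finishes `resolve_stage` (without a hazard it would happen in cycle 2)."""
    return (STAGE_CYCLE[resolve_stage] + 1) - 2


def cpi_with_branch_stalls(branch_fraction, resolve_stage="MEM", base_cpi=1.0):
    """Average cycles per instruction when every branch stalls instruction fetch."""
    return base_cpi + branch_fraction * branch_stall_cycles(resolve_stage)


print(branch_stall_cycles("MEM"))    # 3
print(cpi_with_branch_stalls(0.20))  # about 1.6 with the assumed 20% branches
```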