ECE 252 / CPS 220 Pipelining Professor Alvin R. Lebeck Compsci 220 / ECE 252 Fall 2008.



2 Administrivia
Reading
–H&P Appendix A, Chapter 2.3
–This will be partly review for those from ECE 152
Homework
Recent Research Paper
–“The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays”, Hrishikesh et al., ISCA
© 2008 Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti, Katz
CompSci 220 / ECE 252

3 Reading Summary: Performance
H&P Chapter 1

4 Getting more Performance
Let’s start with the basic multi-cycle processor that we all know
Execution of each instruction involves 5 activities
–Fetch, Decode, Execute, Memory Access, Writeback
How can we improve performance?
–Latency/instruction?
–Instruction throughput?
Key to improving throughput: parallelism
What kinds of parallelism can we exploit?

5 Instruction Level Parallelism (ILP)
ILP is a property of the software (not the hardware)
–how much parallelism exists among instructions?
–varies greatly across programs
many possible ways to exploit ILP
–pipelining: overlap processing of instructions
–superscalar: multiple instructions at a time
–out-of-order execution: dynamic scheduling
–compiler scheduling of code: static scheduling

6 ILP Example
add r1, r2, r3 # r1 = r2 + r3
sub r4, r1, r2
mul r5, r1, r4
xor r6, r2, r2
and r7, r6, r1
add r8, r3, r3
On a “perfectly parallel” machine, how many cycles would this code snippet take?
–Assume that all operations take 1 cycle
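One way to approach the question on this slide is to compute the longest chain of true (read-after-write) dependences. The sketch below is not from the lecture: the function name `earliest_cycles` and the tuple encoding of the program are assumptions made for illustration. It also assumes every operation takes 1 cycle, that registers not written by the snippet are ready at the start, and that only RAW dependences constrain execution.

```python
# Each instruction as (destination, source1, source2), transcribed from the slide.
PROGRAM = [
    ("r1", "r2", "r3"),  # add r1, r2, r3
    ("r4", "r1", "r2"),  # sub r4, r1, r2
    ("r5", "r1", "r4"),  # mul r5, r1, r4
    ("r6", "r2", "r2"),  # xor r6, r2, r2
    ("r7", "r6", "r1"),  # and r7, r6, r1
    ("r8", "r3", "r3"),  # add r8, r3, r3
]


def earliest_cycles(program):
    """Earliest cycle each instruction can execute in, given unlimited
    hardware, 1-cycle operations, and only RAW dependences (assumption)."""
    produced_in = {}  # register -> cycle in which its producer executes
    cycles = []
    for dest, *sources in program:
        # A source not written inside the snippet is ready "at cycle 0".
        start = 1 + max(produced_in.get(src, 0) for src in sources)
        produced_in[dest] = start
        cycles.append(start)
    return cycles


cycles = earliest_cycles(PROGRAM)
print(cycles)       # per-instruction earliest cycle
print(max(cycles))  # length of the longest dependence chain
```

Under these assumptions the longest chain is add, then sub, then mul, so the six instructions need 3 cycles rather than 6.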

7 What is Limit of ILP?
H&P Chapter 3 focuses on this issue
Two important performance limiters
–Limited ILP - why?
–Inability to exploit all available ILP - why?
What kinds of software have more/less ILP?
We’ll now talk about one way to exploit ILP: pipelining

8 Basic Pipelined Processor
basic = single, in-order issue
–single issue = one instruction at a time (per stage)
–in-order issue = instructions (start to) execute in order
–next units: multiple issue, out-of-order issue
pipelining principles
–tradeoff: clock rate vs. IPC
–hazards: structural, data, control
vanilla pipeline: single-cycle operations
–structural hazards, RAW hazards, control hazards
dealing with multi-cycle operations
–more structural hazards, WAW hazards, precise state
pipelining meets the x86 ISA

9 Pipelining
observe: instruction processing consists of N sequential stages
idea: overlap different instructions at different stages
increase resource utilization: fewer stages sitting idle
increase completion rate (throughput): up to 1 in 1/N time
almost every processor built since 1970 is pipelined
–first pipelined processor: IBM Stretch [1962]

10 Without Pipelining
5 parts of instruction execution
–fetch (F, IF): fetch instruction from I$
–decode (D, ID): decode instruction, read input registers
–execute (X, EX): ALU, load/store address, branch outcome
–memory access (M, MEM): load/store to D$/DTLB
–writeback (W, WB): write results (from ALU or ld) back to register file

11 Simple 5-Stage Pipeline
5 stages (pipeline depth is 5)
–fetch (F, IF): fetch instruction from I$
–decode (D, ID): decode instruction, read input registers
–execute (X, EX): ALU, load/store address, branch outcome
–memory access (M, MEM): load/store to D$/DTLB
–writeback (W, WB): write results (from ALU or ld) back to register file
stages divided by pipeline registers/latches

12 Pipeline Registers (Latches)
contain info for controlling flow of instructions through pipe
–PC: PC
–F/D: PC, undecoded instruction
–D/X: PC, opcode, regfile[rs1], regfile[rs2], immed, rd
–X/M: opcode (why?), regfile[rs1], ALUOUT, rd
–M/W: ALUOUT, MEMOUT, rd

13 Pipeline Diagram
Inst0  F D X M W
Inst1    F D X M W
Inst2      F D X M W
Inst3        F D X M W
Compared to non-pipelined case:
–Better throughput: an instruction finishes every cycle
–Same latency per instruction: each still takes 5 cycles
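The diagram above can be regenerated for any instruction count with a few lines of Python. This is an illustrative sketch, not part of the lecture: the helper name `pipeline_diagram` is made up here, and it assumes an ideal pipeline with no stalls, so instruction i simply enters fetch in cycle i.

```python
STAGES = ["F", "D", "X", "M", "W"]  # the 5 stages named on the slides


def pipeline_diagram(num_insts, stages=STAGES):
    """Build one text row per instruction for an ideal, stall-free pipeline.

    Returns (rows, total_cycles); idle cycles are shown as '.'.
    """
    total_cycles = len(stages) + num_insts - 1
    rows = []
    for i in range(num_insts):
        cells = ["."] * total_cycles
        for offset, name in enumerate(stages):
            cells[i + offset] = name  # instruction i is in this stage at cycle i + offset
        rows.append(f"Inst{i} " + " ".join(cells))
    return rows, total_cycles


rows, total = pipeline_diagram(4)
print("\n".join(rows))
print("total cycles:", total)
```

For 4 instructions this gives 8 cycles in total (5 + 4 - 1), compared with 4 * 5 = 20 cycles if each instruction had to finish all five steps before the next one started.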

14 Principles of Pipelining
let: instruction execution require N stages, each takes $t_n$ time
–un-pipelined processor
»single-instruction latency $T = \sum_n t_n$
»throughput $= 1/T = 1/\sum_n t_n$
»M-instruction latency $= M \cdot T$ ($M \gg 1$)
–now: N-stage pipeline
»single-instruction latency $T = \sum_n t_n$ (same as unpipelined)
»throughput $= 1/\max(t_n) \le N/T$ ($\max(t_n)$ is the bottleneck)
»if all $t_n$ are equal (i.e., $\max(t_n) = T/N$), then throughput $= N/T$
»M-instruction latency ($M \gg 1$) $= M \cdot \max(t_n) \ge M \cdot T/N$
»speedup $\le N$
–can we choose N to get arbitrary speedup?
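A small numeric check makes these relations concrete. The sketch below is an illustration under stated assumptions: the function name `pipeline_metrics` and the stage delays (in arbitrary time units) are invented for the example, and pipeline overhead is ignored, as on this slide.

```python
def pipeline_metrics(stage_delays):
    """Latency, throughput and speedup for an N-stage pipeline without overhead."""
    T = sum(stage_delays)      # single-instruction latency, T = sum of t_n
    cycle = max(stage_delays)  # the slowest stage sets the clock period
    return {
        "latency": T,
        "unpipelined_throughput": 1 / T,
        "pipelined_throughput": 1 / cycle,
        "speedup": T / cycle,  # never exceeds N = len(stage_delays)
    }


balanced = pipeline_metrics([2, 2, 2, 2, 2])    # hypothetical equal stages
unbalanced = pipeline_metrics([1, 2, 4, 2, 1])  # hypothetical: one slow stage

print(balanced["speedup"])    # 5.0, the full N-fold speedup
print(unbalanced["speedup"])  # 2.5, limited by the 4-unit stage
```

Both pipelines have the same total latency T = 10, but only the balanced one reaches the upper bound of N = 5; the unbalanced one is held back by its slowest stage.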

15 Wrong (part I): Pipeline Overhead
V := oVerhead delay per pipe stage
–cause #1: latch overhead
»pipeline registers take time
–cause #2: clock/data skew
so, for an N-stage pipeline with overheads (T still denotes the unpipelined latency $\sum_n t_n$)
–single-instruction latency $= \sum_n (V + t_n) = N \cdot V + \sum_n t_n = N \cdot V + T$
–throughput $= 1/(\max(t_n) + V) \le N/T$ (and $\le 1/V$)
–M-instruction latency $= M \cdot (\max(t_n) + V) \ge M \cdot V + M \cdot T/N$
–speedup $= T/(V + \max(t_n)) \le N$
Overhead limits throughput, speedup & useful pipeline depth
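To see how overhead caps the useful pipeline depth, the speedup formula can be evaluated for increasing N. This is a rough sketch rather than anything from the lecture: the function name `speedup_with_overhead` and the values T = 10 and V = 0.5 are hypothetical, and the stages are assumed to be perfectly balanced so that max(t_n) = T/N.

```python
def speedup_with_overhead(T, V, N):
    """Speedup T / (V + max(t_n)), assuming balanced stages: max(t_n) = T / N."""
    return T / (V + T / N)


T = 10.0  # hypothetical unpipelined latency (arbitrary time units)
V = 0.5   # hypothetical per-stage overhead (same units)

for N in (1, 5, 10, 20, 100, 1000):
    print(N, round(speedup_with_overhead(T, V, N), 2))

print("upper bound T/V:", T / V)
```

With these numbers the speedup keeps growing with N but never reaches T/V = 20, which is why adding stages gives diminishing returns once the per-stage logic delay T/N shrinks toward V.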