Pipelining, Superscalar, and Out-of-order architectures

Slides:



Advertisements
Similar presentations
© 2006 Edward F. Gehringer ECE 463/521 Lecture Notes, Spring 2006 Lecture 1 An Overview of High-Performance Computer Architecture ECE 463/521 Spring 2006.
Advertisements

Pipeline Computer Organization II 1 Hazards Situations that prevent starting the next instruction in the next cycle Structural hazards – A required resource.
Intro to Computer Org. Pipelining, Part 2 – Data hazards + Stalls.
Lecture Objectives: 1)Define pipelining 2)Calculate the speedup achieved by pipelining for a given number of instructions. 3)Define how pipelining improves.
CMPT 334 Computer Organization
Pipelining I (1) Fall 2005 Lecture 18: Pipelining I.
Goal: Describe Pipelining
11/1/2005Comp 120 Fall November Exam Postmortem 11 Classes to go! Read Sections 7.1 and 7.2 You read 6.1 for this time. Right? Pipelining then.
CS Computer Architecture 1 CS 430 – Computer Architecture Pipelined Execution - Review William J. Taffe using slides of David Patterson.
1 Lecture 17: Basic Pipelining Today’s topics:  5-stage pipeline  Hazards and instruction scheduling Mid-term exam stats:  Highest: 90, Mean: 58.
Computer Organization
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Lecture 19 - Pipelined.
Instruction Level Parallelism (ILP) Colin Stevens.
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania Computer Organization Pipelined Processor Design 1.
1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.
Pipelining Andreas Klappenecker CPSC321 Computer Architecture.
Review of CS 203A Laxmi Narayan Bhuyan Lecture2.
CS 300 – Lecture 24 Intro to Computer Architecture / Assembly Language The LAST Lecture!
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Lecture 17 - Pipelined.
Memory/Storage Architecture Lab Computer Architecture Pipelining Basics.
Pipelining Enhancing Performance. Datapath as Designed in Ch. 5 Consider execution of: lw $t1,100($t0) lw $t2,200($t0) lw $t3,300($t0) Datapath segments.
CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.
1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.
CSIE30300 Computer Architecture Unit 04: Basic MIPS Pipelining Hsin-Chou Chi [Adapted from material by and
Lecture 16: Basic Pipelining
Instructor: Senior Lecturer SOE Dan Garcia CS 61C: Great Ideas in Computer Architecture Pipelining Hazards 1.
LECTURE 7 Pipelining. DATAPATH AND CONTROL We started with the single-cycle implementation, in which a single instruction is executed over a single cycle.
10/11: Lecture Topics Execution cycle Introduction to pipelining
Lecture 9. MIPS Processor Design – Pipelined Processor Design #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System.
Advanced Pipelining 7.1 – 7.5. Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike.
PipeliningPipelining Computer Architecture (Fall 2006)
CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards 1 Instructors: Vladimir Stojanovic and Nicholas Weaver
Lecture 5. MIPS Processor Design Pipelined MIPS #1 Prof. Taeweon Suh Computer Science & Engineering Korea University COSE222, COMP212 Computer Architecture.
Use of Pipelining to Achieve CPI < 1
Pipelining – Loop unrolling and Multiple Issue
Memory – Caching: Writes
Chapter Six.
CS 352H: Computer Systems Architecture
Lecture 18: Pipelining I.
Computer Organization CS224
Pipelining Chapter 6.
CS/COE 1541 (term 2174) Jarrett Billingsley
CSCI206 - Computer Organization & Programming
Pipelining – Out-of-order execution and exceptions
Lecture 16: Basic Pipelining
Performance of Single-cycle Design
Instructor: Justin Hsia
Single Clock Datapath With Control
Pipeline Implementation (4.6)
Morgan Kaufmann Publishers The Processor
Pipelining Chapter 6.
Lecturer: Alan Christopher
CSCI206 - Computer Organization & Programming
Computer Architecture
CSCI206 - Computer Organization & Programming
Chapter Six.
Chapter Six.
Pipelining Chapter 6.
November 5 No exam results today. 9 Classes to go!
Multicycle and Microcode
Chapter 8. Pipelining.
Pipelining: Basic Concepts
CSC3050 – Computer Architecture
Pipelining Chapter 6.
Pipelining Chapter 6.
Lecture 1 An Overview of High-Performance Computer Architecture
Instruction Level Parallelism
Guest Lecturer: Justin Hsia
A relevant question Assuming you’ve got: One washer (takes 30 minutes)
Pipelining.
Presentation transcript:

Pipelining, Superscalar, and Out-of-order architectures CS/COE 0447 Jarrett Billingsley

Class announcements Today is a bit of a 1541 preview Review Monday, non-cumulative exam Wednesday OMETs! CS447

Pipelining CS447

The idea when you do several loads of laundry… …do you wash, dry, and fold one load BEFORE doing the next? or do you start on the next one while the previous one is drying? CS447

chugga chugga choo choo (animated) what if… we just kept fetching instructions on each clock cycle? Memory F D X Memory again? M sw sub add Control Register File W CS447

But… why did we make any individual instruction faster? no, the add and sub still took 4 cycles… but they finished faster. how? hi! by overlapping the instructions, we increased the throughput. in any given clock cycle, we're now doing more work. this is how we can get the CPI down closer to 1. pipelining is essentially required for reasonable performance now. CS447

CPI is an average CPI = 7 ÷ 4 = 1.75 for any program, we can count cycles and instructions, then divide. Time 1 2 3 4 5 6 7 add t0,t1,t2 add t3,t4,t5 add s0,s1,s2 add s3,s4,s5 F D X W CPI = 7 ÷ 4 = 1.75 F D X W F D X W F D X W CS447

Structural hazards two instructions need to use the same hardware at the same time Time 1 2 3 4 5 6 7 lw t0,0($0) lw t1,4($0) lw t2,8($0) lw t3,12($0) F D X W M F D X W M F D X W M F D X W M CS447

Data hazards an instruction depends on the result of a previous one Time 1 2 3 4 5 6 7 add t0,t1,t2 sub s0,t0,t1 F D X W F D X W sub can't use the value of t0 until the beginning of cycle 4 or can it...? CS447

Control hazards you don't know the outcome of a conditional branch Time 1 2 3 4 5 6 7 beq t0,0,end add t0,t1,t2 lw t4,(t0) uh oh F D X F D F what if, at cycle 3, we realize we should have taken the branch? what do we do with the 2 in-progress instructions? should we have fetched them at all? CS447

Superscalar CPUs CS447

the absolute best we can do is finish 1 instruction each cycle The holy grail CPI=1 the absolute best we can do is finish 1 instruction each cycle … or is it CS447

Breaking the sound CPI=1 barrier a scalar CPU can finish at most 1 instruction per cycle a superscalar CPU finishes more than 1 instruction per cycle fetch multiple instructions at once ALU Pipe I-cache Control Register File D-cache ALU an add would go here Memory Pipe a load would go here CS447

Example code CPI = 5 ÷ 7 = 0.71?? IPC = 7 ÷ 5 = 1.4! lw t0, 0(s1) a superscalar CPU might execute the following code like this… lw t0, 0(s1) lw t1, -4(s1) sub s1, s1, 8 add t0, t0, s2 add t1, t1, s2 sw t0, 8(s1) sw t1, 4(s1) Cycle ALU Pipe Mem Pipe lw t0 1 sub s1 lw t1 2 add t0 3 add t1 sw t0 4 sw t1 CPI = 5 ÷ 7 = 0.71?? IPC = 7 ÷ 5 = 1.4! CS447

Out-of-order CPUs CS447

these data dependencies form "threads" that "weave" through the code We have to go deeper instruction streams are inherently linear but the computations they describe might not be these data dependencies form "threads" that "weave" through the code lw t0, 0(s1) lw t1, -4(s1) sub s1, s1, 8 add t0, t0, s2 add t1, t1, s2 sw t0, 8(s1) sw t1, 4(s1) we could exploit this fact to reorder the instructions to run in a more convenient order… at runtime. CS447

Bringing order to the chaos what if we ran them more like this? lw t0,0(s1) Cyc 1 2 3 lw t1,-4(s1) sub s1,s1,8 wait for loads… add t0,t0,s2 add t1,t1,s2 sw t0,8(s1) IPC = 7 ÷ 4 = 1.75! sw t1,4(s1) CS447

screaming isn't it crazy??? and then there's caching and the memory hierarchy and then there's simultaneous multithreading and then there's…… so much more. take CS1541 if this stuff interests you. CS447

and… that's the end. CS447