Computer Architecture: Pipelines & Superscalars
Sunset over the Pacific Ocean, taken from Iolanthe II about 100 nm north of Cape Reinga.


Pipelines - Data Hazards
Code:
    lw   $4, 0($1)
    add  $15, $1, $1
    sub  $2, $1, $3
    and  $12, $2, $5
    or   $13, $6, $2
    add  $14, $2, $2
    sw   $15, 100($2)
The last four instructions all depend on the result ($2) produced by the sub!
MIPS instructions have the format: op dest, src_a, src_b

Pipelines - Data Hazards
Examine the pipeline diagram (ignore the first two instructions):
$2 is written back only in time for the add $14 - the and and or read a stale value!
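A text sketch of that pipeline diagram, assuming the classic 5-stage IF-ID-EX-MEM-WB pipeline with registers written in the first half of WB and read in the second half of ID (the leading lw and add $15 are omitted, as the slide suggests):

    cycle:             1    2    3    4    5    6    7    8    9
    sub $2, $1, $3     IF   ID   EX   MEM  WB
    and $12, $2, $5         IF   ID   EX   MEM  WB             <- reads $2 in cycle 3 (stale)
    or  $13, $6, $2              IF   ID   EX   MEM  WB        <- reads $2 in cycle 4 (stale)
    add $14, $2, $2                   IF   ID   EX   MEM  WB   <- reads $2 in cycle 5, after sub's write
    sw  $15, 100($2)                       IF   ID   EX   MEM  WB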

Pipelines - Data Hazards
Compiler solution: insert NOPs between the sub and the first use of $2.
Inefficient - the processor does no useful work in those slots! (See the sketch below.)
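A minimal sketch of the NOP fix, again assuming registers are written in the first half of WB and read in the second half of ID, so two NOPs suffice (a third would be needed without that split-cycle register file):

    lw   $4, 0($1)
    add  $15, $1, $1
    sub  $2, $1, $3
    nop                  # wait: $2 not yet written back
    nop                  # wait: $2 not yet written back
    and  $12, $2, $5     # now reads the updated $2
    or   $13, $6, $2
    add  $14, $2, $2
    sw   $15, 100($2)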

Pipelines - Data Hazards
Second compiler solution: reorder the instructions.

Original order:
    lw   $4, 0($1)
    add  $15, $1, $1
    sub  $2, $1, $3
    and  $12, $2, $5
    or   $13, $6, $2
    add  $14, $2, $2
    sw   $15, 100($2)

Reordered (sub moved ahead of the lw and the add $15):
    sub  $2, $1, $3
    lw   $4, 0($1)
    add  $15, $1, $1
    and  $12, $2, $5
    or   $13, $6, $2
    add  $14, $2, $2
    sw   $15, 100($2)

The move is legal only because the lw and add $15 do not write (define) $1 or $3, which the sub reads.

Pipelines - Data Hazards
In the reordered code, two independent instructions (the lw and the add $15) now separate the sub from the first use of $2, so $2 has been written back by the time it is read (given the split-cycle register file assumed above):
    sub  $2, $1, $3
    lw   $4, 0($1)
    add  $15, $1, $1
    and  $12, $2, $5     <- first use of $2
    or   $13, $6, $2
    add  $14, $2, $2
    sw   $15, 100($2)

Pipelines - Data Hazards
The compiler analyses dependencies:
  - register definitions (writes)
  - register uses (reads)
A use that follows a definition is a Read After Write (RAW) dependency.
An instruction with no dependencies on its neighbours can be moved!
    sub  $2, $1, $3      <- $2 written
    lw   $4, 0($1)
    add  $15, $1, $1
    and  $12, $2, $5     <- uses of $2 start here
    or   $13, $6, $2
    add  $14, $2, $2
    sw   $15, 100($2)

Pipelines - Data Hazards
Hardware solution: value forwarding.
  - Hardware (a scoreboard) detects the dependency
  - The result is forwarded from WB back to EX for subsequent use
  - Done entirely in hardware - transparent to software!
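A hedged sketch of how the detection can be expressed, following the standard textbook 5-stage formulation rather than the scoreboard described above (pipeline-register field names here are illustrative): forward into EX from the EX/MEM pipeline register when EX/MEM.RegWrite is set, EX/MEM.Rd is not $0, and EX/MEM.Rd matches ID/EX.Rs or ID/EX.Rt; otherwise forward from MEM/WB under the analogous condition.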

Data Hazards - Classification
  Read After Write (RAW): instruction 1 must write before instruction 2 reads.
  Write After Write (WAW): instructions 1 and 2 both write; instruction 2 must write after instruction 1.
  Write After Read (WAR): instruction 1 reads, instruction 2 writes (overwrites); instruction 2 must not write before instruction 1 reads.
Reordering algorithms must consider all three! (Minimal examples below.)
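Minimal MIPS examples of each dependency (register choices are illustrative):

    # RAW: sub writes $2, and reads it
    sub  $2, $1, $3
    and  $12, $2, $5

    # WAW: both instructions write $2; the lw's write must land last
    sub  $2, $1, $3
    lw   $2, 0($4)

    # WAR: and reads $2 before the lw overwrites it; the write must not
    # be moved ahead of the read
    and  $12, $2, $5
    lw   $2, 0($4)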

Lecture 5 - Key Points
Data hazards: RAW (most common), WAW, WAR.
Compiler: looks for dependencies, then re-orders instructions.
Hardware:
  - Scoreboard monitors dependencies and ensures correct operation
  - Value-forwarding hardware forwards results from the EX stage

Pipelines - Exceptions
Caused by, for example, overflow or underflow.
Example: add $1, $2, $1
  - Overflow is detected in the EX stage
  - It causes a jump to the exception handler, like a branch - the remainder of the pipeline is flushed
  - But the handler needs the original $1 that caused the overflow, so the register must not be overwritten
  - The EX stage therefore needs to squash the WB operation
This is the precise exception problem - more later!

Superpipelines

Time to complete each instruction = t
  (total of fetch + decode + fetch operands + operation + write-back)
Clock frequency: f = 1/t
An n-stage pipeline allows n instructions 'in flight' simultaneously.
Each pipeline stage does 1/n of the work, so each stage requires time t/n.
  (This assumes a perfectly balanced pipeline: every stage requires the same time.)
Clock frequency: f_pipe = 1/(t/n) = n/t
So increasing n increases processor power?
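A worked example with illustrative numbers (not from the slides): if an unpipelined instruction takes t = 10 ns, then f = 1/t = 100 MHz. A perfectly balanced n = 5 stage pipeline gives a stage time of t/n = 2 ns, so f_pipe = n/t = 500 MHz, and once the pipeline is full one instruction completes every cycle - a factor-of-n gain in throughput.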

Pipelines - Depth
The pipeline can't be too deep: hazards are frequent, so deep pipelines suffer many stalls.
[Graph: relative performance vs. pipeline depth - performance rises, peaks, then falls in the region marked "Too deep!"; the "Superpipelined" region is marked at the deeper end.]

Pipeline Depth
Increasing the number of stages:
  - each stage adds overhead (the staging register between stages)
  - balancing the pipeline becomes harder: we require t_pd1 ≈ t_pd2 ≈ t_pd3 ≈ ...
  - the time for stage j is t_pdj + t_pd,reg
  - n stages means a total of n * t_pd,reg of register overhead
[Diagram: stages of combinational logic (t_pd1, t_pd2, t_pd3) separated by pipeline registers, each adding t_pd,reg.]
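Continuing the illustrative numbers above: with t = 10 ns and a register overhead t_pd,reg = 0.2 ns, a 5-stage pipeline has a stage time of 10/5 + 0.2 = 2.2 ns (f ≈ 455 MHz rather than 500 MHz); at 50 stages the stage time is 0.2 + 0.2 = 0.4 ns, so half of every cycle is register overhead - diminishing returns.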

CISC and Pipelines
High-speed CISC processors are pipelined too (overlapping IF and EX), but:
  - Variable instruction length and running time (number of microcode cycles)
      -> pipeline imbalance, "backup" in pipe stages, and more complicated hazard detection
  - Complex addressing modes
      -> auto-increment updates an address register; multiple memory accesses may be required
      -> smooth pipeline flow is more difficult!

Instruction Queues
The rate of instruction fetch is a vital performance determinant.
High-performance processors commonly fetch multiple instructions in each cycle, using a wide datapath to memory (e.g. PowerPC: 128 bits = 4 instructions per fetch).
Despatch unit:
  - examines dependencies
  - determines which instructions can be despatched

Instruction Queues
The queue "matches" the fetch and despatch rates - the general strategy for matching producers to consumers:
  - use FIFO-style queues
  - absorb asynchronous delivery/consumption rates
  - provide elasticity in pipelines
[Diagram: Producer -> FIFO -> Consumer, with differing instantaneous rates.]

Superscalar Processors

PowerPC Organisation
PowerPC 601, ~1993 - 3-way superscalar: integer, branch and floating-point units.
[Annotated die photo: the boundary of the Si die, with the three functional units marked. A newer machine will have more functional units here.]
New - look in the "Example Processors" section of the Web notes.

Superscalar Processors
Multiple functional units - e.g. the PowerPC 604, a 6-way superscalar: a branch unit, a load/store unit, 3 integer units and a floating-point unit.
Despatch unit: sends "ready" instructions to all free units.
PowerPC 604: potentially 4 instructions/cycle (the pipeline lengths differ!); in reality 2-3 instructions/cycle? (program dependent!)

Superscalar Processors
Mix of functional units - up to 8-way superscalar is common now, e.g.:
  - 2 floating-point units (usually ~3 cycle latency)
  - 3 integer arithmetic units
  - branch unit
  - load/store unit
  - + ….?
Marketing departments can play some games with the 'n' of an n-way superscalar!

Pentium Quad Core
Distinguish between multiple 'cores' (separate processors - covered later) and superscalars (multiple functional units per processor) - "wide dynamic execution" in Intel-speak.
Quad core: 4 cores, each completing up to 4 instructions/cycle (the IIU can issue four instructions/cycle).
3 MB L2 cache per processor (12 MB total); master clock 3.2 GHz, front-side bus 1.6 GHz; 771 pins.

Superscalar Limitations
To achieve maximum performance, the instruction mix must match the functional-unit mix.
e.g. with 2 integer ALUs, 2 FPUs, 1 branch unit and 1 load/store unit, the instruction issue unit (IIU) can issue 4 instructions per cycle, and each group of four instructions should be able to use 4 of the functional units.
If the instruction stream doesn't have the right mix, some functional units will remain idle.
FPUs require multiple cycles, causing additional stalls, and pipeline hazards stall the pipeline too.
So a 4-way superscalar typically completes fewer than 4 instructions per cycle - program dependent! (Sketch below.)
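An illustrative sequence for the mix above (hypothetical registers; assuming the 2-FPU limit is the only constraint): four consecutive floating-point adds can issue only two per cycle, leaving the integer ALUs, branch unit and load/store unit idle:

    add.s $f2,  $f0, $f1    # FPU 1, cycle 1
    add.s $f5,  $f3, $f4    # FPU 2, cycle 1
    add.s $f8,  $f6, $f7    # no free FPU - waits until cycle 2
    add.s $f11, $f9, $f10   # cycle 2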