Computer Architecture Pipelines & Superscalars



Pipelines - Data Hazards
Code:
    lw  $4, 0($1)
    add $15, $1, $1
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
- The last four instructions all depend on $2, the result produced by the sub!
- MIPS instructions have the format: op dest, src_a, src_b
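The dependencies in this sequence can be found mechanically. The following is a minimal Python sketch, not part of the original slides: `parse` and `raw_dependencies` are hypothetical helper names, and the parser only understands the handful of instruction shapes used in the listing above.

```python
import re

def parse(line):
    """Split a MIPS-style instruction into (op, dest, sources)."""
    op, rest = line.split(None, 1)
    regs = re.findall(r"\$\d+", rest)
    if op == "sw":              # a store writes memory, not a register
        return op, None, regs
    return op, regs[0], regs[1:]

CODE = [
    "lw $4, 0($1)",
    "add $15, $1, $1",
    "sub $2, $1, $3",
    "and $12, $2, $5",
    "or $13, $6, $2",
    "add $14, $2, $2",
    "sw $15, 100($2)",
]

def raw_dependencies(code):
    """Return (producer_index, consumer_index, register) triples."""
    deps = []
    last_writer = {}            # register -> index of the latest writer
    for i, line in enumerate(code):
        _, dest, srcs = parse(line)
        for reg in dict.fromkeys(srcs):     # drop repeats, keep order
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None:
            last_writer[dest] = i
    return deps

for producer, consumer, reg in raw_dependencies(CODE):
    print(f"{CODE[consumer]!r} reads {reg} written by {CODE[producer]!r}")
```

Besides the four uses of $2, the sketch also reports that the sw reads $15 from the earlier add; that pair is far enough apart not to cause a stall.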

Pipelines - Data Hazards
- Examine the pipeline (ignore the first two instructions!)
- $2 is only updated in time for the add!

Pipelines - Data Hazards
Compiler solution
- Insert NOOPs
- Inefficient!
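A rough sketch of the NOOP-insertion idea in Python follows. It assumes a five-stage pipeline without forwarding in which a register written in WB can be read by an instruction in ID during the same cycle, so a consumer must sit at least three slots after its producer; that `min_distance` value and the helper names `parse` and `insert_nops` are assumptions for illustration, not taken from the slides.

```python
import re

def parse(line):
    """Split a MIPS-style instruction into (op, dest, sources)."""
    op, rest = line.split(None, 1)
    regs = re.findall(r"\$\d+", rest)
    if op == "sw":              # a store writes memory, not a register
        return op, None, regs
    return op, regs[0], regs[1:]

CODE = [
    "lw $4, 0($1)",
    "add $15, $1, $1",
    "sub $2, $1, $3",
    "and $12, $2, $5",
    "or $13, $6, $2",
    "add $14, $2, $2",
    "sw $15, 100($2)",
]

def insert_nops(code, min_distance=3):
    """Pad with nops so every reader is >= min_distance slots after its writer."""
    out = []
    last_write = {}             # register -> slot in `out` of its latest writer
    for line in code:
        _, dest, srcs = parse(line)
        need = 0
        for reg in srcs:
            if reg in last_write:
                gap = len(out) - last_write[reg]
                need = max(need, min_distance - gap)
        out.extend(["nop"] * need)
        if dest is not None:
            last_write[dest] = len(out)
        out.append(line)
    return out

print("\n".join(insert_nops(CODE)))
```

Under these assumptions two nops land between the sub and the and, which is the wasted work the slide calls inefficient.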

Pipelines - Data Hazards
Second compiler solution: reorder
Original order:
    lw  $4, 0($1)
    add $15, $1, $1
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
Reordered:
    sub $2, $1, $3      <- $1, $3 read; $2 written
    lw  $4, 0($1)
    add $15, $1, $1
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
- The lw and add that the sub moves ahead of must not define $1 or $3!

Pipelines - Data Hazards
Second compiler solution: reorder
    sub $2, $1, $3      <- $2 written
    lw  $4, 0($1)
    add $15, $1, $1
    and $12, $2, $5     <- first use (read) of $2
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)

Pipelines - Data Hazards
Compiler analyses dependencies
- Register definitions
- Register use
- Read After Write (RAW) dependency
- No dependencies: instruction can be moved!
    sub $2, $1, $3      <- $2 written
    lw  $4, 0($1)
    add $15, $1, $1
    and $12, $2, $5     <- use of $2
    or  $13, $6, $2     <- use of $2
    add $14, $2, $2     <- use of $2
    sw  $15, 100($2)    <- use of $2

Pipelines - Data Hazards
Hardware solution: value forwarding
- Hardware detects the dependency (scoreboard)
- Forwards the result from WB to EX for subsequent use
- Done in hardware: transparent to software!
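As an illustration of how forwarding hardware might choose an ALU operand, here is a small Python sketch. It uses the common textbook approach of comparing a source register against the destination fields held in the EX/MEM and MEM/WB pipeline registers, which is not necessarily the scoreboard mechanism these slides have in mind. The names `Latch` and `forward_select` are hypothetical.

```python
from typing import NamedTuple, Optional

class Latch(NamedTuple):
    """The part of a pipeline register that the forwarding logic inspects."""
    reg_write: bool         # will this instruction write a register?
    dest: Optional[int]     # destination register number, if any

def forward_select(src, ex_mem, mem_wb):
    """Pick where an EX-stage source operand should come from."""
    # The newest result wins: check the instruction one ahead first.
    if ex_mem.reg_write and ex_mem.dest not in (None, 0) and ex_mem.dest == src:
        return "EX/MEM"
    if mem_wb.reg_write and mem_wb.dest not in (None, 0) and mem_wb.dest == src:
        return "MEM/WB"
    return "REGFILE"        # no hazard: use the value read in ID

# 'and $12, $2, $5' in EX while 'sub $2, $1, $3' sits in EX/MEM
print(forward_select(2, Latch(True, 2), Latch(False, None)))
# 'or $13, $6, $2' in EX one cycle later: the sub has moved on to MEM/WB
print(forward_select(2, Latch(True, 12), Latch(True, 2)))
```

Register 0 is excluded because MIPS hard-wires $0 to zero, so a "write" to it must never be forwarded.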

Data Hazards - classification
- Read after Write (RAW)
  - Instruction 1 must write before instruction 2 reads
- Write after Write (WAW)
  - Instructions 1 and 2 both write
  - Instruction 2 must write after 1
- Write after Read (WAR)
  - Instruction 1 reads
  - Instruction 2 writes (overwrites)
  - Instruction 2 must not write before 1 reads
- Reordering algorithms must consider all three!
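One way a reordering pass could apply all three checks is sketched below in Python. `can_hoist` is a hypothetical helper that asks whether one instruction may be moved above a run of earlier ones. It looks at register dependencies only and ignores ordering between memory accesses, so it is a simplification and not a complete scheduler.

```python
import re

def parse(line):
    """Split a MIPS-style instruction into (op, dest, sources)."""
    op, rest = line.split(None, 1)
    regs = re.findall(r"\$\d+", rest)
    if op == "sw":              # a store writes memory, not a register
        return op, None, regs
    return op, regs[0], regs[1:]

def can_hoist(code, src_index, dst_index):
    """May code[src_index] move up to position dst_index (dst_index < src_index)?"""
    _, dest, srcs = parse(code[src_index])
    for other in code[dst_index:src_index]:
        _, o_dest, o_srcs = parse(other)
        if o_dest is not None and o_dest in srcs:
            return False        # RAW: we read what `other` writes
        if dest is not None and dest in o_srcs:
            return False        # WAR: we would overwrite what `other` reads
        if dest is not None and dest == o_dest:
            return False        # WAW: final value of dest would change
    return True

CODE = [
    "lw $4, 0($1)",
    "add $15, $1, $1",
    "sub $2, $1, $3",
    "and $12, $2, $5",
]

print(can_hoist(CODE, 2, 0))    # sub above lw and add
print(can_hoist(CODE, 3, 0))    # and above the sub that produces $2
```

The first call corresponds to the reordering shown on the earlier slides: the lw and add define neither $1 nor $3 and do not touch $2, so the sub is free to move.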

Lecture 5 - Key Points
- Data Hazards
  - RAW - most common
  - WAW
  - WAR
- Compiler looks for dependencies, then re-orders
- Hardware
  - Scoreboard: monitors dependencies, ensures correct operation
  - Value forwarding hardware: forwards results from EX stage

Pipelines - Exceptions
- Caused by overflow, underflow
- Example: add $1, $2, $1
  - Overflow detected in EX stage
  - Causes jump to exception handler, as for a branch: remainder of pipeline flushed
  - but Compiler needs the original $1 that caused the overflow
    => Register must not be overwritten
  - EX stage needs to squash the WB operation
- Precise Exception problem - more later!


Pipelines - Depth
- Pipeline can't be too deep
- Hazards are frequent => many stalls in deep pipelines
[Figure: Relative Performance vs. Pipeline Depth; the deep end of the curve is labelled "Too Deep!" and "Superpipelined"]

CISC and pipelines
- High speed CISC processors are pipelined
  - Overlap IF, EX
- Variable instruction length, running time (number of microcode cycles)
  => pipeline imbalance
  => "backup" in pipe stages
  => complicate hazard detection
- Complex addressing modes
  => auto-increment updates address register
  => multiple memory accesses required
  => smooth pipeline flow more difficult!

Instruction Queues
- Vital performance determinant: rate of instruction fetch
- High performance processors
  - Fetching multiple instructions in each cycle is common
  - Use wide datapath to memory
  - PowerPC bits = 4 instructions
- Despatch unit
  - Examine dependencies
  - Determine which instructions can be despatched

Instruction Queues
- Q "matches" fetch/despatch rates
- General strategy for matching Producers - Consumers
  - Use of FIFO-style queues
  - Absorb asynchronous delivery / consumption rates
  - Provides elasticity in pipelines
[Figure: Producer -> FIFO -> Consumer, with differing instantaneous rates]
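The elasticity a FIFO provides can be seen in a toy Python simulation. The per-cycle fetch and despatch counts, the queue capacity, and the function name `simulate` are all made-up illustrative values, not figures for any real processor.

```python
from collections import deque

def simulate(fetch_per_cycle, despatch_per_cycle, capacity=8):
    """Run a fetch unit (producer) and despatch unit (consumer) through a FIFO.

    Returns the instructions despatched in each cycle and whatever is
    still queued at the end.
    """
    queue = deque()
    next_instr = 0
    despatched = []
    for fetched, wanted in zip(fetch_per_cycle, despatch_per_cycle):
        # Producer: deliver a burst, limited by the free space in the queue.
        room = capacity - len(queue)
        for _ in range(min(fetched, room)):
            queue.append(next_instr)
            next_instr += 1
        # Consumer: take as many as it wants, limited by what is queued.
        count = min(wanted, len(queue))
        despatched.append([queue.popleft() for _ in range(count)])
    return despatched, list(queue)

# Fetch arrives in bursts of 4 every other cycle; despatch takes a steady 2.
per_cycle, left_over = simulate([4, 0, 4, 0], [2, 2, 2, 2])
print(per_cycle)
print(left_over)
```

Although the producer is idle every second cycle, the consumer is never starved: the average rates match and the queue absorbs the difference in instantaneous rates.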

Superscalar Processors

PowerPC organisation
- PowerPC 601, ~1993
- New: look in the "Example Processors" section of the Web notes
- 3-way superscalar: Integer, Branch, Floating Point
- A newer machine will have more functional units here!
[Figure: PowerPC 601 block diagram, showing the boundary of the Si die]

Superscalar Processors
- Multiple functional units
  - PowerPC 604 => 6-way superscalar
  - Branch Unit, Load/Store Unit, 3 Integer Units, Floating Point Unit
- Despatch unit
  - Sends "ready" instructions to all free units
  - PowerPC 604: potential 4 instructions/cycle (pipeline lengths are different!)
  - Reality: 2-3 instructions/cycle? (program dependent!)

Superscalar Processors
- Mix of functional units
  - Up to 8-way superscalar common now
  - 2 floating point units (usually have ~3 cycle latency)
  - 3 integer arithmetic units
  - Branch unit
  - Load / store unit
  - + ...?
- Marketing departments can play some games with the 'n' of an n-way superscalar!

Superscalar - Maximum throughput
- Instruction Issue Unit (IIU) is the key!
  - If the IIU only issues 4 instructions per cycle, an n-way superscalar (n >> 4) can still only complete 4 instructions / cycle!
- IIU has many tasks
  - Pre-fetch instructions: at least one cache line!
  - Check dependencies: has the data required by this instruction been computed yet?
  - Keeps register 'scoreboard': marks registers which will be written by instructions already issued
  - It's a small dataflow machine (see later!)
  - Check availability of functional units
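The issue unit's checks for a single cycle might look like the Python sketch below. It assumes strictly in-order issue that stops at the first stalled instruction, a set of "pending" registers standing in for the scoreboard, and an illustrative mix of functional units. The function name `issue_cycle` and the tuple layout are hypothetical, and real issue logic is considerably more involved.

```python
def issue_cycle(window, pending, free_units, issue_width=4):
    """Decide which instructions at the head of `window` issue this cycle.

    window     : list of (unit, dest, sources) in program order
    pending    : registers that already-issued instructions will write
    free_units : dict mapping unit type -> number of free units
    """
    pending = set(pending)          # work on copies, leave the inputs alone
    free = dict(free_units)
    issued = []
    for unit, dest, srcs in window:
        if len(issued) == issue_width:
            break                   # the issue unit itself is the limit
        operands_ready = not any(reg in pending for reg in srcs)
        dest_free = dest is None or dest not in pending
        if not (operands_ready and dest_free and free.get(unit, 0) > 0):
            break                   # in-order issue: stop at the first stall
        free[unit] -= 1
        if dest is not None:
            pending.add(dest)       # mark the register on the scoreboard
        issued.append((unit, dest, srcs))
    return issued, pending

UNITS = {"int": 3, "fp": 1, "branch": 1, "ldst": 1}
WINDOW = [
    ("int", "$2", ["$1", "$3"]),    # sub $2, $1, $3
    ("ldst", "$4", ["$1"]),         # lw  $4, 0($1)
    ("int", "$15", ["$1"]),         # add $15, $1, $1
    ("int", "$12", ["$2", "$5"]),   # and $12, $2, $5  (needs $2)
]
issued, scoreboard = issue_cycle(WINDOW, set(), UNITS)
print(len(issued), sorted(scoreboard))
```

With the reordered code from the data hazard slides, three instructions issue together and the and waits because $2 is still marked on the scoreboard. However many functional units are free, the loop can never issue more than `issue_width` instructions in one cycle.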