Systems I Pipelining IV

Slides:



Advertisements
Similar presentations
Randal E. Bryant Carnegie Mellon University CS:APP CS:APP Chapter 4 Computer Architecture PipelinedImplementation Part II CS:APP Chapter 4 Computer Architecture.
Advertisements

1 Pipelining Part 2 CS Data Hazards Data hazards occur when the pipeline changes the order of read/write accesses to operands that differs from.
COMP25212 Further Pipeline Issues. Cray 1 COMP25212 Designed in 1976 Cost $8,800,000 8MB Main Memory Max performance 160 MFLOPS Weight 5.5 Tons Power.
Pipeline Computer Organization II 1 Hazards Situations that prevent starting the next instruction in the next cycle Structural hazards – A required resource.
Chapter 8. Pipelining.
Pipelining Hwanmo Sung CS147 Presentation Professor Sin-Min Lee.
1 Seoul National University Wrap-Up. 2 Overview Seoul National University Wrap-Up of PIPE Design  Exception conditions  Performance analysis Modern.
Real-World Pipelines: Car Washes Idea  Divide process into independent stages  Move objects through stages in sequence  At any instant, multiple objects.
PipelinedImplementation Part I PipelinedImplementation.
Instructor: Erol Sahin
– 1 – Chapter 4 Processor Architecture Pipelined Implementation Chapter 4 Processor Architecture Pipelined Implementation Instructor: Dr. Hyunyoung Lee.
Computer Architecture Carnegie Mellon University
S. Barua – CPSC 440 CHAPTER 6 ENHANCING PERFORMANCE WITH PIPELINING This chapter presents pipelining.
PipelinedImplementation Part I CSC 333. – 2 – Overview General Principles of Pipelining Goal Difficulties Creating a Pipelined Y86 Processor Rearranging.
Pipelining Andreas Klappenecker CPSC321 Computer Architecture.
Wrap-Up CSC 333. – 2 – Overview Wrap-Up of PIPE Design Performance analysis Fetch stage design Exceptional conditions Modern High-Performance Processors.
DLX Instruction Format
Appendix A Pipelining: Basic and Intermediate Concepts
ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.
Datapath Design II Topics Control flow instructions Hardware for sequential machine (SEQ) Systems I.
Pipelining III Topics Hazard mitigation through pipeline forwarding Hardware support for forwarding Forwarding to mitigate control (branch) hazards Systems.
David O’Hallaron Carnegie Mellon University Processor Architecture PIPE: Pipelined Implementation Part I Processor Architecture PIPE: Pipelined Implementation.
Data Hazard Solution 2: Data Forwarding Our naïve pipeline would experience many data stalls  Register isn’t written until completion of write-back stage.
Pipeline Hazard CT101 – Computing Systems. Content Introduction to pipeline hazard Structural Hazard Data Hazard Control Hazard.
1 Seoul National University Pipelined Implementation : Part I.
Chapter 2 Summary Classification of architectures Features that are relatively independent of instruction sets “Different” Processors –DSP and media processors.
1 Naïve Pipelined Implementation. 2 Outline General Principles of Pipelining –Goal –Difficulties Naïve PIPE Implementation Suggested Reading 4.4, 4.5.
Pipeline Hazards. CS5513 Fall Pipeline Hazards Situations that prevent the next instructions in the instruction stream from executing during its.
CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.
CS 1104 Help Session IV Five Issues in Pipelining Colin Tan, S
Computer Architecture: Wrap-up CENG331 - Computer Organization Instructors: Murat Manguoglu(Section 1) Erol Sahin (Section 2 & 3) Adapted from slides of.
Pipeline Architecture I Slides from: Bryant & O’ Hallaron
PipelinedImplementation Part II PipelinedImplementation.
Computer Architecture: Pipelined Implementation - I
Computer Architecture adapted by Jason Fritts
1 Seoul National University Pipelining Basics. 2 Sequential Processing Seoul National University.
11 Pipelining Kosarev Nikolay MIPT Oct, Pipelining Implementation technique whereby multiple instructions are overlapped in execution Each pipeline.
Pipelining IV Topics Implementing pipeline control Pipelining and performance analysis Systems I.
Randal E. Bryant Carnegie Mellon University CS:APP2e CS:APP Chapter 4 Computer Architecture PipelinedImplementation Part II CS:APP Chapter 4 Computer Architecture.
ECE/CS 552: Pipeline Hazards © Prof. Mikko Lipasti Lecture notes based in part on slides created by Mark Hill, David Wood, Guri Sohi, John Shen and Jim.
Real-World Pipelines Idea –Divide process into independent stages –Move objects through stages in sequence –At any given times, multiple objects being.
1 Pipelined Implementation. 2 Outline Handle Control Hazard Handle Exception Performance Analysis Suggested Reading 4.5.
Real-World Pipelines Idea Divide process into independent stages
ARM Organization and Implementation
Lecture 14 Y86-64: PIPE – pipelined implementation
Administrivia Midterm to be posted on Tuesday after class
Pipelined Implementation : Part I
Seoul National University
Seoul National University
Instruction Decoding Optional icode ifun valC Instruction Format
Pipelined Implementation : Part II
Systems I Pipelining III
Computer Architecture adapted by Jason Fritts
Systems I Pipelining II
Pipelined Implementation : Part I
Lecture 5: Pipelining Basics
Seoul National University
Control unit extension for data hazards
Pipeline Architecture I Slides from: Bryant & O’ Hallaron
Pipelined Implementation : Part I
Pipelined Implementation
Pipelining: critical path, pipeline hazards Prof. Eric Rotenberg
Control unit extension for data hazards
Systems I Pipelining II
Chapter 4 Processor Architecture
Systems I Pipelining II
Introduction to Computer Organization and Architecture
Control unit extension for data hazards
Pipelined Implementation
Real-World Pipelines: Car Washes
Presentation transcript:

Systems I Pipelining IV Topics Implementing pipeline control Pipelining and performance analysis

Implementing Pipeline Control Combinational logic generates pipeline control signals Action occurs at start of following cycle

Initial Version of Pipeline Control bool F_stall = # Conditions for a load/use hazard E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } || # Stalling at fetch while ret passes through pipeline IRET in { D_icode, E_icode, M_icode }; bool D_stall = E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }; bool D_bubble = # Mispredicted branch (E_icode == IJXX && !e_Bch) || # Bubble for ret bool E_bubble = # Load/use hazard E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB};

Control Combinations Combination A Combination B Special cases that can arise on same clock cycle Combination A Not-taken branch ret instruction at branch target Combination B Instruction that reads from memory to %esp Followed by ret instruction

Control Combination A Should handle as mispredicted branch JXX E D M Mispredict ret 1 Combination A Condition F D E M W Processing ret stall bubble normal Mispredicted Branch Combination Should handle as mispredicted branch Stalls F pipeline register But PC selection logic will be using M_valM anyhow

Stall in F Your book provides two inconsistent meanings for “stall in F” Instruction remains in F and injects a bubble into D Instruction squashed into D, same PC fetched Figure 4.61 Use the one that keeps 1 instr per pipeline stage E M W D F

JXX + ret works great! F D E M W 0x000: xorl % eax ,% 0x002: jne 1 2 3 4 5 6 7 8 9 F D E M W 0x002: jne target # Not taken 10 0x011: t: ret # Target bubble 0x012: nop # Target + 1 0x007: irmovl $1,% # Fall through 0x00d:

Control Combination B Load/use ret ret ret 1 1 1 M M M M E Load E E E D Use D D D ret ret ret Combination B Condition F D E M W Processing ret stall bubble normal Load/Use Hazard Combination bubble + stall Would attempt to bubble and stall pipeline register D Signaled by processor as pipeline error

Handling Control Combination B Load/use ret ret ret 1 1 1 M M M M E Load E E E D Use D D D ret ret ret Combination B Condition F D E M W Processing ret stall bubble normal Load/Use Hazard Combination Load/use hazard should get priority ret instruction should be held in decode stage for additional cycle

Corrected Pipeline Control Logic bool D_bubble = # Mispredicted branch (E_icode == IJXX && !e_Bch) || # Stalling at fetch while ret passes through pipeline IRET in { D_icode, E_icode, M_icode } # but not condition for a load/use hazard && !(E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }); Condition F D E M W Processing ret stall bubble normal Load/Use Hazard Combination Load/use hazard should get priority ret instruction should be held in decode stage for additional cycle

Load/use hazard with ret mrmovl F D ret F mrmovl F D E ret F D addl F mrmovl F D E M bubble E ret F D D addl F F mrmovl F D E M W bubble E M ret F D D E addl F F bubble D addl F

Pipeline Summary Data Hazards Control Hazards Control Combinations Most handled by forwarding No performance penalty Load/use hazard requires one cycle stall Control Hazards Cancel instructions when detect mispredicted branch Two clock cycles wasted Stall fetch stage while ret passes through pipeline Three clock cycles wasted Control Combinations Must analyze carefully First version had subtle bug Only arises with unusual instruction combination

Performance Analysis with Pipelining Ideal pipelined machine: CPI = 1 One instruction completed per cycle But much faster cycle time than unpipelined machine However - hazards are working against the ideal Hazards resolved using forwarding are fine Stalling degrades performance and instruction comletion rate is interrupted CPI is measure of “architectural efficiency” of design

CPI = C/I = (I+B)/I = 1.0 + B/I CPI for PIPE CPI  1.0 Fetch instruction each clock cycle Effectively process new instruction almost every cycle Although each individual instruction has latency of 5 cycles CPI > 1.0 Sometimes must stall or cancel branches Computing CPI C clock cycles I instructions executed to completion B bubbles injected (C = I + B) CPI = C/I = (I+B)/I = 1.0 + B/I Factor B/I represents average penalty due to bubbles

Computing CPI CPI Can reformulate to account for Function of useful instruction and bubbles Cb/Ci represents the pipeline penalty due to stalls Can reformulate to account for load penalties (lp) branch misprediction penalties (mp) return penalties (rp)

Computing CPI - II So how do we determine the penalties? Depends on how often each situation occurs on average How often does a load occur and how often does that load cause a stall? How often does a branch occur and how often is it mispredicted How often does a return occur? We can measure these simulator hardware performance counters We can estimate through historical averages Then use to make early design tradeoffs for architecture

Instruction Frequency Computing CPI - III Cause Name Instruction Frequency Condition Frequency Stalls Product Load/Use lp 0.30 0.3 1 0.09 Mispredict mp 0.20 0.4 2 0.16 Return rp 0.02 1.0 3 0.06 Total penalty 0.31 CPI = 1 + 0.31 = 1.31 == 31% worse than ideal This gets worse when: Account for non-ideal memory access latency Deeper pipelines (where stalls per hazard increase)

CPI for PIPE (Cont.) B/I = LP + MP + RP LP: Penalty due to load/use hazard stalling Fraction of instructions that are loads 0.25 Fraction of load instructions requiring stall 0.20 Number of bubbles injected each time 1  LP = 0.25 * 0.20 * 1 = 0.05 MP: Penalty due to mispredicted branches Fraction of instructions that are cond. jumps 0.20 Fraction of cond. jumps mispredicted 0.40 Number of bubbles injected each time 2  MP = 0.20 * 0.40 * 2 = 0.16 RP: Penalty due to ret instructions Fraction of instructions that are returns 0.02 Number of bubbles injected each time 3  RP = 0.02 * 3 = 0.06 Net effect of penalties 0.05 + 0.16 + 0.06 = 0.27  CPI = 1.27 (Not bad!) Typical Values

Summary Today Next Time Pipeline control logic Effect on CPI and performance Next Time Further mitigation of branch mispredictions State machine design