Systems I Pipelining III

Topics
- Hazard mitigation through pipeline forwarding
- Hardware support for forwarding
- Forwarding to mitigate control (branch) hazards

How do we fix the pipeline?

Pad the program with NOPs
- Yuck!

Stall the pipeline
- Data hazards
  - Wait for producing instruction to complete
  - Then proceed with consuming instruction
- Control hazards
  - Wait until new PC has been determined
  - Then begin fetching
- How is this better than putting NOPs into the program?

Forward data within the pipeline
- Grab the result from somewhere in the pipe
  - After it has been computed
  - But before it has been written back
- This gives an opportunity to avoid performance degradation due to hazards!

Data Forwarding

Naïve pipeline
- Register isn't written until completion of write-back stage
- Source operands read from register file in decode stage
  - Needs to be in register file at start of stage

Observation
- Value generated in execute or memory stage

Trick
- Pass value directly from generating instruction to decode stage
- Needs to be available at end of decode stage

Data Forwarding Example

    # demo-h2.ys
    0x000: irmovl $10,%edx
    0x006: irmovl $3,%eax
    0x00c: nop
    0x00d: nop
    0x00e: addl %edx,%eax
    0x010: halt

[Pipeline diagram: cycles 1-10, each instruction passing through F D E M W]

Cycle 6
- irmovl in write-back stage: W_dstE = %eax, W_valE = 3
- addl in decode stage: srcA = %edx, srcB = %eax
  - valA ← R[%edx] = 10
  - valB ← W_valE = 3
- Destination value in W pipeline register
- Forward as valB for decode stage
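The cycle-6 situation can be checked with a small timing sketch. It assumes an ideal five-stage pipeline (one instruction fetched per cycle, no stalls) and that demo-h2.ys is the six-instruction sequence listed in the code; the function name `stage_in_cycle` is made up for this illustration.

```python
STAGES = "FDEMW"  # fetch, decode, execute, memory, write-back

def stage_in_cycle(instr_index, cycle):
    """Stage occupied by the instruction at 0-based program position
    instr_index during the 1-based cycle, or None if it is not in the
    pipeline. Assumes one fetch per cycle and no stalls (illustrative)."""
    pos = cycle - 1 - instr_index
    return STAGES[pos] if 0 <= pos < len(STAGES) else None

# Assumed listing of demo-h2.ys
program = [
    "irmovl $10,%edx",
    "irmovl $3,%eax",
    "nop",
    "nop",
    "addl %edx,%eax",
    "halt",
]

# Print one row per instruction, one column per cycle (1..10)
for i, text in enumerate(program):
    row = " ".join(stage_in_cycle(i, c) or "." for c in range(1, 11))
    print(f"{text:<18}{row}")
```

In cycle 6 the addl is in decode while irmovl $3,%eax is in write-back, so %eax has not yet reached the register file and must be forwarded. The first irmovl has already left the pipeline, so %edx can be read normally.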

Bypass Paths

Decode stage
- Forwarding logic selects valA and valB
- Normally from register file
- Forwarding: get valA or valB from later pipeline stage

Forwarding sources
- Execute: valE
- Memory: valE, valM
- Write back: valE, valM

Data Forwarding Example #2

Register %edx
- Generated by ALU during previous cycle
- Forward from memory as valA

Register %eax
- Value just generated by ALU
- Forward from execute as valB

Forwarding Priority

    # demo-priority.ys
    0x000: irmovl $1, %eax
    0x006: irmovl $2, %eax
    0x00c: irmovl $3, %eax
    0x012: rrmovl %eax, %edx
    0x014: halt

[Pipeline diagram: cycles 1-10, each instruction passing through F D E M W]

Cycle 5
- rrmovl in decode stage needs %eax as valA
- Three pending writes to %eax: value 1 in W, value 2 in M, value 3 in E

Multiple forwarding choices
- Which one should have priority?
- Match serial semantics
- Use matching value from earliest pipeline stage

Implementing Forwarding
- Add additional feedback paths from E, M, and W pipeline registers into decode stage
- Create logic blocks to select from multiple sources for valA and valB in decode stage

Implementing Forwarding

    ## What should be the A value?
    int new_E_valA = [
        # Use incremented PC
        D_icode in { ICALL, IJXX } : D_valP;
        # Forward valE from execute
        d_srcA == E_dstE : e_valE;
        # Forward valM from memory
        d_srcA == M_dstM : m_valM;
        # Forward valE from memory
        d_srcA == M_dstE : M_valE;
        # Forward valM from write back
        d_srcA == W_dstM : W_valM;
        # Forward valE from write back
        d_srcA == W_dstE : W_valE;
        # Use value read from register file
        1 : d_rvalA;
    ];
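The HCL case expression is a priority list: the first matching source wins. A minimal Python sketch of the same selection follows. The function name `select_val_a`, the use of `None` for "no register", and the omission of the ICALL/IJXX case are assumptions made to keep the illustration short; this is not the simulator's code.

```python
def select_val_a(d_src_a, d_rval_a, sources):
    """Pick valA for the decode stage.

    sources lists (destination_register, value) pairs in priority order:
    (E_dstE, e_valE), (M_dstM, m_valM), (M_dstE, M_valE),
    (W_dstM, W_valM), (W_dstE, W_valE).
    None stands for "no register" (an assumption of this sketch).
    """
    if d_src_a is not None:
        for dst, value in sources:
            if dst == d_src_a:
                return value      # first match: earliest pipeline stage
    return d_rval_a               # no match: use the register file value

# Cycle 5 of demo-priority.ys: rrmovl %eax,%edx is in decode while
# irmovl $3, $2, $1 are in execute, memory, and write-back.
sources = [
    ("%eax", 3),    # E_dstE, e_valE
    (None, None),   # M_dstM, m_valM (no load in memory stage)
    ("%eax", 2),    # M_dstE, M_valE
    (None, None),   # W_dstM, W_valM
    ("%eax", 1),    # W_dstE, W_valE
]
print(select_val_a("%eax", 0, sources))
```

Listing the execute-stage source first is what makes the result match serial semantics: the instruction closest to the reader in program order is the one furthest back in the pipeline.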

Limitation of Forwarding

Load-use dependency
- Value needed by end of decode stage in cycle 7
- Value read from memory in memory stage of cycle 8

Avoiding Load/Use Hazard
- Stall using instruction for one cycle
- Can then pick up loaded value by forwarding from memory stage

Detecting Load/Use Hazard

    Condition          Trigger
    Load/Use Hazard    E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }
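The trigger translates directly into a predicate. The sketch below is illustrative only: the function name `load_use_hazard` and the use of strings for instruction codes and registers are assumptions, and the example operands assume a load into %eax followed immediately by an add that reads %eax.

```python
def load_use_hazard(e_icode, e_dst_m, d_src_a, d_src_b):
    """True when the instruction in execute loads from memory into a
    register that the instruction in decode wants to read."""
    return e_icode in ("IMRMOVL", "IPOPL") and e_dst_m in (d_src_a, d_src_b)

# Load in execute writes %eax; the add in decode reads %ebx and %eax.
print(load_use_hazard("IMRMOVL", "%eax", "%ebx", "%eax"))

# An irmovl in execute produces its value in the ALU, so plain
# forwarding is enough and no hazard is flagged.
print(load_use_hazard("IIRMOVL", None, "%ebx", "%eax"))
```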

Control for Load/Use Hazard

    # demo-luh.ys
    0x000: irmovl $128,%edx
    0x006: irmovl $3,%ecx
    0x00c: rmmovl %ecx, 0(%edx)
    0x012: irmovl $10,%ebx
    0x018: mrmovl 0(%edx),%eax  # Load %eax
    0x01e: addl %ebx,%eax       # Use %eax
    0x020: halt

[Pipeline diagram: cycles 1-12, with a bubble inserted between mrmovl and addl]

- Stall instructions in fetch and decode stages
- Inject bubble into execute stage

    Condition          F       D       E       M       W
    Load/Use Hazard    stall   stall   bubble  normal  normal

Branch Misprediction Example

    # demo-j.ys
    0x000:    xorl %eax,%eax
    0x002:    jne t              # Not taken
    0x007:    irmovl $1, %eax    # Fall through
    0x00d:    nop
    0x00e:    nop
    0x00f:    nop
    0x010:    halt
    0x011: t: irmovl $2, %edx    # Target (Should not execute)
    0x017:    irmovl $3, %ecx    # Should not execute
    0x01d:    irmovl $4, %edx    # Should not execute

Should only execute first 7 instructions

Handling Misprediction

Predict branch as taken
- Fetch 2 instructions at target

Cancel when mispredicted
- Detect branch not-taken in execute stage
- On following cycle, replace instructions in execute and decode by bubbles
- No side effects have occurred yet

Detecting Mispredicted Branch

    Condition              Trigger
    Mispredicted Branch    E_icode = IJXX & !e_Cnd

Control for Misprediction

    Condition              F       D       E       M       W
    Mispredicted Branch    normal  bubble  bubble  normal  normal
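For the demo-j.ys case, the detection can be traced by hand: xorl %eax,%eax always produces zero, so ZF is set and jne is not taken, yet the pipeline has already fetched two instructions at the target. The Python sketch below follows that reasoning. The helper names `jne_taken` and `mispredicted_branch` are hypothetical, and the squashed addresses are the first two target instructions in the demo-j.ys listing.

```python
def jne_taken(zf):
    """jne jumps when the zero flag is clear."""
    return not zf

def mispredicted_branch(e_icode, e_cnd):
    """Branches are predicted taken, so a jump whose condition turns out
    false in the execute stage was mispredicted."""
    return e_icode == "IJXX" and not e_cnd

result = 0 ^ 0            # xorl %eax,%eax: a value XORed with itself is 0
zf = (result == 0)        # zero flag is set
e_cnd = jne_taken(zf)     # condition evaluated in execute

squash = mispredicted_branch("IJXX", e_cnd)
# The two instructions fetched at the target while jne was in decode
# and execute; they are replaced by bubbles on the following cycle.
wrongly_fetched = [0x011, 0x017] if squash else []
print(squash, [hex(a) for a in wrongly_fetched])
```

Because neither cancelled instruction has passed the execute stage, no register, memory location, or condition code has been changed, which is why bubbles are enough to undo the misprediction.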

Return Example

    # demo-retb.ys
    0x000:    irmovl Stack,%esp  # Initialize stack pointer
    0x006:    call p             # Procedure call
    0x00b:    irmovl $5,%esi     # Return point
    0x011:    halt
    0x020: .pos 0x20
    0x020: p: irmovl $-1,%edi    # procedure
    0x026:    ret
    0x027:    irmovl $1,%eax     # Should not be executed
    0x02d:    irmovl $2,%ecx     # Should not be executed
    0x033:    irmovl $3,%edx     # Should not be executed
    0x039:    irmovl $4,%ebx     # Should not be executed
    0x100: .pos 0x100
    0x100: Stack:                # Stack: Stack pointer

Previously executed three additional instructions

Correct Return Example

    # demo-retb
    0x026: ret                       F D E M W
           bubble                      F D E M W
           bubble                        F D E M W
           bubble                          F D E M W
    0x00b: irmovl $5,%esi # Return           F D E M W

- As ret passes through pipeline, stall at fetch stage
  - While in decode, execute, and memory stage
- Inject bubble into decode stage
- Release stall when ret reaches write-back stage
  - W: valM = 0x0b
  - F: valC ← 5, rB ← %esi

Detecting Return

    Condition         Trigger
    Processing ret    IRET in { D_icode, E_icode, M_icode }
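The trigger also explains why exactly three bubbles follow a ret: the condition holds once for each of the decode, execute, and memory stages the ret passes through. The sketch below counts those cycles. It is illustrative only; `processing_ret` is a made-up name, and every other stage is assumed to hold a bubble (INOP).

```python
def processing_ret(d_icode, e_icode, m_icode):
    """True while a ret sits in decode, execute, or memory."""
    return "IRET" in (d_icode, e_icode, m_icode)

bubbles = 0
# The ret advances one stage per cycle: D, then E, then M, then W.
for ret_stage in ["D", "E", "M", "W"]:
    # Assumption: all other stages hold bubbles (INOP) in this sketch.
    icode = {s: ("IRET" if s == ret_stage else "INOP") for s in "DEM"}
    if processing_ret(icode["D"], icode["E"], icode["M"]):
        bubbles += 1   # fetch is stalled and a bubble enters decode

print(bubbles)
```

Once the ret reaches write-back the condition is false, the return address is available, and fetching resumes at the return point.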

Control for Return

    Condition         F       D       E       M       W
    Processing ret    stall   bubble  normal  normal  normal

Special Control Cases

Detection

    Condition              Trigger
    Processing ret         IRET in { D_icode, E_icode, M_icode }
    Load/Use Hazard        E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }
    Mispredicted Branch    E_icode = IJXX & !e_Cnd

Action (on next cycle)

    Condition              F       D       E       M       W
    Processing ret         stall   bubble  normal  normal  normal
    Load/Use Hazard        stall   stall   bubble  normal  normal
    Mispredicted Branch    normal  bubble  bubble  normal  normal
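The action table can be held as a simple lookup, as in the Python sketch below. This is an illustration rather than the actual pipeline control logic: the names `ACTIONS` and `pipeline_actions` are invented, the per-stage actions are taken from the stall and bubble descriptions on the earlier slides, and cases where several conditions arise in the same cycle are deliberately not handled.

```python
STAGES = ("F", "D", "E", "M", "W")

# Per-stage action for each special control case
ACTIONS = {
    "Processing ret":      ("stall",  "bubble", "normal", "normal", "normal"),
    "Load/Use Hazard":     ("stall",  "stall",  "bubble", "normal", "normal"),
    "Mispredicted Branch": ("normal", "bubble", "bubble", "normal", "normal"),
}

def pipeline_actions(condition=None):
    """Map each pipeline register to its action for one detected
    condition. With no condition, every stage operates normally."""
    row = ACTIONS.get(condition, ("normal",) * len(STAGES))
    return dict(zip(STAGES, row))

for name in ACTIONS:
    print(f"{name:<20}", pipeline_actions(name))
```

Note that the memory and write-back registers are always left alone: each condition is caught before the offending instruction can change memory or the register file.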

Summary

Today
- Hazard mitigation through pipeline forwarding
- Hardware support for forwarding
- Forwarding to mitigate control (branch) hazards

Next time
- Implementing pipeline control
- Pipelining and performance analysis