BRANCH PREDICTION FOR THE OR1200 PIPELINE Alec Roelke.


OUTLINE
- OR1200 pipeline overview
- Motivation for branch prediction
- How to handle branches in pipelines
  - Stall
  - Add delay slots
  - Predict outcomes
- Implementation of branch prediction
- Potential improvement
- Synopsys synthesis results
  - Design Compiler
  - IC Compiler
- Conclusions and future work

OR1200 PIPELINE OVERVIEW
- Five stages
- In-order
- Single-issue
- ALU for Boolean logic, comparison, bit manipulation
- MAC for integer arithmetic
  - Multiply/divide
  - Add/subtract
- Optional support for floating point arithmetic

MOTIVATION FOR BRANCH PREDICTION
- Some programs have branch statements
  - Function call, if, for, while, etc.
- Sometimes branches are conditional
  - Typically, ALU is needed for calculating condition
- No problem in a single-cycle machine
- What to do for a pipelined machine?
[Figure: flowchart of a for loop. i = 0, then test i < N; TRUE leads to the loop code and i++, FALSE leads to the post-loop code.]

STALLING
- Wait until EX for branch resolution
- Simplest solution
- Increases CPI
[Figure: pipeline diagram over cycles 1-5 with stages IF, ID, EX, MEM, WB; a NOP follows BNE through the pipeline, followed by the target instructions (T).]

DELAY SLOT
- Instruction(s) after conditional branch
- Always executed regardless of branch outcome
- Smallest CPI
- Confusing to program for
- OR1200 has one delay slot
[Figure: pipeline diagram over cycles 1-5 with stages IF, ID, EX, MEM, WB; the delay-slot instruction (DSLOT) follows BNE through the pipeline, followed by the target instructions (T).]

BRANCH PREDICTION
- When a branch is fetched, predict its outcome
- If prediction is wrong, flush instructions
- Worst-case CPI = CPI with stalling
- Best-case CPI = CPI with delay slots
- Many prediction schemes
- A good predictor will have close to minimal CPI
[Figure: pipeline diagram over cycles 1-5 with stages IF, ID, EX, MEM, WB, showing BNE, a NOP, and the target instructions (T).]
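
The CPI bounds on this slide can be illustrated with a simple first-order model. This is only a sketch under assumptions that are not in the slides: a base CPI of 1, a hypothetical branch fraction and misprediction penalty, and a function name (`effective_cpi`) chosen for illustration.

```python
def effective_cpi(branch_fraction, accuracy, penalty_cycles, base_cpi=1.0):
    """First-order CPI estimate for a pipeline with branch prediction.

    Each mispredicted branch is assumed to cost `penalty_cycles` flushed
    cycles. All values are illustrative, not OR1200 measurements.
    """
    if not 0.0 <= branch_fraction <= 1.0:
        raise ValueError("branch_fraction must be in [0, 1]")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be in [0, 1]")
    mispredict_rate = 1.0 - accuracy
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, 2-cycle misprediction penalty.
    for acc in (0.0, 0.5, 0.9, 1.0):
        print(f"accuracy={acc:.1f}  CPI={effective_cpi(0.2, acc, 2):.2f}")
```

With accuracy 0 every branch pays the full penalty, which is the stalling case; with accuracy 1 no cycles are lost and the model returns the base CPI.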

STATIC VS. DYNAMIC
- Static branch prediction
  - Always predict the same value
  - OR1200 always predicts not-taken
    - With one delay slot
    - When branch is taken, one instruction is flushed
- Dynamic branch prediction
  - Remember past branch outcomes
  - Base current prediction on history
[Figure: state diagram of the branch prediction with two states, Not Taken and Taken, and transitions labeled "Branch was Taken" and "Branch wasn’t Taken".]
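
A two-state dynamic predictor like the one in the diagram might be sketched as follows. This assumes the diagram describes a one-bit, last-outcome scheme with a single shared state; a real design would more likely keep one entry per branch address. The class name, helper, and trace are hypothetical.

```python
class OneBitPredictor:
    """Two-state predictor: predict whatever the branch did last time."""

    def __init__(self, initial_taken=False):
        # Assumption: start in the Not Taken state.
        self.predict_taken = initial_taken

    def predict(self):
        return self.predict_taken

    def update(self, was_taken):
        # The state simply follows the most recent outcome.
        self.predict_taken = was_taken


def count_mispredictions(outcomes, predictor):
    """Run the predictor over a trace of outcomes and count wrong guesses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    # Hypothetical loop branch: taken four times, then falls through once.
    trace = [True, True, True, True, False]
    print(count_mispredictions(trace, OneBitPredictor()))
```

On a loop branch, a scheme like this mispredicts on the first iteration (when it starts in Not Taken) and again at loop exit.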

BRANCH PREDICTION IMPLEMENTATION
- Static branch predictor
  - Because of delay slot, not used until branch is already in decode
  - Compare target address to instruction address
    - If target is smaller (backward branch), predict taken
    - If target is larger (forward branch), predict not taken
- Minimal changes to existing modules required
- Delay slot is preserved if prediction is incorrect to maintain backwards compatibility
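
The address comparison described above can be sketched in software. This is an illustration of the backward-taken, forward-not-taken rule, not the actual hardware logic; the function name and the example addresses are hypothetical.

```python
def predict_taken(branch_pc, target_pc):
    """Backward-taken, forward-not-taken static prediction.

    Returns True (predict taken) when the target lies before the branch,
    as is typical of a loop back-edge. A branch whose target equals its
    own address is not covered by the slides; here it is treated as
    not taken, which is an assumption.
    """
    return target_pc < branch_pc


if __name__ == "__main__":
    # Hypothetical addresses for illustration only.
    print(predict_taken(0x1000, 0x0FF0))  # backward branch
    print(predict_taken(0x1000, 0x1010))  # forward branch
```

The rule needs no stored history, which is consistent with the slide's point that only minimal changes to existing modules are required.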

THEORETICAL PERFORMANCE

DESIGN COMPILER
Metric
Clock Frequency (GHz)    0.1
Timing Slack (ns)

IC COMPILER

IC COMPILER LAYOUT

CONCLUSIONS AND FUTURE WORK
- Motivated the addition of branch prediction to OR1200
- Implemented new static branch prediction scheme
- Compiled design in Synopsys Design Compiler
- Created layout in Synopsys IC Compiler
- Finish implementing dynamic branch predictor
  - Size will increase greatly due to required memory elements
- Work out final errors in layout