Computer Architecture: Processing of control transfer instructions, part I. Ola Flygt, Växjö University



Outline
- 8.1 Introduction
- 8.2 Basic approaches to branch handling
- 8.3 Delayed branching
- 8.4 Branch processing
  - Branch detection
  - Branch prediction schemes
  - Prediction accuracy

8.1 Introduction to branches
- Branches modify, conditionally or unconditionally, the value of the PC
  - to transfer control
  - to alter the sequence of instructions

Major types of branches

Branch: To transfer control

Branch examples

8.1.2 How to check the results of operations for specified conditions {branch} (e.g. equals 0, negative, and so on) ISA = Instruction Set Architecture

Alternatives for checking the operation results

Result State vs. Direct Check, e.g.

Result state approach: disadvantages
- Generating the result state is not straightforward
  - it requires an irregular structure and occupies additional chip area
- The result state is a sequential concept
  - it cannot be applied without modification in architectures that have multiple execution units

Retaining sequential consistency for condition checking (in VLIW or superscalar processors)
- Use multiple sets of condition codes or flags
  - this relies on the programmer or compiler to use different sets of condition codes or flags for the outcomes generated by different EUs
- Use the direct check approach

Branch statistics
- About 20% of the instructions in general-purpose code are branches
  - on average, every fifth instruction is a branch
- 5-10% of the instructions in scientific code are branches
- The majority of branches (80%) are conditional
- 75-80% of all branches are taken

Branch statistics: Taken or not Taken

Frequency of taken and not-taken branches

8.1.4 The branch problem: The delay caused in pipelining

More branch problems
- A conditional branch can cause an even longer penalty
  - evaluating the specified condition needs an extra cycle
  - the branch may have to wait for an unresolved condition (the result is not yet ready), e.g. waiting for the result of an FDIV can take many cycles
- Pipelines with more than 4 stages
  - each branch results in a still larger number of wasted cycles (called bubbles)

8.1.5 Performance measures of branch processing
- Pt: penalty for a taken branch; Pnt: penalty for a not-taken branch
- ft: frequency of taken branches; fnt: frequency of not-taken branches
- P: effective penalty of branch processing
  - P = ft * Pt + fnt * Pnt
  - e.g.: P = 0.75 * 8 + 0.25 * 2 = 6.5 cycles
  - e.g. i486: P = 0.75 * 2 + 0.25 * 0 = 1.5 cycles
- With branch prediction, a branch is either correctly predicted (c) or mispredicted (m)
  - P = fc * Pc + fm * Pm
  - e.g. Pentium: P = 0.9 * 0 + 0.1 * 3.5 = 0.35 cycles
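The effective-penalty formula can be checked with a few lines of Python. This is only a sketch: the function name `effective_penalty` is made up for illustration, and it assumes that the two frequencies sum to 1 (fnt = 1 - ft, fm = 1 - fc). Under that assumption, penalties of 8, 2 and 0 cycles are the values consistent with the totals of 6.5, 1.5 and 0.35 cycles quoted on the slide.

```python
def effective_penalty(f_first, p_first, p_second):
    """Weighted average of two penalties, in cycles.

    f_first  : frequency of the first case (taken, or correctly predicted)
    p_first  : penalty of the first case
    p_second : penalty of the complementary case
    Assumption: the two frequencies sum to 1.
    """
    return f_first * p_first + (1.0 - f_first) * p_second


# Taken / not-taken form: P = ft * Pt + fnt * Pnt
p_example = effective_penalty(0.75, 8, 2)    # first example on the slide
p_i486 = effective_penalty(0.75, 2, 0)       # i486 example

# Predicted / mispredicted form: P = fc * Pc + fm * Pm
p_pentium = effective_penalty(0.9, 0, 3.5)   # Pentium example

for name, value in [("example", p_example), ("i486", p_i486), ("Pentium", p_pentium)]:
    print(f"{name}: {value:.2f} cycles")
```

The same function covers both forms of the formula because each one is a weighted average over two complementary cases.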

Interpretation of the concept of branch penalty

Zero-cycle branching {in no time}

8.2 Basic approaches to branch handling

Review of the basic approaches to branch handling

Speculative vs. Multiway branching

8.3 Delayed branching: occurrence of an unused instruction slot (unconditional branch)

Basic scheme of delayed branching

Delayed branching: performance gain
- ff: ratio of the delay slots that can be filled with useful instructions
  - 60-70% of the delay slots can be filled with useful instructions
  - fill only with an instruction that can be moved into the delay slot without violating a data dependency
  - fill only with an instruction that can be executed in a single pipeline cycle
- fb: frequency of branches
  - 20-30% for general-purpose programs
  - 5-10% for scientific programs
- 100 instructions have 100 * fb delay slots, of which 100 * fb * ff can be utilized
- Performance gain = (100 * fb * ff) / 100 = fb * ff
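As a rough illustration, the gain formula can be evaluated over the ranges given on the slide. The function name `delay_slot_gain` is hypothetical. The sketch simply multiplies the branch frequency by the fill ratio, which gives roughly 12% to 21% for general-purpose programs and roughly 3% to 7% for scientific programs.

```python
def delay_slot_gain(f_b, f_f):
    """Fraction of instruction slots recovered by filling delay slots.

    f_b : frequency of branches (one delay slot per branch is assumed)
    f_f : ratio of delay slots that can be filled with useful instructions
    """
    return f_b * f_f


# Ranges quoted on the slide
general_low = delay_slot_gain(0.20, 0.60)
general_high = delay_slot_gain(0.30, 0.70)
scientific_low = delay_slot_gain(0.05, 0.60)
scientific_high = delay_slot_gain(0.10, 0.70)

print(f"general-purpose: {general_low:.0%} to {general_high:.0%}")
print(f"scientific:      {scientific_low:.0%} to {scientific_high:.0%}")
```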

Delayed branching for conditional branches {can be annulled or not}

Where to find the instruction to fill delay slot

Possible annulment options provided by architectures (use special instructions) with delayed branching {Scalar only}

8.4 Branch Processing: design space

Branch detection schemes {early detection, better handling}

Branch detection in parallel with decoding/issuing of other instructions (in I-Buffer)

Early detection of branches by Looking ahead

Early detection of branches by inspection of instruction that inputs to I-buffer

Early branch detection {for scalar processors}: integrated instruction fetch and branch detection
- Detect the branch instruction during fetching
- Guess taken or not taken
- Fetch the next sequential instruction or the target instruction

Handling of unresolved conditional branches

Blocking branch processing
- The pipeline is simply stalled (stopped and kept waiting) until the specified condition can be resolved

-Basic kinds of branch predictions

-The fixed prediction approach

Always not taken vs. Always Taken

Always not taken: Penalty figures

Penalty figures for the always taken prediction approach

-Static branch prediction

Static prediction: opcode based e.g. implemented in the MC88110

Dynamic branch prediction: a branch that was taken in its last n occurrences is likely to be taken at its next occurrence

Dynamic branch prediction:

1-bit dynamic prediction: state transition diagram

2-bit dynamic prediction: state transition diagram
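The state transition diagrams themselves are not part of this transcript. As a stand-in, the sketch below models the common textbook form of 1-bit and 2-bit prediction as a saturating counter; the slides' exact diagrams may differ. The function `count_mispredictions`, the initial state and the loop-branch test pattern are all assumptions chosen for illustration.

```python
def count_mispredictions(outcomes, bits):
    """Simulate an n-bit saturating-counter predictor for a single branch.

    outcomes : iterable of booleans, True meaning the branch was taken
    bits     : number of history bits (1 or 2 in the schemes named above)
    Assumption: the counter starts in the strongest "taken" state.
    """
    max_state = (1 << bits) - 1
    threshold = (max_state + 1) // 2    # states >= threshold predict "taken"
    state = max_state
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= threshold
        if predicted_taken != taken:
            misses += 1
        # Move one step toward the actual outcome, saturating at both ends.
        if taken:
            state = min(state + 1, max_state)
        else:
            state = max(state - 1, 0)
    return misses


# Hypothetical loop-closing branch: taken 9 times, then not taken at loop
# exit, with the whole loop executed 10 times.
pattern = ([True] * 9 + [False]) * 10

one_bit_misses = count_mispredictions(pattern, bits=1)
two_bit_misses = count_mispredictions(pattern, bits=2)
print(one_bit_misses, two_bit_misses)
```

On this pattern the 1-bit scheme mispredicts at every loop exit and again at the next loop entry, while the 2-bit scheme mispredicts only at the exits. This is the usual argument for keeping a second history bit.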

3-bit prediction

Implicit dynamic technique
- The schemes for accessing the branch target path are also used for branch prediction
  - Branch Target Access Cache (BTAC): holds the most recently used branch addresses
  - Branch Target Instruction Cache (BTIC): holds the most recently used target instructions
- The BTAC or BTIC holds entries only for taken branches
- The existence of an entry means that
  - the corresponding branch was taken at its last occurrence
  - so its next occurrence is also guessed as taken
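The implicit scheme can be sketched as a small cache keyed by branch address. This is a toy model, not a description of any real processor. The class name `BTAC`, its capacity and the least-recently-used replacement policy are assumptions made for illustration. The only behaviour taken from the slide is that entries exist solely for taken branches and that the presence of an entry is read as a "taken" prediction.

```python
from collections import OrderedDict


class BTAC:
    """Toy branch target access cache: branch address -> target address."""

    def __init__(self, capacity=4):
        self.capacity = capacity        # hypothetical size
        self.entries = OrderedDict()    # oldest entry first

    def predict(self, branch_addr):
        """Return the cached target (predict taken) or None (predict not taken)."""
        return self.entries.get(branch_addr)

    def update(self, branch_addr, taken, target_addr):
        """Record the resolved outcome of a branch."""
        if taken:
            self.entries[branch_addr] = target_addr
            self.entries.move_to_end(branch_addr)    # mark as most recently used
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)     # evict the least recently used
        else:
            # Only taken branches keep an entry.
            self.entries.pop(branch_addr, None)


btac = BTAC(capacity=2)
first_guess = btac.predict(0x100)     # no entry yet: guess not taken
btac.update(0x100, taken=True, target_addr=0x200)
second_guess = btac.predict(0x100)    # entry exists: guess taken, target 0x200
btac.update(0x100, taken=False, target_addr=0x200)
third_guess = btac.predict(0x100)     # entry removed: guess not taken again
```

Because the prediction is simply the presence of an entry, this behaves like a 1-bit history restricted to the branches that currently fit in the cache.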

Implementation alternatives of history bits

Example of the implementation of the BHT

Combining implicit and 2-bit prediction

Combining implicit and 2-bit prediction (continued)

The effect of branch accuracy on branch penalty

Simulation results of prediction accuracy on the SPEC