Branches Daniel Ángel Jiménez Departments of Computer Science UT San Antonio & Rutgers.

2 About Me
- Born in Fort Hood, Texas in 1969 (~80 miles north on IH-35)
- Dad from Mexico, Mom from Texas
- Lived in Temple, Texas
- Moved to San Antonio, Texas in 1973 (~80 miles south on IH-35)
- B.S. at UTSA, 1992
- M.S. at UTSA, 1994
- Moved to San Marcos, Texas in 1995 (~30 miles south on IH-35)
- Started Ph.D. program at UT Austin
- Moved back to San Antonio in 1996
- Non-tenure-track faculty, UTHSCSA
- Moved to Austin in 1999
- Ph.D. UT Austin, 2002
- Moved to New Jersey in 2002, New York 2003
- Asst. Professor, Rutgers
- Sabbatical in Barcelona, Spain in 2005
- Back to San Antonio in 2007
- Associate Professor, UTSA
- Mostly for the breakfast tacos

3 More about me
- Always liked computer programming
- First computer was a Tandy Color Computer in 1984
- Fortunate sequence of mentors guided me into my career
  - Mom: education is important (didn't believe her at the time)
  - Neal Wagner: theory is exciting
  - Hugh Maynard: math is my friend
  - Betty Travis: Research Careers for Minority Scholars
  - Calvin Lin: perfect fit Ph.D. advisor
  - Uli Kremer: welcomed me into being a professor
- Like taekwondo, piano, traveling, Spanish music
  - Current favorite band: Ojos de Brujo

4 This Talk
- How an instruction is processed: pipelining
- Kinds of branches
- Branch prediction
  - Accuracy
  - Technique
- Empirical properties of branches
- How to handle branches
- Conclusion

5 How an Instruction is Processed
Processing can be divided into five stages:
- Instruction fetch
- Instruction decode
- Execute
- Memory access
- Write back

6 Instruction-Level Parallelism
To speed up the process, pipelining overlaps execution of multiple instructions, exploiting parallelism between instructions.
[Figure: five-stage pipeline (instruction fetch, instruction decode, execute, memory access, write back)]

7 Control Hazards: Branches
Conditional branches create a problem for pipelining: the next instruction can't be fetched until the branch has executed, several stages later.
[Figure label: branch instruction]

8 Pipelining with Branches
Branches cause bubbles in the pipeline, where some stages are left idle.
[Figure: five-stage pipeline (instruction fetch, instruction decode, execute, memory access, write back); label: unresolved branch instruction]

9 Branch Prediction
A branch predictor allows the processor to speculatively fetch and execute instructions down the predicted path.
Branch predictors must be highly accurate to avoid mispredictions!
[Figure: five-stage pipeline (instruction fetch, instruction decode, execute, memory access, write back); label: speculative execution]

10 Kinds of Branches
- Conditional
  - Very common, 1/4 to 1/10 of instructions
  - Must be predicted, can be hard to predict
  - Loop back edges with short fixed trip counts can be predicted perfectly
- Unconditional
  - Targets still have to be predicted with the branch target buffer (BTB)
- Indirect
  - E.g. jumping through a table of addresses
  - Can be predicted, often just use the BTB as predictor
- Returns
  - Predicted with the return address stack (RAS)
  - >99% accuracy possible if you avoid deep recursion
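
The point about returns and deep recursion can be illustrated with a small model of a return address stack. This is only a sketch under stated assumptions: the class name, the 16-entry default depth, and the policy of overwriting the oldest entry when the stack is full are all hypothetical choices, not taken from the slides, and real hardware designs differ.

```python
class ReturnAddressStack:
    """Toy fixed-depth return address stack (hypothetical model, not a real design)."""

    def __init__(self, depth=16):
        self.depth = depth
        self.entries = [None] * depth
        self.top = 0  # calls minus returns; the slot used is top modulo depth

    def push(self, return_address):
        # On a call, record where the matching return should go.
        # When the stack is full this silently overwrites the oldest entry.
        self.entries[self.top % self.depth] = return_address
        self.top += 1

    def pop(self):
        # On a return, the predicted target is the most recent entry.
        if self.top == 0:
            return None
        self.top -= 1
        return self.entries[self.top % self.depth]


def correct_returns(nesting, depth=16):
    """Make `nesting` nested calls, then return from all of them; count correct predictions."""
    ras = ReturnAddressStack(depth)
    addresses = [0x1000 + 4 * i for i in range(nesting)]  # made-up return addresses
    for address in addresses:
        ras.push(address)
    hits = 0
    for address in reversed(addresses):
        if ras.pop() == address:
            hits += 1
    return hits
```

In this model, call chains no deeper than the stack are predicted perfectly, while a chain of 40 nested calls on a 16-entry stack gets only the 16 innermost returns right, which is the effect the last bullet warns about.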

11 Branch Predictor Accuracy is Critical
- The cost of a misprediction is proportional to pipeline depth
- Predictor accuracy is more important for deeper pipelines
- Need a good branch predictor to feed the core with right-path instructions
- Deeper pipelines allow higher clock rates by decreasing the delay of each pipeline stage
- Decreasing the misprediction rate from 9% to 4% results in a 31% speedup for a 32-stage pipeline
- Today's pipelines have been scaled back, but only temporarily…
[Figure: simulations with SimpleScalar/Alpha]
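
Why accuracy matters more as pipelines deepen can be seen with a first-order cycles-per-instruction model. This is a rough sketch, not the simulation behind the slide: the function names, the base CPI of 1.0, the 20% branch fraction, and the assumption that the misprediction penalty in cycles equals the pipeline depth are all illustrative assumptions.

```python
def cycles_per_instruction(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """First-order model: every mispredicted branch adds a fixed number of stall cycles."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


def speedup(old_rate, new_rate, penalty_cycles, base_cpi=1.0, branch_fraction=0.2):
    """Speedup from lowering the per-branch misprediction rate (assumed parameters)."""
    old = cycles_per_instruction(base_cpi, branch_fraction, old_rate, penalty_cycles)
    new = cycles_per_instruction(base_cpi, branch_fraction, new_rate, penalty_cycles)
    return old / new
```

Under these assumed parameters, `speedup(0.09, 0.04, 32)` is larger than `speedup(0.09, 0.04, 8)`, matching the trend on the slide. The model does not reproduce the 31% figure, which comes from detailed simulation.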

12 Conditional Branch Prediction
- Most predictors are based on 2-level adaptive branch prediction [Yeh & Patt ’91]
- Branch outcomes are shifted into a history register, 1 for taken, 0 for not taken
- History bits and address bits combine to index a pattern history table (PHT) of 2-bit saturating counters
- Prediction is the high bit of the counter
- Counter is incremented if the branch is taken, decremented if the branch is not taken
[Figure: GAs, a common type of predictor]
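
The mechanism in the bullets above can be sketched in a few lines. This is a minimal model in the spirit of a GAs predictor, not a description of any real processor: the class and helper names, the table sizes, the initial counter value, the use of `pc >> 2` to skip alignment bits, and the choice to concatenate address and history bits are all assumptions made for illustration.

```python
class TwoLevelPredictor:
    """Global history register plus a PHT of 2-bit saturating counters (toy model)."""

    def __init__(self, history_bits=8, address_bits=4):
        self.history_bits = history_bits
        self.address_bits = address_bits
        self.history = 0  # most recent outcome in the low bit: 1 = taken, 0 = not taken
        # Counters range over 0..3 and start at 1 (weakly not taken), an arbitrary choice.
        self.pht = [1] * (1 << (history_bits + address_bits))

    def _index(self, pc):
        # Assumed indexing: low address bits (above the alignment bits) select a
        # region of the PHT, and the history selects the counter within it.
        address = (pc >> 2) & ((1 << self.address_bits) - 1)
        return (address << self.history_bits) | self.history

    def predict(self, pc):
        # The prediction is the high bit of the 2-bit counter.
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)  # saturating increment
        else:
            self.pht[i] = max(0, self.pht[i] - 1)  # saturating decrement
        # Shift the outcome into the history register.
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def count_mispredictions(predictor, pc, outcomes):
    """Predict, compare, and train on each outcome of one branch; return the miss count."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses
```

For a single branch that repeats taken, taken, not taken, this model mispredicts a few times while the counters warm up and then predicts the pattern without error, because each history value is always followed by the same outcome.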

13 Characteristics of Branch Behavior
- Branches tend to be highly biased
  - 53% are strongly biased, taken at least 98% or at most 2% of the time
  - Remaining branches also exhibit weak biases
  - A few branches show no bias
- Branch outcomes are highly correlated with past branch history
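
The "strongly biased" definition above can be made concrete with a short measurement over a branch trace. This is a sketch only: the function name and the trace format (pairs of branch address and outcome) are hypothetical, and it counts static branches, which may or may not be how the 53% figure on the slide was computed.

```python
from collections import defaultdict


def strongly_biased_fraction(trace, threshold=0.98):
    """Fraction of static branches taken at least `threshold` or at most 1 - threshold of the time.

    `trace` is an iterable of (pc, taken) pairs; this format is an assumption.
    """
    counts = defaultdict(lambda: [0, 0])  # pc -> [times taken, times executed]
    for pc, taken in trace:
        counts[pc][0] += int(taken)
        counts[pc][1] += 1
    if not counts:
        return 0.0
    biased = 0
    for taken_count, total in counts.values():
        rate = taken_count / total
        if rate >= threshold or rate <= 1 - threshold:
            biased += 1
    return biased / len(counts)
```

For example, a trace with one always-taken branch and one branch that alternates between taken and not taken yields a fraction of 0.5.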

14 Important Facts about Branches
- A taken branch is (often) more costly than an untaken branch
  - Trace caches can mitigate this
- Mispredicted branches are very costly
  - Some mispredictions are more costly than others: how to exploit that?
- Be aware of your machine's indirect branch predictor
  - What's the best way to compile dense switch/case statements?
  - What to do about virtual dispatch?
- Some ISAs have hint bits
  - These can help a lot if set correctly
  - But only if the microarchitecture uses them

15 What to do about mispredictions?
- Capacity/Conflict
  - Too many program paths, collisions in tables
  - Solutions: use the hint bits or align branches
  - Unfortunately branch predictors are secret, so options are limited
- Branches not correlated with recent history
  - Split loops so trip counts are within history length
- Data-dependent branches with unfriendly distributions
  - Predicate if possible
- Profile
  - Performance counters + tools such as VTune or Oprofile
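
The advice about trip counts and history length can be checked with a small simulation of a single loop back-edge branch. This sketch makes several assumptions not found in the slides: the function name is hypothetical, the predictor is indexed by history alone, no other branches pollute the history, and the counters start at weakly not taken. It shows only why long trip counts hurt, not the loop-splitting transformation itself.

```python
def exit_mispredictions(trip_count, history_bits=8, executions=200, measured=100):
    """Count mispredictions of a loop back edge over the last `measured` loop executions."""
    mask = (1 << history_bits) - 1
    pht = [1] * (1 << history_bits)  # 2-bit counters, starting weakly not taken
    history = 0
    misses = 0
    for run in range(executions):
        for i in range(trip_count):
            taken = i < trip_count - 1  # back edge is taken except on the last iteration
            predicted = pht[history] >= 2
            # Only count misses once the tables have had time to warm up.
            if run >= executions - measured and predicted != taken:
                misses += 1
            if taken:
                pht[history] = min(3, pht[history] + 1)
            else:
                pht[history] = max(0, pht[history] - 1)
            history = ((history << 1) | int(taken)) & mask
    return misses
```

With 8 bits of history, a loop with a trip count of 6 is predicted perfectly after warm-up, while a trip count of 20 mispredicts the loop exit on every execution: once the history is all ones, the predictor can no longer tell how many iterations have elapsed.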

16 Conclusion
- Branches can have variable costs due primarily to prediction
- Be aware of the implementation of branches
- Profiling and ISA support for branches
- Different causes and effects of mispredictions
- Impact of mispredictions has crept up in recent years

17 The End

18 Related Compiler Work
- Profile-guided code placement to improve instruction locality
  - Program restructuring for virtual memory [Hatfield & Gerald '71]
  - Reducing conflict misses in direct-mapped I$ [McFarling '88, '89]
  - Procedure placement [Pettis & Hansen '90], [Gloy & Smith '99]
- Transformations for reducing branch costs
  - Branch alignment [Calder & Grunwald '94], [Young et al. '97]
  - Software trace cache [Ramirez et al. '99]
- Transformations for improving predictor accuracy
  - Static correlated branch prediction [Young & Smith '99]
  - Address adjustment [Chen & King '99]
  - Reverse-engineering branch predictors [Milenkovic et al. '04]
  - PHT partitioning [Jiménez '05]