Branches
Daniel Ángel Jiménez
Departments of Computer Science
UT San Antonio & Rutgers
2 About Me
- Born in Fort Hood, Texas in 1969 (~80 miles north on IH-35)
  - Dad from Mexico, Mom from Texas
  - Lived in Temple, Texas
- Moved to San Antonio, Texas in 1973 (~80 miles south on IH-35)
  - B.S. at UTSA, 1992
  - M.S. at UTSA, 1994
- Moved to San Marcos, Texas in 1995 (~30 miles south on IH-35)
  - Started Ph.D. program at UT Austin
- Moved back to San Antonio in 1996
  - Non-tenure-track faculty, UTHSCSA
- Moved to Austin in 1999
  - Ph.D. UT Austin, 2002
- Moved to New Jersey in 2002, New York 2003
  - Asst. Professor, Rutgers
- Sabbatical in Barcelona, Spain in 2005
- Back to San Antonio in 2007
  - Associate Professor, UTSA
  - Mostly for the breakfast tacos
3 More about me
- Always liked computer programming
  - First computer was Tandy Color Computer in 1984
- Fortunate sequence of mentors guided me into my career
  - Mom – Education is important (didn’t believe her at the time)
  - Neal Wagner – theory is exciting
  - Hugh Maynard – math is my friend
  - Betty Travis – Research Careers for Minority Scholars
  - Calvin Lin – perfect fit Ph.D. advisor
  - Uli Kremer – welcomed me into being a professor
- Like taekwondo, piano, traveling, Spanish music
  - Current favorite band – Ojos de Brujo
4 This Talk
- How an instruction is processed – pipelining
- Kinds of branches
- Branch prediction
  - Accuracy
  - Technique
- Empirical properties of branches
- How to handle branches
- Conclusion
5 How an Instruction is Processed
Processing can be divided into five stages:
1. Instruction fetch
2. Instruction decode
3. Execute
4. Memory access
5. Write back
6 Instruction-Level Parallelism
To speed up the process, pipelining overlaps execution of multiple instructions, exploiting parallelism between instructions.
[Figure: five-stage pipeline (instruction fetch, instruction decode, execute, memory access, write back)]
7 Control Hazards: Branches
Conditional branches create a problem for pipelining: the next instruction can't be fetched until the branch has executed, several stages later.
[Figure label: Branch instruction]
8 Pipelining with Branches
Branches cause bubbles in the pipeline, where some stages are left idle.
[Figure: five-stage pipeline (instruction fetch, instruction decode, execute, memory access, write back) with an unresolved branch instruction]
9 Branch Prediction
A branch predictor allows the processor to speculatively fetch and execute instructions down the predicted path.
Branch predictors must be highly accurate to avoid mispredictions!
[Figure: five-stage pipeline (instruction fetch, instruction decode, execute, memory access, write back) with speculative execution]
10 Kinds of Branches
- Conditional
  - Very common, 1/4 to 1/10 of instructions
  - Must be predicted, can be hard to predict
  - Loop back edges with short fixed trip counts can be predicted perfectly
- Unconditional
  - Targets still have to be predicted with the branch target buffer (BTB)
- Indirect
  - E.g. jumping through a table of addresses
  - Can be predicted, often just use BTB as predictor
- Returns
  - Predicted with a return address stack (RAS)
  - >99% accuracy possible if you avoid deep recursion
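To make the point about returns and deep recursion concrete, here is a minimal sketch of a return address stack. The class name, the default depth of 16, and the drop-oldest overflow policy are illustrative assumptions, not details from the talk; real hardware typically uses a small circular buffer.

```python
class ReturnAddressStack:
    """Toy RAS: push on call, pop on return (hypothetical model)."""

    def __init__(self, depth=16):
        self.depth = depth      # assumed capacity; real sizes vary by design
        self.entries = []

    def on_call(self, return_address):
        if len(self.entries) == self.depth:
            self.entries.pop(0)  # overflow: the oldest return address is lost
        self.entries.append(return_address)

    def predict_return(self):
        # An empty stack has no prediction to offer.
        return self.entries.pop() if self.entries else None


def count_return_mispredictions(ras, call_depth):
    """Nest call_depth calls, then unwind and count wrong return predictions."""
    actual = []
    for level in range(call_depth):
        address = 0x1000 + 4 * level  # made-up return addresses
        ras.on_call(address)
        actual.append(address)
    misses = 0
    while actual:
        if ras.predict_return() != actual.pop():
            misses += 1
    return misses
```

In this model, call chains no deeper than the stack are predicted perfectly, while each level of nesting beyond the capacity costs one mispredicted return on the way back out.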
11 Branch Predictor Accuracy is Critical
- The cost of a misprediction is proportional to pipeline depth
- Predictor accuracy is more important for deeper pipelines
- Need good branch predictor to feed core with right-path instructions
[Figure: simulations with SimpleScalar/Alpha]
- Deeper pipelines allow higher clock rates by decreasing the delay of each pipeline stage
- Decreasing misprediction rate from 9% to 4% results in 31% speedup for 32-stage pipeline
- Today’s pipelines have been scaled back, but only temporarily…
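The relationship between pipeline depth and misprediction cost can be sketched with a back-of-the-envelope CPI model. Everything here is an assumption for illustration: the penalty is taken to equal the pipeline depth in cycles, one instruction in five is a branch, and the base CPI is 1. It ignores clock rate and is far simpler than the SimpleScalar simulations behind the slide, so it is not expected to reproduce the 31% figure.

```python
def cycles_per_instruction(mispredict_rate, pipeline_depth,
                           branch_fraction=0.2, base_cpi=1.0):
    """Average CPI when each misprediction flushes the pipeline."""
    # Assumption: a flush costs roughly one cycle per pipeline stage.
    penalty = pipeline_depth
    return base_cpi + branch_fraction * mispredict_rate * penalty


def speedup(old_rate, new_rate, pipeline_depth):
    """Speedup from lowering the misprediction rate at a fixed depth."""
    return (cycles_per_instruction(old_rate, pipeline_depth)
            / cycles_per_instruction(new_rate, pipeline_depth))
```

Comparing `speedup(0.09, 0.04, 32)` with `speedup(0.09, 0.04, 8)` shows the trend the slide describes: the same accuracy improvement buys more on the deeper pipeline.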
12 Conditional Branch Prediction
- Most predictors are based on 2-level adaptive branch prediction [Yeh & Patt ’91]
- Branch outcomes are shifted into a history register, 1 for taken, 0 for not taken
- History bits and address bits combine to index a pattern history table (PHT) of 2-bit saturating counters
- Prediction is high bit of counter
- Counter is incremented if branch is taken, decremented if branch is not taken
[Figure: GAs – a common type of predictor]
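The mechanism described above can be sketched as a small simulator. This is a simplified model in the spirit of GAs, not a description of any shipping predictor: the table sizes, the initial counter value, the use of the low-order address bits, and the choice to concatenate address and history bits are all assumptions made for illustration.

```python
class GAsPredictor:
    """Two-level predictor sketch: global history plus address bits index a PHT."""

    def __init__(self, history_bits=4, address_bits=4):
        self.history_bits = history_bits
        self.address_bits = address_bits
        self.history = 0  # global history register, newest outcome in bit 0
        # One 2-bit counter per (address bits, history) pair, started at 1
        # ("weakly not taken"); the starting value is an assumption.
        self.pht = [1] * (1 << (history_bits + address_bits))

    def _index(self, pc):
        address = pc & ((1 << self.address_bits) - 1)
        return (address << self.history_bits) | self.history

    def predict(self, pc):
        # The prediction is the high bit of the 2-bit counter.
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)  # saturate at 3
        else:
            self.pht[i] = max(0, self.pht[i] - 1)  # saturate at 0
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def run_trace(predictor, trace):
    """Count mispredictions over a sequence of (pc, taken) pairs."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses
```

Feeding it a strictly alternating branch, for example `[(0x40, i % 2 == 0) for i in range(200)]`, shows why history helps: after a short warm-up the two history patterns select different counters, and the remaining outcomes are predicted correctly.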
13 Characteristics of Branch Behavior
- Branches tend to be highly biased
  - 53% are strongly biased, taken at least 98% or at most 2% of the time
  - Remaining branches also exhibit weak biases
  - A few branches show no bias
- Branch outcomes are highly correlated with past branch history
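One way to measure bias on your own workload is to classify branches from a recorded trace. The sketch below assumes a trace of `(pc, taken)` pairs and counts each static branch once; the slide does not say whether its 53% figure is per static branch or weighted by execution count, so treat that choice, and the function names, as assumptions.

```python
from collections import defaultdict


def taken_rates(trace):
    """Map each branch address to the fraction of times it was taken."""
    counts = defaultdict(lambda: [0, 0])  # pc -> [times taken, times executed]
    for pc, taken in trace:
        counts[pc][0] += int(taken)
        counts[pc][1] += 1
    return {pc: taken / total for pc, (taken, total) in counts.items()}


def strongly_biased_fraction(trace, threshold=0.98):
    """Fraction of static branches taken >= threshold or <= 1 - threshold."""
    rates = taken_rates(trace)
    if not rates:
        return 0.0
    strong = sum(1 for rate in rates.values()
                 if rate >= threshold or rate <= 1 - threshold)
    return strong / len(rates)
```

The default threshold of 0.98 mirrors the slide's definition of strongly biased (taken at least 98% or at most 2% of the time).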
14 Important Facts about Branches
- A taken branch is (often) more costly than an untaken branch
  - Trace caches can mitigate this
- Mispredicted branches are very costly
  - Some mispredictions are more costly than others – how to exploit that?
- Be aware of your machine’s indirect branch predictor
  - What’s the best way to compile dense switch/case statements?
  - What to do about virtual dispatch?
- Some ISAs have hint bits
  - These can help a lot if set correctly
  - But only if the microarchitecture uses them
15 What to do about mispredictions?
- Capacity/Conflict
  - Too many program paths, collisions in tables
  - Solutions: use the hint bits or align branches
  - Unfortunately branch predictors are secret so options are limited
- Branches not correlated with recent history
  - Split loops so trip counts are within history length
- Data dependent branches with unfriendly distributions
  - Predicate if possible
- Profile
  - Performance counters + tools such as VTune or OProfile
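The "predicate if possible" advice amounts to turning a control dependence into a data dependence. The sketch below shows only the shape of that transformation; the function names are hypothetical, and Python is used purely for illustration, since the interpreter executes many branches of its own and will not show the hardware benefit. In compiled code the second form is the kind of pattern a compiler can map to a conditional move or a predicated instruction.

```python
def count_above_branchy(values, threshold):
    """Count elements above threshold with a data-dependent branch."""
    count = 0
    for v in values:
        if v > threshold:  # outcome depends on the data, so it may be hard to predict
            count += 1
    return count


def count_above_predicated(values, threshold):
    """Same result, with the comparison used as data instead of control."""
    count = 0
    for v in values:
        count += int(v > threshold)  # adds 0 or 1; no branch on the comparison
    return count
```

Both functions return the same count for any input; only the way the comparison result is consumed differs.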
16 Conclusion
- Branches can have variable costs due primarily to prediction
- Be aware of the implementation of branches
- Profiling and ISA support for branches
- Different causes and effects of mispredictions
- Impact of mispredictions has crept up in recent years
17 The End
18 Related Compiler Work
- Profile-guided code placement to improve instruction locality
  - Program restructuring for virtual memory [Hatfield & Gerald ’71]
  - Reducing conflict misses in direct-mapped I$ [McFarling ’88, ’89]
  - Procedure placement [Pettis & Hansen ’90], [Gloy & Smith ’99]
- Transformations for reducing branch costs
  - Branch alignment [Calder & Grunwald ’94], [Young et al. ’97]
  - Software trace cache [Ramirez et al. ’99]
- Transformations for improving predictor accuracy
  - Static correlated branch prediction [Young & Smith ’99]
  - Address adjustment [Chen & King ’99]
  - Reverse-engineering branch predictors [Milenkovic et al. ’04]
  - PHT partitioning [Jiménez ’05]