4th JILP Workshop on Computer Architecture Competitions Championship Branch Prediction (CBP-4) -Moinuddin Qureshi (GT)

Slides:



Advertisements
Similar presentations
H-Pattern: A Hybrid Pattern Based Dynamic Branch Predictor with Performance Based Adaptation Samir Otiv Second Year Undergraduate Kaushik Garikipati Second.
Advertisements

Jaewoong Sim Alaa R. Alameldeen Zeshan Chishti Chris Wilkerson Hyesoon Kim MICRO-47 | December 2014.
Dynamic History-Length Fitting: A third level of adaptivity for branch prediction Toni Juan Sanji Sanjeevan Juan J. Navarro Department of Computer Architecture.
Hardware-based Devirtualization (VPC Prediction) Hyesoon Kim, Jose A. Joao, Onur Mutlu ++, Chang Joo Lee, Yale N. Patt, Robert Cohn* ++ *
1 A Hybrid Adaptive Feedback Based Prefetcher Santhosh Verma, David Koppelman and Lu Peng Louisiana State University.
Exploring Correlation for Indirect Branch Prediction 1 Nikunj Bhansali, Chintan Panirwala, Huiyang Zhou Department of Electrical and Computer Engineering.
Computer Architecture Computer Architecture Processing of control transfer instructions, part I Ola Flygt Växjö University
8 Processing of control transfer instructions TECH Computer Science 8.1 Introduction 8.2 Basic approaches to branch handling 8.3 Delayed branching 8.4.
TAGE-SC-L Branch Predictors
A Scalable Front-End Architecture for Fast Instruction Delivery Paper by: Glenn Reinman, Todd Austin and Brad Calder Presenter: Alexander Choong.
A Cache-Like Memory Organization for 3D memory systems CAMEO 12/15/2014 MICRO Cambridge, UK Chiachen Chou, Georgia Tech Aamer Jaleel, Intel Moinuddin K.
CPE 731 Advanced Computer Architecture ILP: Part II – Branch Prediction Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University.
1 Lecture: Branch Prediction Topics: branch prediction, bimodal/global/local/tournament predictors, branch target buffer (Section 3.3, notes on class webpage)
1 Lecture 7: Out-of-Order Processors Today: out-of-order pipeline, memory disambiguation, basic branch prediction (Sections 3.4, 3.5, 3.7)
1 Lecture 8: Branch Prediction, Dynamic ILP Topics: branch prediction, out-of-order processors (Sections )
AsiaCrypt Program Committee Report Chi Sung Laih Nov.30~Dec.4,2003 Taipei, Taiwan.
EENG449b/Savvides Lec /17/04 February 17, 2004 Prof. Andreas Savvides Spring EENG 449bG/CPSC 439bG.
Energy Efficient Instruction Cache for Wide-issue Processors Alex Veidenbaum Information and Computer Science University of California, Irvine.
Improving Branch Prediction by Dynamic Dataflow-based Identification of Correlated Branches from a Large Global History Renjiu Thomas, Manoij Franklin,
1 Lecture 8: Instruction Fetch, ILP Limits Today: advanced branch prediction, limits of ILP (Sections , )
Techniques for Efficient Processing in Runahead Execution Engines Onur Mutlu Hyesoon Kim Yale N. Patt.
1 Lecture 8: Branch Prediction, Dynamic ILP Topics: static speculation and branch prediction (Sections )
CS 7810 Lecture 9 Effective Hardware-Based Data Prefetching for High-Performance Processors T-F. Chen and J-L. Baer IEEE Transactions on Computers, 44(5)
Improving the Performance of Object-Oriented Languages with Dynamic Predication of Indirect Jumps José A. Joao *‡ Onur Mutlu ‡* Hyesoon Kim § Rishi Agarwal.
1 Storage Free Confidence Estimator for the TAGE predictor André Seznec IRISA/INRIA.
1 A 64 Kbytes ITTAGE indirect branch predictor André Seznec INRIA/IRISA.
Analysis of Branch Predictors
Dept. of Computer and Information Sciences : University of Delaware John Cavazos Department of Computer and Information Sciences University of Delaware.
Trace cache and Back-end Oper. CSE 4711 Instruction Fetch Unit Using I-cache I-cache I-TLB Decoder Branch Pred Register renaming Execution units.
Implementing Precise Interrupts in Pipelined Processors James E. Smith Andrew R.Pleszkun Presented By: Ravikumar Source:
CSCI 6461: Computer Architecture Branch Prediction Instructor: M. Lancaster Corresponding to Hennessey and Patterson Fifth Edition Section 3.3 and Part.
The Journal of Instruction Level Parallelism Championship Branch Prediction website: Dan Connors, Univ. of Colorado Tom Conte,
Not- Taken? Taken? The Frankenpredictor Gabriel H. Loh Georgia Tech College of Computing MICRO Dec 5, 2004.
Predicate Execution 2008/01/10 Presented by Jinho.
1/25 June 28 th, 2006 BranchTap: Improving Performance With Very Few Checkpoints Through Adaptive Speculation Control BranchTap Improving Performance With.
Processor Structure and Function Chapter8:. CPU Structure  CPU must:  Fetch instructions –Read instruction from memory  Interpret instructions –Instruction.
1 CPRE 585 Term Review Performance evaluation, ISA design, dynamically scheduled pipeline, and memory hierarchy.
Hybrid Multi-Core Architecture for Boosting Single-Threaded Performance Presented by: Peyman Nov 2007.
T-BAG: Bootstrap Aggregating the TAGE Predictor Ibrahim Burak Karsli, Resit Sendag University of Rhode Island.
André Seznec Caps Team IRISA/INRIA 1 A 256 Kbits L-TAGE branch predictor André Seznec IRISA/INRIA/HIPEAC.
CS 6290 Branch Prediction. Control Dependencies Branches are very frequent –Approx. 20% of all instructions Can not wait until we know where it goes –Long.
Efficiently Prefetching Complex Address Patterns Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian University of Utah Chris Wilkerson, Zeshan.
Temporal Stream Branch Predictor (TS Predictor) Yongming Shen, Michael Ferdman.
1 Lecture: Out-of-order Processors Topics: branch predictor wrap-up, a basic out-of-order processor with issue queue, register renaming, and reorder buffer.
4th JILP Workshop on Computer Architecture Competitions Championship Branch Prediction (CBP-4) -Moinuddin Qureshi (GT)
JILP RESULTS 1. JILP Experimental Framework Goal Simplicity of a trace based simulator Flexibility to model special predictors ( e.g., using data values)
Confessions of a Performance Monitor Hardware Designer Workshop on Hardware Performance Monitor Design HPCA February 2005 Jim Callister Intel Corporation.
Value Prediction Kyaw Kyaw, Min Pan Final Project.
Multiperspective Perceptron Predictor Daniel A. Jiménez Department of Computer Science & Engineering Texas A&M University.
Welcome to the 2016 Championship Branch Prediction Contest Program Trevor Mudge Seoul, South Korea June 18 th, 2016.
Lecture: Out-of-order Processors
The 2nd Cache Replacement Championship (CRC-2)
Multiperspective Perceptron Predictor with TAGE
CS 704 Advanced Computer Architecture
Logistic Regression and Perceptron Prediction of Instruction Branches
2nd Data Prefetching Championship Results and Awards
FA-TAGE Frequency Aware TAgged GEometric History Length Branch Predictor Boyu Zhang, Christopher Bodden, Dillon Skeehan ECE/CS 752 Advanced Computer Architecture.
CMSC 611: Advanced Computer Architecture
Lecture 14 Virtual Memory and the Alpha Memory Hierarchy
Welcome to the 1st Championship Value Prediction (CVP) Workshop
Sponsored by JILP and Intel’s Academic Research Office
Dynamic Branch Prediction
Hyesoon Kim Onur Mutlu Jared Stark* Yale N. Patt
Lecture 10: Branch Prediction and Instruction Delivery
TAGE-SC-L Again MTAGE-SC
5th JILP Workshop on Computer Architecture Competitions
Instruction Set Principles
Adapted from the slides of Prof
Wackiness Algorithm A: Algorithm B:
Chris Wilkerson, MRL/MTL Intel
Presentation transcript:

4th JILP Workshop on Computer Architecture Competitions Championship Branch Prediction (CBP-4) -Moinuddin Qureshi (GT)

Why Another CBP? Branch prediction remains an important problem for architecting high performance processors One of the few optimizations that can: 1.Improve single-threaded performance 2.Improve energy efficiency 3.Be implemented in a localized manner (small change) Previous CBP happened in 2011, time to rescan for new ideas

Thanks Organizing Committee Moin Qureshi, GT (chair) Alaa Alameldeen, Intel Chris Wilkerson, Intel Aamer Jaleel, Intel Program Committee Moin Qureshi, GT (chair) Trey Cain, Qualcomm Hyesoon Kim, GT Gabe Loh, AMD Pierre Michaud, INRIA Jared Stark, Intel Special thanks to: Aseem Grover (GT,Apple) for handling submission/evaluations +Org Committee

Format of CBP – Three tracks A 4KB track (for small systems) A 32KB track (for large systems) Unlimited track (let’s get the limit) – Workloads: 40 traces 20 short (30 mln) from CBP-1 [INT, FP, SRV, MM] 20 long (150 mln) from SPEC2006 -Figure of merit: Mispredictions per 1000 insts (MPKI) Arithmetic mean of MPKI over all 40 traces

Submissions and Acceptance Total 12 papers submitted – 6 for the 4KB category – 6 for the 32KB category – 11 for unlimited track Total 10 papers accepted – 5 for the 4KB category – 6 for the 32KB category – 10 for unlimited track Countries represented: USA, Canada, France, Japan, India

Process Requirements: 1.Code and paper should be readable 2.Design must not violate causality (cannot use future information to predict the current branch). Reviews: -2 to 3 reviews per paper - Offline program committee meeting Papers selected primarily on MPKI (and/or new ideas) Note: Authors of accepted papers were allowed to modify their design till camera ready deadline We will use only the revised code for ranking

Fixed Traces  Address Memoization Using same set of traces for both submission and evaluation can get misused easily E.g. How to get MPKI=0 for unlimited sized predictors PC Address in TraceRecord Outcome History, Store in Table 0xDEADBEEF1,0,1,1,0,0,1,0,1,0 …. 0xFADEDACE1,1,1,1,0,0,0,1,0,1 …. 0xCAFEBABE1,0,1,0,1,0,1,1,0,0 …. A table stores the outcome of each PC during design time Prediction: access this table & keep track of access counts To keep the contest meaningful, our evaluation infrastructure must be robust against such address memoization

Address Space Shifting to Avoid Memoization Shift the address space by constant “Base”  Changes all PC Program Our evaluated MPKI may be (is) different from the author’s So, stay tuned till the end to know the winner(s) PC’ = PC + Base Target’ = Target + Base We use address space shifting for all tracks Virtual Address Space

Let the Championship Begin …