Ghent University Veerle Desmet Lieven Eeckhout Koen De Bosschere Using Decision Trees to Improve Program-Based and Profile-Based Static Branch Prediction.

Presentation transcript:

Slide 2 (Veerle Desmet, Ghent University)

High-level program code:

    if (input >= 0) {
        for (i = 0; i < 10; i++) { /* a loop :) */ }
    } else
        printf("input < 0");

Control flow (compiler, architecture):

[Figure: control-flow graph with the nodes "if (input ≥ 0)", "loopheader", "loop i<10", "end", and "printf()".]

Slide 3: Static Branch Prediction

Static
- fixed prediction per static branch
- predicts the most frequent direction
- backward branches are typically taken
- e.g. 1 misprediction for 10 predictions

Applications
- compiler optimizations
- embedded systems
- guiding dynamic branch prediction

[Figure: the control-flow graph from the previous slide with edges labeled taken / not-taken and the most likely and least likely paths marked.]
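The "1 misprediction for 10 predictions" figure can be reproduced with a small sketch. It assumes a bottom-tested loop whose backward branch executes 10 times (taken 9 times, then falling through once); the function name and the outcome list are illustrative and not taken from the slides.

```python
def mispredictions(outcomes, prediction):
    """Count executions whose actual direction differs from the fixed prediction."""
    return sum(1 for taken in outcomes if taken != prediction)


# Assumed behavior of the backward branch closing a 10-iteration loop:
# taken on the first 9 executions, not taken on the final one.
loop_branch_outcomes = [True] * 9 + [False]

# A static predictor assigns one direction to the branch for the whole run.
static_prediction = True  # backward branches are typically taken

missed = mispredictions(loop_branch_outcomes, static_prediction)
print(f"{missed} misprediction for {len(loop_branch_outcomes)} predictions")
```

Predicting the opposite direction for the same branch would miss 9 of the 10 executions, which is why the most frequent direction is the one to fix statically.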

Slide 4: Outline

Introduction
Decision Trees
- Extended feature set
Heuristics
- Ordering
- Two new heuristics
Results
Conclusion

Slide 5: Decision trees

C4.5
- Start from a set of examples
- Choose the best split (information gain ratio)
- Divide the examples into subtrees
- Repeat for the subtrees

Classification model

[Figure: example decision tree that splits on the features "branch direction" (forward / backward) and "branch type" (beq, bgt, bne), with leaves taken / not-taken.]
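The split criterion named on this slide can be sketched as follows. This is a minimal illustration of the information gain ratio for categorical features only; it is not the C4.5 implementation, and the four toy examples and the helper names are assumptions made for the sketch.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(examples, feature):
    """Information gain of splitting on `feature`, divided by the split information.

    `examples` is a list of (feature_dict, label) pairs.
    """
    labels = [label for _, label in examples]
    groups = {}
    for feats, label in examples:
        groups.setdefault(feats[feature], []).append(label)

    total = len(examples)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = -sum(
        len(g) / total * math.log2(len(g) / total) for g in groups.values()
    )
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical training examples: one static branch per entry.
examples = [
    ({"direction": "backward", "type": "bne"}, "taken"),
    ({"direction": "backward", "type": "beq"}, "taken"),
    ({"direction": "forward", "type": "bne"}, "not-taken"),
    ({"direction": "forward", "type": "beq"}, "not-taken"),
]

# "Choose best split": pick the feature with the highest gain ratio.
best = max(["direction", "type"], key=lambda f: gain_ratio(examples, f))
print(best)
```

In this toy set the branch direction separates the two classes perfectly, so it wins the split; the tree builder would then repeat the same choice inside each subtree.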

Slide 6: Features

Literature [Calder et al.]
- branch direction, branch type, successor basic block is a loopheader, postdominance relations, ...

New candidate features
- number of instructions in successor basic block
- dependency distance
- number of incoming edges
- …

Feature list shown on the slide: Branch Type, Branch Direction, Branch Operand Opcode, Branch Operand Function, Branch Operand Type, RA Opcode, RA Function, RA Type, RB Opcode, RB Function, RB Type, Procedure Type, Loop Header, Branch Dominates, Branch Postdominates, Successor Ends, Successor Loop, Successor Back Edge, Successor Exit Edge, Successor UseDef, Successor Call, Language, Register, Looplevel, Basic Block Size, Dependency Distance, RA Register, RA Distance, RB Register, RB Distance, Successor Looplevel, Successor Basic Block Size, Incoming Edges, Successor Incoming Edges, Successor Loop Sequence

[Chart label: Features, 2.8%]

Slide 7: Fair evaluation

[Figure: flow from programs through C4.5 to a prediction model and its evaluation, with SPECint2000 and SPECint95 as the program sets; the example tree on branch direction and branch type is repeated.]

Slide 8: Outline

Introduction
Decision Trees
- Extended feature set
Heuristics
- Automatic Ordering
- Two new heuristics
Results
Conclusion

Slide 9: Heuristic: Loops

Loops
- for, while, ...
- 11% static
- 35% dynamic

Loops are repeated
- 81% correct
- 88% static upper limit

[Figure: the control-flow graph with the nodes "if (input ≥ 0)", "loopheader", "loop i<10", "end", and "printf()".]

Slide 10: Heuristics: Non-loops

Opcode
- Negative numbers denote error values

Pointer
- Pointer ≠ NULL
- Pointers differ

Loopheader
- Loops are entered

...

[Figure: the control-flow graph with the nodes "if (input ≥ 0)", "loopheader", "loop i<10", "end", and "printf()".]

Slide 11: Heuristic ordering

Priority: loop → pointer → call → opcode → return → store → loopheader → guard, + random [Ball & Larus]

- Optimal ordering out of 8! possibilities
- Dependent on program set
- Automated way

[Figure: the control-flow graph with the nodes "if (input ≥ 0)", "loopheader", "loop i<10", "end", and "printf()".]
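A priority ordering of this kind can be sketched as "the first heuristic that applies decides, otherwise guess at random". The order below is the one listed on the slide; the `predict` function, the dictionary representation of a branch, and the example directions are hypothetical and only illustrate the mechanism.

```python
import math
import random

# Priority order as listed on the slide.
BALL_LARUS_ORDER = [
    "loop", "pointer", "call", "opcode", "return", "store", "loopheader", "guard",
]


def predict(branch, order, rng=random):
    """Return (direction, deciding_heuristic) for one static branch.

    `branch` maps the name of each heuristic that applies to this branch
    to the direction that heuristic predicts (an assumed representation).
    """
    for name in order:
        if name in branch:
            return branch[name], name
    # No heuristic applies: fall back to a random prediction.
    return rng.choice(["taken", "not-taken"]), "random"


# Hypothetical branch to which two heuristics apply with conflicting advice.
branch = {"opcode": "not-taken", "call": "taken"}
print(predict(branch, BALL_LARUS_ORDER))

# Number of possible orderings of the 8 heuristics.
print(math.factorial(len(BALL_LARUS_ORDER)))
```

Because conflicting heuristics are resolved purely by position, the quality of the predictor depends on the order, which is what makes a search over the 8! = 40320 orderings (or an automated substitute for it) worthwhile.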

Slide 12: Optimal tree

[Figure: decision tree over the heuristics, listed as loop, opcode, call, return, loopheader, store, pointer, with leaves taken / not-taken; labels: SPECint2000, SPECint95, ordering.]

Slide 13: New heuristic 1: Postdominating heuristic (NEW)

If one successor is a postdominator: predict the non-postdominating successor
- if-block without else-block: the if-block will be executed

[Figure: branch "if (input ≥ 0)" with an if-block followed by the postdominating block.]
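The rule on this slide can be sketched on a tiny control-flow graph for an if without else. The postdominator computation below is a standard iterative set formulation, not necessarily the one used by the authors, and the node names are made up for the example.

```python
def postdominators(succ, exit_node):
    """Postdominator sets for a CFG given as {node: [successors]}.

    Assumes every node other than `exit_node` has at least one successor.
    """
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            # n is postdominated by itself and by whatever postdominates
            # all of its successors.
            new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def postdominating_heuristic(branch, succ, pdom):
    """Predict the successor that does NOT postdominate the branch, if exactly one does."""
    first, second = succ[branch]
    strict = pdom[branch] - {branch}
    first_pd, second_pd = first in strict, second in strict
    if first_pd != second_pd:
        return second if first_pd else first
    return None  # heuristic does not apply


# Hypothetical CFG: if-block without else-block.
cfg = {
    "branch": ["if_block", "join"],
    "if_block": ["join"],
    "join": ["exit"],
    "exit": [],
}
pdom = postdominators(cfg, "exit")
print(postdominating_heuristic("branch", cfg, pdom))
```

Here "join" lies on every path from the branch to the exit, so it postdominates the branch, and the heuristic predicts that control goes to "if_block" instead.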

Slide 14: New heuristic 2: Dependency distance heuristic (NEW)

"number of instructions between branch and its register defining instruction"

Example with undefined distance:

    ld r3, (r1)
    beq r2 ...

Example with distance 2:

    cmplt r1, 10, r2
    add r3, r4, r5
    bne r2
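One way to compute this feature within a basic block is sketched below. The counting convention (position of the branch minus position of the defining instruction) is inferred from the "distance 2" example, and the tuple encoding of instructions, including which operand is the destination, is an assumption made for illustration.

```python
def dependency_distance(block):
    """Distance from the final branch back to the instruction defining its register.

    `block` is a list of (opcode, destination, sources) tuples whose last
    entry is the branch. Returns None when no instruction in the block
    defines a register the branch reads (undefined distance).
    """
    *body, branch = block
    _, _, branch_sources = branch
    branch_index = len(block) - 1
    # Scan backwards so the most recent definition wins.
    for index in range(len(body) - 1, -1, -1):
        _, destination, _ = body[index]
        if destination in branch_sources:
            return branch_index - index
    return None


# ld r3, (r1) ; beq r2 ...   -> r2 is not defined in the block
block_a = [
    ("ld", "r3", ["r1"]),
    ("beq", None, ["r2"]),
]

# cmplt r1, 10, r2 ; add r3, r4, r5 ; bne r2   -> r2 defined two instructions earlier
block_b = [
    ("cmplt", "r2", ["r1"]),
    ("add", "r5", ["r3", "r4"]),
    ("bne", None, ["r2"]),
]

print(dependency_distance(block_a))
print(dependency_distance(block_b))
```

The sketch stops at the basic block boundary, which is what produces the "undefined" case when the register was set somewhere earlier in the program.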

Slide 15: Final ordering

[Figure: decision tree over the heuristics, listed as loop, opcode, call, return, loopheader, store, postdom, with yes / no outcomes and leaves taken / not-taken, plus a split on dependency distance with the branches <3, undefined, and ≥3; labels: SPECint2000, SPECint95.]

Slide 16: IPM

Instructions Per Mispredicted branch

"how many instructions, including correctly predicted branches, one passes on average before encountering a mispredicted branch" [Fisher & Freudenberger]

HIGHER-IS-BETTER
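Read literally, the definition is a ratio of two counters. The sketch below assumes those counters are available from a profile or simulation run; the function name and the numbers in the example are made up.

```python
def ipm(instructions, mispredicted_branches):
    """Instructions executed per mispredicted branch (higher is better)."""
    if mispredicted_branches == 0:
        return float("inf")  # no misprediction was ever encountered
    return instructions / mispredicted_branches


# Hypothetical run: 1000 dynamic instructions, 40 mispredicted branches.
print(ipm(1000, 40))
```

Unlike a misprediction rate, this metric also rewards programs with few branches per instruction, since correctly predicted branches and non-branch instructions both count in the numerator.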

Slide 17: Results

[Figure: bar chart of IPM (higher is better) per benchmark: gzip, vpr, gcc, mcf, crafty, parser, perl, gap, vortex, bzip2, twolf, compress, go, ijpeg, li, m88ksim, and the average.]

Average IPM:
- Ball & Larus order: 31.3 IPM
- Decision Tree order: 32.1 IPM
- + postdominating heuristic: 34.7 IPM
- + dependency distance heuristic: 37.1 IPM

Gain: 18.5%
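The 18.5% figure is consistent with the relative gain of the last configuration over the Ball & Larus order. The short check below uses only the four average IPM values from this slide; the helper name is illustrative.

```python
def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100


# Average IPM values reported on the slide.
average_ipm = {
    "Ball & Larus order": 31.3,
    "Decision Tree order": 32.1,
    "+ postdominating heuristic": 34.7,
    "+ dependency distance heuristic": 37.1,
}

baseline = average_ipm["Ball & Larus order"]
for name, value in average_ipm.items():
    print(f"{name}: {value} IPM, {relative_gain(baseline, value):.1f}% over baseline")
```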

Slide 18: Conclusion

This presentation:
- The set of extra features provides a 2.8% gain in IPM
- Decision trees for
  - automatic heuristic ordering
  - extracting new heuristics
  - 18.5% gain in IPM

In paper also:
- Coverage for heuristics
- Profile-based branch prediction: up to 11% gain in IPM

Ghent University
Paper: proceedings pp. 336–352
Presentation: VEERLE DESMET – Ghent University