Predicate Execution 2008/01/10 Presented by Jinho.

If-conversion: Performance degradation

Original code (with branch):
    mov r1 = 0;
    if( r2 == r3 ) mov r1 = 1;
    else mov r1 = 2;
    mov r4 = r1;

After if-conversion:
    mov r1 = 0;
    cmp.eq p1,p2=r2,r3;
    (p1) mov r1 = 1;
    (p2) mov r1 = 2;
    mov r4 = r1;

Problem
◦ Multiple definitions of r1
◦ Performance degradation due to renaming
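To make the multiple-definition problem concrete, here is a minimal Python sketch that models the two code sequences on this slide. The function names and the dictionary-based register file are illustrative assumptions, not part of the presentation; the sketch only shows that both forms compute the same r4 while the if-converted form carries three candidate definitions of r1.

```python
def branchy(r2, r3):
    """Original code with a branch: only one of the two moves executes."""
    r1 = 0
    if r2 == r3:
        r1 = 1
    else:
        r1 = 2
    r4 = r1
    return r4


def if_converted(r2, r3):
    """If-converted code: every instruction is fetched; predicates gate the writes."""
    regs = {"r1": 0, "r2": r2, "r3": r3, "r4": 0}
    # cmp.eq p1,p2 = r2,r3 sets p1 to the comparison result and p2 to its complement.
    p1 = regs["r2"] == regs["r3"]
    p2 = not p1
    # (predicate, destination, value): three definitions of r1 can reach 'mov r4 = r1'.
    predicated_moves = [(True, "r1", 0), (p1, "r1", 1), (p2, "r1", 2)]
    for pred, dest, value in predicated_moves:
        if pred:  # a false predicate turns the instruction into a no-op
            regs[dest] = value
    regs["r4"] = regs["r1"]
    return regs["r4"]
```

Because the hardware cannot tell at rename time which of the three writes to r1 will take effect, the final mov r4 = r1 has an ambiguous producer, which is the renaming cost the slide refers to.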

Phi-Predication for Light-Weight If-Conversion Weihaw Chuang, Brad Calder, Jeanne Ferrante - CGO’03

Phi-Predication Instructions

Compiler Transformation 4 Classes

Predicate Predication for Efficient Out-of-order Execution Weihaw Chuang, Brad Calder - ICS’03

Main Idea
Predicate Predictor
◦ Predicates were branches before if-conversion
◦ Value prediction instead of branch prediction
Implementation
◦ Separate from branch predictor
  ▪ Only a branch history table is required
  ▪ No need for the return-address stack or the branch target buffer
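As a rough illustration of a history-table-only predictor, the sketch below predicts the boolean value of a predicate with 2-bit saturating counters. The gshare-style indexing (PC XOR global history), the table sizes, and the class and method names are all assumptions made for this example; the slides do not specify the predictor organization.

```python
class PredicatePredictor:
    """Toy predictor: 2-bit saturating counters indexed by PC XOR global history.

    No branch target buffer or return-address stack is modeled, because a
    predicate prediction is a true/false value, not a fetch target.
    """

    def __init__(self, table_bits=10, history_bits=8):
        self.mask = (1 << table_bits) - 1
        self.history_mask = (1 << history_bits) - 1
        self.table = [1] * (1 << table_bits)  # 1 = weakly false
        self.history = 0

    def _index(self, pc):
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        """Return the predicted predicate value for the compare at address pc."""
        return self.table[self._index(pc)] >= 2

    def update(self, pc, outcome):
        """Train with the resolved predicate value, then shift it into the history."""
        i = self._index(pc)
        if outcome:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
        self.history = ((self.history << 1) | int(outcome)) & self.history_mask
```

A predictor like this would be consulted for each predicate-defining compare and trained once the compare resolves.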

Predicate Early Evaluation
REN1
◦ Predicate prediction is completed
REN2
◦ The predicted predicates and the true predicate values are early-evaluated

Predicate Misprediction Recovery
Flush Predicate Misprediction
◦ Naïve approach
Rename-Replay for Predicate Misprediction
◦ Instructions on false predicates are not put into the issue queue
◦ Replay from predicate early evaluation
◦ Instructions are stored in recovery queue (RecQ)
Selective-Replay for Predicate Mispredictions
◦ All instructions are put into the issue queue
◦ Replay selectively
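The difference between the two replay schemes in what enters the issue queue can be sketched as follows. This is a toy model under stated assumptions: the function name, the tuple encoding of instructions, and the scheme strings are hypothetical, and the recovery queue and the replay step itself are not modeled.

```python
def issue_queue_contents(instructions, predicted, scheme):
    """Return the instructions that enter the issue queue under a recovery scheme.

    instructions: list of (name, predicate_name or None for unpredicated)
    predicted:    dict mapping predicate name to its predicted boolean value
    scheme:       "rename-replay" or "selective-replay"
    """
    if scheme == "selective-replay":
        # All instructions are issued, whatever their predicted predicate value.
        return [name for name, _ in instructions]
    if scheme == "rename-replay":
        # Instructions guarded by a predicted-false predicate are filtered out.
        return [name for name, pred in instructions
                if pred is None or predicted[pred]]
    raise ValueError(f"unknown scheme: {scheme}")
```

Under rename-replay the filtered instructions must be recoverable after a misprediction, which is the role the slide gives to the RecQ.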

Pipeline comparison Flush vs. Rename-replay

Selective Replay Examples

Evaluation Methodology
◦ Trace
  ▪ David Mosberger’s “utrace.c”
◦ Simulator
  ▪ Modified SimpleScalar 3.0 to handle IA64
◦ Benchmarks
  ▪ SPEC2000 Integer and Floating-Point

Comparison Speedups

Wrap-up

Multiple definition: Solution?

Original code (with branch):
    mov r1 = 0;
    if( r2 == r3 ) mov r1 = 1;
    else mov r1 = 2;
    mov r4 = r1;

After if-conversion:
    mov r1 = 0;
    cmp.eq p1,p2=r2,r3;
    (p1) mov r1 = 1;
    (p2) mov r1 = 2;
    mov r4 = r1;

With phi-predication:
    mov r1 = 0;
    mov r5 = 2;
    cmp.eq p1,p2=r2,r3;
    phi r4 = (p1)1,r5;

Phi-predication
◦ Doesn’t need to rename
◦ Only for some operations
Predicate prediction
◦ Generally better performance
◦ Problem with hard-to-predict branches
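Reading the slide's phi r4 = (p1)1,r5 as a select (r4 receives 1 when p1 is true and r5 otherwise), the phi-predicated sequence can be sketched in Python as below. That reading and the helper names are assumptions made for illustration; the sketch shows that r4 now has a single producer, so no ambiguous renaming of r1 is involved.

```python
def phi(pred, if_true, if_false):
    """Select-style phi: one instruction, one destination, two candidate sources."""
    return if_true if pred else if_false


def phi_predicated(r2, r3):
    """Model of the phi-predicated sequence from the slide."""
    r1 = 0            # kept from the slide; not read once the phi produces r4
    r5 = 2            # the else-path value is materialized unconditionally
    p1 = r2 == r3     # cmp.eq p1,p2 = r2,r3 (p2 is not needed by the phi)
    r4 = phi(p1, 1, r5)
    return r4
```

The trade-off matches the wrap-up: both candidate values are computed regardless of the predicate, so this form suits only operations that are safe and cheap to execute unconditionally.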