Predication ECE 721 Prof. Rotenberg.



If-Conversion
- Technique for removing a branch
  - Always fetch the control-dependent instructions, but conditionally execute them
- If-conversion is implemented via “predication”, also known as “guarding”
- Two styles of predication support in the ISA
  - General predication
  - Conditional moves

General Predication
- Predicate register file
  - Example: pred0 – pred7 (8 predicate registers)
  - Each predicate register is just 1 bit
  - A predicate is either false (0) or true (1)
- Some or all instruction opcodes may be predicated, depending on the ISA
- A predicated opcode has an additional source register (implicit or explicit), which is a predicate register
- Example
    ADD rd, rs, rt, pred
    if (pred) rd = rs + rt else NOP
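The ADD example above can be sketched in a few lines of Python. This is only an illustration of the architectural semantics, not of any real ISA: the names `NUM_PRED`, `pred`, `regs` and `add_predicated`, and the register values, are assumptions made up for the sketch.

```python
# Sketch of general predication: a predicated ADD either writes rd or acts as a NOP.
# All names and values here are hypothetical, chosen only for illustration.

NUM_PRED = 8                       # pred0..pred7, as in the slide's example
pred = [False] * NUM_PRED          # each predicate register holds one bit
regs = {"r1": 5, "r2": 7, "r3": 0}

def add_predicated(rd, rs, rt, p):
    """ADD rd, rs, rt, pred: if (pred) rd = rs + rt else NOP."""
    if pred[p]:
        regs[rd] = regs[rs] + regs[rt]
    # Predicate false: architectural state is left unchanged (NOP).

pred[5] = False
add_predicated("r3", "r1", "r2", 5)
after_false = regs["r3"]           # r3 keeps its old value

pred[5] = True
add_predicated("r3", "r1", "r2", 5)
after_true = regs["r3"]            # r3 now holds r1 + r2
```

The Python `if` only models the guard; in hardware the instruction is always fetched and no branch is involved.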

General Predication (cont.)
- Advantages of general predication
  - Ability to directly predicate any instruction is more efficient than conditional moves, in terms of dynamic instruction count
  - Thus permits more aggressive application of predication (predicate larger if/else regions)
- Disadvantages of general predication
  - ISA design: Specifying a predicate register in many or all instructions takes instruction encoding space
  - Microarchitecture design: More complex microarchitecture by virtue of almost every instruction having an extra source register

General Predication (cont.)
- ISAs that have general predication
  - VLIW ISAs
    - TI DSPs
    - Intel IA64 (general predicate register file)
    - Heavy reliance on compiler-based scheduling requires getting rid of as many branches as possible (enlarge basic blocks for larger static scheduling scope)
  - Some “RISC” ISAs
    - ARM (condition codes serve as predicate registers)

Conditional Moves
- Most major commercial ISAs had to retrofit predication support (e.g., x86, Alpha, MIPS)
  - Not feasible to extend existing instruction formats to reference predicate registers
  - Instead, add a single instruction opcode called a “conditional move”, which is a predicated move
- Example
    CMOV rd, rs, pred
    if (pred) rd = rs else NOP

  ISA          CMOV specification   Comment
  x86          CMOVcc rd, rs        “pred” is a test of condition codes. The test is specified in the opcode.
  Alpha, MIPS  CMOVx rd, rs, rt     “pred” is a test of a general-purpose register (rt). The test is specified in the opcode.

Examples
Register assignment: r1: x, r2: y, r3: sum; r4, r5, r6: temps

Source code:
    if (x == y) sum++;

  General predication:
    CEQ  pred5, r1, r2
    ADDI r3, r3, #1, pred5

  Conditional moves:
    SUB   r6, r1, r2
    ADDI  r4, r3, #1
    CMOVZ r3, r4, r6

Source code (adding the else clause):
    else sum--;

  General predication (appended to the two instructions above):
    SUBI r3, r3, #1, !pred5

  Conditional moves (complete sequence):
    SUB    r6, r1, r2
    ADDI   r4, r3, #1
    SUBI   r5, r3, #1
    CMOVZ  r3, r4, r6
    CMOVNZ r3, r5, r6
  OR
    SUB   r6, r1, r2
    ADDI  r4, r3, #1
    SUBI  r3, r3, #1
    CMOVZ r3, r4, r6
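As a sanity check, the if/else sequences above can be modeled in Python and compared against the source code. This is a rough sketch: the function names are hypothetical, each Python `if` stands in for a predicate guard or conditional move rather than a branch, and Python integers do not wrap around the way fixed-width registers do.

```python
def source(x, y, s):
    # Reference semantics: if (x == y) sum++; else sum--;
    if x == y:
        s += 1
    else:
        s -= 1
    return s

def general_predication(x, y, s):
    r1, r2, r3 = x, y, s
    pred5 = (r1 == r2)        # CEQ  pred5, r1, r2
    if pred5:                 # ADDI r3, r3, #1, pred5
        r3 = r3 + 1
    if not pred5:             # SUBI r3, r3, #1, !pred5
        r3 = r3 - 1
    return r3

def cmov_five(x, y, s):
    r1, r2, r3 = x, y, s
    r6 = r1 - r2              # SUB    r6, r1, r2
    r4 = r3 + 1               # ADDI   r4, r3, #1
    r5 = r3 - 1               # SUBI   r5, r3, #1
    if r6 == 0:               # CMOVZ  r3, r4, r6
        r3 = r4
    if r6 != 0:               # CMOVNZ r3, r5, r6
        r3 = r5
    return r3

def cmov_four(x, y, s):
    r1, r2, r3 = x, y, s
    r6 = r1 - r2              # SUB   r6, r1, r2
    r4 = r3 + 1               # ADDI  r4, r3, #1
    r3 = r3 - 1               # SUBI  r3, r3, #1 (speculatively take the else path)
    if r6 == 0:               # CMOVZ r3, r4, r6 (overwrite if x == y)
        r3 = r4
    return r3
```

The four-instruction variant works because r4 is computed from the old value of r3 before the SUBI overwrites it.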

Architecture Abstraction vs. Microarchitecture Implementation
- Architecture abstraction (ISA)
  - Predicated instruction either executes or is converted into a NOP
- Microarchitecture implementation
  - In-order pipeline, or OoO pipeline without register renaming:
    - Always execute instruction
    - If predicate is true, instruction writes its value into the destination register
    - If predicate is false, instruction does not write into the destination register (old value is preserved)
  - OoO pipeline with register renaming:
    - Because logical destination is mapped to a new, “blank” physical destination register, the instruction must always perform a write to it
    - If predicate is true, instruction writes new value of logical dest. into its physical destination register
    - If predicate is false, instruction writes old value of logical dest. into its physical destination register
    - This means the logical destination register is also a logical source register
- ISA says:
    CMOV rd, rs, pred       // (pred ? rd = rs : NOP)
- Microarchitecture says:
    CMOV rd, rs, rd, pred   // rd = (pred ? rs : rd)
    p99 = (p22 ? p7 : p50)  … after renaming
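The renamed CMOV can be sketched as follows. The physical register numbers (p7, p50, p22, p99) come from the slide; everything else (the `rename_map`, `free_list` and `phys` structures, the function names, and the stored values 111 and 222) is a hypothetical setup for illustration, not a description of any particular pipeline.

```python
def run_cmov(pred_value):
    """Rename and execute CMOV rd, rs, pred under register renaming."""
    # Hypothetical physical register file contents.
    phys = {7: 111, 50: 222, 22: pred_value}   # p7 = rs, p50 = old rd, p22 = pred
    rename_map = {"rs": 7, "rd": 50, "pred": 22}
    free_list = [99]

    # Rename stage: read the OLD mapping of rd as an extra source,
    # then allocate a new, "blank" physical destination for rd.
    src_new = rename_map["rs"]
    src_old = rename_map["rd"]
    src_pred = rename_map["pred"]
    dest = free_list.pop(0)
    rename_map["rd"] = dest

    # Execute stage: the instruction always writes its destination.
    # p99 = (p22 ? p7 : p50)
    phys[dest] = phys[src_new] if phys[src_pred] else phys[src_old]
    return (dest, src_pred, src_new, src_old), phys[dest]

renamed, value_if_true = run_cmov(1)
_, value_if_false = run_cmov(0)
```

With the predicate true the new physical register receives the value of rs; with it false, the old value of rd is copied forward, which is why rd becomes a source operand.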

Limitations of Predication
- Predication is not profitable when the branch’s control-dependent region is large and complex
  - Many instructions
  - Nested control-flow, e.g., loops and function calls
  - Dynamic instruction count explodes, yielding lower performance than mispredicting the branch 10% of the time
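A back-of-the-envelope model illustrates the tradeoff. It is deliberately crude and rests on assumptions that are not in the slides: one instruction per cycle, a fixed 20-cycle misprediction penalty, equally likely paths, and the hypothetical function names below. The 10% misprediction rate echoes the figure quoted above.

```python
def cycles_with_branch(n_then, n_else, p_then, mispredict_rate, penalty):
    # Only the selected path executes; mispredictions add a fixed penalty.
    expected_path = p_then * n_then + (1 - p_then) * n_else
    return expected_path + mispredict_rate * penalty

def cycles_predicated(n_then, n_else):
    # Both paths are always fetched and executed; no mispredictions.
    return n_then + n_else

# Assumed parameters (hypothetical, for illustration only).
RATE, PENALTY, P_THEN = 0.10, 20, 0.5

# Small hammock: one instruction on each path.
small_branch = cycles_with_branch(1, 1, P_THEN, RATE, PENALTY)
small_pred = cycles_predicated(1, 1)

# Large region: 100 instructions on each path.
large_branch = cycles_with_branch(100, 100, P_THEN, RATE, PENALTY)
large_pred = cycles_predicated(100, 100)
```

Under these assumptions predication wins for the small hammock, while for the large region the excess instructions from the non-selected path cost more than the occasional misprediction.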

Exploiting Control Independence
- See alternate slides

Control-flow strategies: pros and cons
(CD = control-dependent; CIDD = control-independent, data-dependent)

- prediction
  - Pros
    - Most streamlined when correct: no excess instructions, and CIDD instructions are not delayed by branch predicates.
  - Cons
    - Big misprediction penalty
- predication
  - Pros
    - Eliminates mispredictions of the predicated branch.
  - Cons
    - Three performance overheads:
      1. Excess CD instructions, from the non-selected path.
      2. CIDD instructions are delayed by branch predicates.
      3. CIDD instructions are delayed by CMOVs, if the ISA uses these or if employing dynamic hammock predication.
    - Not applicable/profitable for many branches.
- control independence
  - Pros
    - Exploits branch prediction:
      (1) Maximally streamlined when prediction is correct.
      (2) Reduced misprediction penalty when prediction is incorrect. Only CD and CIDD instructions are delayed.
  - Cons
    - Complex implementation: selective repair of control-flow within structures (CD instructions) and data-flow (CIDD instructions).