ILP: Software Approaches

Slides:



Advertisements
Similar presentations
MS108 Computer System I Lecture 7 Tomasulos Algorithm Prof. Xiaoyao Liang 2014/3/24 1.
Advertisements

CMSC 611: Advanced Computer Architecture Tomasulo Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Scoreboarding & Tomasulos Approach Bazat pe slide-urile lui Vincent H. Berk.
Topics Left Superscalar machines IA64 / EPIC architecture
Instruction-Level Parallelism
CMPUT429/CMPE382 Amaral 1/17/01 CMPUT429/CMPE382 Winter 2001 Topic9: Software Pipelining (Some slides from David A. Patterson’s CS252, Spring 2001 Lecture.
Compiler Support for Superscalar Processors. Loop Unrolling Assumption: Standard five stage pipeline Empty cycles between instructions before the result.
CS 378 Programming for Performance Single-Thread Performance: Compiler Scheduling for Pipelines Adopted from Siddhartha Chatterjee Spring 2009.
Anshul Kumar, CSE IITD CSL718 : VLIW - Software Driven ILP Hardware Support for Exposing ILP at Compile Time 3rd Apr, 2006.
CPE 731 Advanced Computer Architecture Instruction Level Parallelism Part I Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University.
Chapter 4 Predication CSE 820. Michigan State University Computer Science and Engineering Go over midterm exam.
Speculative ExecutionCS510 Computer ArchitecturesLecture Lecture 11 Trace Scheduling, Conditional Execution, Speculation, Limits of ILP.
1 COMP 206: Computer Architecture and Implementation Montek Singh Wed, Oct 19, 2005 Topic: Instruction-Level Parallelism (Multiple-Issue, Speculation)
ENGS 116 Lecture 111 ILP: Software Approaches 2 Vincent H. Berk October 14 th Reading for monday: 3.10 – 3.15, Reading for today: 4.2 – 4.6.
Compiler techniques for exposing ILP
1 Lecture 5: Static ILP Basics Topics: loop unrolling, VLIW (Sections 2.1 – 2.2)
ENGS 116 Lecture 101 ILP: Software Approaches Vincent H. Berk October 12 th Reading for today: , 4.1 Reading for Friday: 4.2 – 4.6 Homework #2:
Loop Unrolling & Predication CSE 820. Michigan State University Computer Science and Engineering Software Pipelining With software pipelining a reorganized.
CPE 631: ILP, Static Exploitation Electrical and Computer Engineering University of Alabama in Huntsville Aleksandar Milenkovic,
FTC.W99 1 Advanced Pipelining and Instruction Level Parallelism (ILP) ILP: Overlap execution of unrelated instructions gcc 17% control transfer –5 instructions.
COMP4611 Tutorial 6 Instruction Level Parallelism
1 Lecture: Static ILP Topics: compiler scheduling, loop unrolling, software pipelining (Sections C.5, 3.2)
Eliminating Stalls Using Compiler Support. Instruction Level Parallelism gcc 17% control transfer –5 instructions + 1 branch –Reordering among 5 instructions.
EECC551 - Shaaban #1 Fall 2003 lec# Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining increases performance by overlapping.
1 COMP 740: Computer Architecture and Implementation Montek Singh Tue, Feb 24, 2009 Topic: Instruction-Level Parallelism IV (Software Approaches/Compiler.
EEL Advanced Pipelining and Instruction Level Parallelism Lotzi Bölöni.
Computer Architecture Instruction Level Parallelism Dr. Esam Al-Qaralleh.
Rung-Bin Lin Chapter 4: Exploiting Instruction-Level Parallelism with Software Approaches4-1 Chapter 4 Exploiting Instruction-Level Parallelism with Software.
Dynamic Branch PredictionCS510 Computer ArchitecturesLecture Lecture 10 Dynamic Branch Prediction, Superscalar, VLIW, and Software Pipelining.
CS152 Lec15.1 Advanced Topics in Pipelining Loop Unrolling Super scalar and VLIW Dynamic scheduling.
Pipelining 5. Two Approaches for Multiple Issue Superscalar –Issue a variable number of instructions per clock –Instructions are scheduled either statically.
1 Advanced Computer Architecture Limits to ILP Lecture 3.
Pipeline Computer Organization II 1 Hazards Situations that prevent starting the next instruction in the next cycle Structural hazards – A required resource.
1 Lecture 10: Static ILP Basics Topics: loop unrolling, static branch prediction, VLIW (Sections 4.1 – 4.4)
Lecture 8: More ILP stuff Professor Alvin R. Lebeck Computer Science 220 Fall 2001.
Limits on ILP. Achieving Parallelism Techniques – Scoreboarding / Tomasulo’s Algorithm – Pipelining – Speculation – Branch Prediction But how much more.
3.13. Fallacies and Pitfalls Fallacy: Processors with lower CPIs will always be faster Fallacy: Processors with faster clock rates will always be faster.
W04S1 COMP s1 Seminar 4: Branch Prediction Slides due to David A. Patterson, 2001.
EECC551 - Shaaban #1 Winter 2002 lec# Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining increases performance by overlapping.
1 Lecture 5: Pipeline Wrap-up, Static ILP Basics Topics: loop unrolling, VLIW (Sections 2.1 – 2.2) Assignment 1 due at the start of class on Thursday.
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Oct. 9, 2002 Topic: Instruction-Level Parallelism (Multiple-Issue, Speculation)
Chapter 2 Instruction-Level Parallelism and Its Exploitation
Microprocessors Introduction to ia64 Architecture Jan 31st, 2002 General Principles.
Tomasulo’s Approach and Hardware Based Speculation
EECC551 - Shaaban #1 lec # 8 Winter Multiple Instruction Issue: CPI < 1 To improve a pipeline’s CPI to be better [less] than one, and to.
EECC551 - Shaaban #1 Spring 2004 lec# Definition of basic instruction blocks Increasing Instruction-Level Parallelism & Size of Basic Blocks.
ENGS 116 Lecture 91 Dynamic Branch Prediction and Speculation Vincent H. Berk October 10, 2005 Reading for today: Chapter 3.2 – 3.6 Reading for Wednesday:
CS 211: Computer Architecture Lecture 6 Module 2 Exploiting Instruction Level Parallelism with Software Approaches Instructor: Morris Lancaster.
Instruction Level Parallelism Pipeline with data forwarding and accelerated branch Loop Unrolling Multiple Issue -- Multiple functional Units Static vs.
Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques.
5/13/99 Ashish Sabharwal1 Pipelining and Hazards n Hazards occur because –Don’t have enough resources (ALU’s, memory,…) Structural Hazard –Need a value.
Lecture 1: Introduction Instruction Level Parallelism & Processor Architectures.
Use of Pipelining to Achieve CPI < 1
Compiler Techniques for ILP
CS 352H: Computer Systems Architecture
Instruction Level Parallelism
CS203 – Advanced Computer Architecture
CSL718 : VLIW - Software Driven ILP
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
CS 704 Advanced Computer Architecture
Computer Architecture
Compiler techniques for exposing ILP (cont)
Siddhartha Chatterjee Spring 2008
Adapted from the slides of Prof
Instruction Level Parallelism (ILP)
CSC3050 – Computer Architecture
Dynamic Hardware Prediction
How to improve (decrease) CPI
Loop-Level Parallelism
Static Scheduling Techniques
Presentation transcript:

ILP: Software Approaches Bazat pe slide-urile lui Vincent H. Berk

HW Support for More ILP X A = B op C Avoid branch prediction by turning branches into conditionally executed instructions: If (X) then A = B op C else NOP If false, then neither store result nor cause exception Expanded ISA of Alpha, MIPS, PowerPC, SPARC have conditional move; PA-RISC can annul any following instruction. IA-64: 61 1-bit condition fields selected so conditional execution of any instruction Drawbacks to conditional instructions Still takes a clock even if “annulled” Stall if condition evaluated late Complex conditions reduce effectiveness; condition becomes known late in pipeline X A = B op C

Software Pipelining Observation: if iterations from loops are independent, then can get more ILP by taking instructions from different iterations Software pipelining: reorganizes loops so that each iteration is made from instructions chosen from different iterations of the original loop Iteration 1 2 3 4 pipelined

SW Pipelining Example Before: Unrolled 3 times 4 Before: Unrolled 3 times After: Software Pipelined 1 LD F0, 0 (R1) LD F0, 0 (R1) 2 ADDD F4, F0, F2 ADDD F4, F0, F2 3 SD 0 (R1), F4 LD F0, –8 (R1) 4 LD F6, –8 (R1) 1 SD 0 (R1), F4 Stores M[i] 5 ADDD F8, F6, F2 2 ADDD F4, F0, F2 Adds to M[i-1] 6 SD –8, (R1), F8 3 LD F0, –16 (R1) Loads M[i-2] 7 LD F10, –16 (R1) 4 SUBI R1, R1, #8 8 ADDD F12, F10, F2 5 BNEZ R1, LOOP 9 SD –16 (R1), F12 SD 0 (R1), F4 10 SUBI R1, R1, #24 ADDD F4, F0, F2 11 BNEZ R1, LOOP SD –8 (R1), F4 Read F4 Read F0 SD IF ID EX Mem WB Write F4 ADD IF ID EX Mem WB LD IF ID EX Mem WB Write F0 4

100 iterations = 25 loops with 4 unrolled iterations each SW Pipelining Example 5 Symbolic Loop Unrolling Smaller code space Overhead paid only once vs. each iteration in loop unrolling 100 iterations = 25 loops with 4 unrolled iterations each Software Pipelining Number of overlapped operations (a) Software pipelining Time Loop Unrolling Number of overlapped operations Time (b) Loop unrolling 5

Trace Scheduling Focus on critical path (trace selection) Compiler has to decide what the critical path (the trace) is Most likely basic blocks are put in the trace Loops are unrolled in the trace Now speed it up (trace compaction) Focus on limiting instruction count Branches are seen as jumps into or out of the trace Problem: Significant overhead for parts that are not in the trace Unclear if it is feasible in practice

Superblocks Similar to Trace Scheduling but: Tail duplication: Single entrance, multiple exits Tail duplication: Handle cases that exited the superblock Residual loop handling Could in itself be a superblock Problem: Code size Worth the hassle?

Conditional instructions Instruction that is executed depending on one of its arguments: BNEZ R1, L ADDU R2, R3, R0 L: VS CMOVZ R2, R3, R1 Instruction is executed but results are not always written. Should only be used for very small sequences, else use normal branch

Speculation Compiler moves instructions before branch if: Data flow is not affected (optionally with use of renaming) Preserve exception behavior Avoid load/store address conflicts (no renaming for memory loc.) Preserving exception behavior Mechanism to indicate an instruction is speculative Poison bit: raise exception when value is used Using Conditional instructions: Requires In-Order instruction commit Register renaming Writeback at commit Forwarding Raise exceptions at commit

Speculation LD R1, 0(R3) ; load A LD R14, 0(R2) ; load B (speculative) if (A==0) A=B; else A=A+4; LD R1, 0(R3) ; load A BNEZ R1, L1 ; test A LD R1, 0(R2) ; then J L2 ; skip else L1: DADDI R1, R1, #4 ; else L2: SD R1, 0(R3) ; store A LD R1, 0(R3) ; load A LD R14, 0(R2) ; load B (speculative) BEQZ R1, L3 ; branch if DADDI R14, R1, #4 ; else L3: SD R14, 0(R3) ; store A