StaticILP.1 2/12/02 Static ILP: Static (Compiler-Based) Scheduling. Notes from UW-Madison. Read ch. 4 of the book, and the paper on Itanium on the web page.


StaticILP.2 2/12/02 Today's Theme and Contents
Let the compiler uncover the ILP
– Objective: more ILP / simpler hardware / faster clock / less power
How:
– Static scheduling
– Loop unrolling
– Software pipelining
– Static multiple issue: VLIW
  » local, global scheduling
  » static branch prediction
  » software speculation: trace scheduling, superblocks
  » nops, lockstep
  » conditional moves, predication
  » speculative loads
IA-64 and Itanium

StaticILP.3 2/12/02 Basic Idea
The compiler moves dependent instructions apart to avoid hazards.
This means:
– independent instructions exist to place between them (if not, employ transformations to create them)
– the compiler knows implementation details
  » latency AND superscalarity (issue width)
What happens if the implementation changes?
Static ILP is applicable to both statically and dynamically scheduled processors.
Statically scheduled processors: the compiler dictates which instructions can execute together (scheduling is done in software).
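As a rough illustration of why moving dependent instructions apart helps, the following Python sketch counts issue cycles on a hypothetical single-issue, in-order pipeline that stalls on read-after-write hazards. The instruction names, register names, and the 3-cycle load latency are assumptions chosen for the example, not values from the slides.

```python
from typing import Dict, List, Tuple

# (name, destination register, source registers, latency in cycles).
# Latency = cycles after issue until a dependent instruction can use the result.
Instr = Tuple[str, str, Tuple[str, ...], int]


def total_cycles(schedule: List[Instr]) -> int:
    """Issue cycles used by a single-issue, in-order pipeline that stalls on RAW hazards."""
    ready: Dict[str, int] = {}  # register -> first cycle its value is available
    cycle = 0                   # earliest cycle the next instruction may issue
    for _name, dest, srcs, latency in schedule:
        # Stall until every source operand has been produced.
        issue = max([cycle] + [ready.get(r, 0) for r in srcs])
        ready[dest] = issue + latency
        cycle = issue + 1
    return cycle


# Hypothetical instructions: two independent load/add pairs.
load1 = ("ld  f0", "f0", ("r1",), 3)
add1 = ("add f4", "f4", ("f0", "f2"), 1)
load2 = ("ld  f6", "f6", ("r2",), 3)
add2 = ("add f8", "f8", ("f6", "f2"), 1)

naive = [load1, add1, load2, add2]      # each add directly follows its load
scheduled = [load1, load2, add1, add2]  # compiler hoists the second load

naive_cycles = total_cycles(naive)
scheduled_cycles = total_cycles(scheduled)
```

Under these assumed latencies the reordered sequence needs fewer cycles than the naive one. Changing the assumed load latency changes the stall counts, which is one way to see why a static schedule is tied to implementation details.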

StaticILP.4 2/12/02 (Local Scheduling)

StaticILP.5 2/12/02 (Local Scheduling)

StaticILP.15 2/12/02 (useful for large iteration counts)

StaticILP.16 2/12/02 Software speculation/Global Scheduling

StaticILP.18 2/12/02 How? Static prediction, profile, frequency, path. Which is better: the above, or dynamic prediction?
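One way to explore the static-versus-dynamic question is to count mispredictions on a recorded branch trace. The sketch below is only illustrative: it assumes the static scheme predicts the majority direction seen in a profile of the same trace, and it uses a single 2-bit saturating counter as a stand-in for dynamic prediction. The traces are made up.

```python
from typing import List


def static_mispredicts(trace: List[bool]) -> int:
    """Profile-based static prediction: always predict the majority direction."""
    predict_taken = sum(trace) * 2 >= len(trace)
    return sum(1 for taken in trace if taken != predict_taken)


def two_bit_mispredicts(trace: List[bool], state: int = 2) -> int:
    """Dynamic prediction with one 2-bit saturating counter (0..3, >= 2 means taken)."""
    misses = 0
    for taken in trace:
        if (state >= 2) != taken:
            misses += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return misses


# Hypothetical traces: True = taken, False = not taken.
biased = [True] * 9 + [False]          # strongly biased branch
phased = [True] * 10 + [False] * 10    # behavior changes halfway through

biased_static = static_mispredicts(biased)
biased_dynamic = two_bit_mispredicts(biased)
phased_static = static_mispredicts(phased)
phased_dynamic = two_bit_mispredicts(phased)
```

On the biased trace both schemes miss equally often; on the trace whose behavior changes over time, the counter adapts while the single static choice cannot. Real predictors and real profiles are more involved than this.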

StaticILP.20 2/12/02 Register pressure

StaticILP.21 2/12/02 Superblocks: overcome some of the complexities of trace scheduling (single entry vs. multiple entries)

StaticILP.25 2/12/02 Does not have

StaticILP.28 2/12/02 Pentium 4 at 3+ GHz vs. Itanium at 1 GHz

StaticILP.29 2/12/02 Lockstep: any hazard stalls everything; NOPs fill the slots if there is not enough parallelism
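The NOP cost of fixed-width issue can be sketched with a small greedy packer. This is a simplified model, assuming unit latency for every operation, no functional-unit restrictions, and unique operation names; the operations and dependences are invented for the example.

```python
from typing import Dict, List, Set


def pack_bundles(ops: List[str], deps: Dict[str, Set[str]], width: int) -> List[List[str]]:
    """Greedily pack ops (in program order) into fixed-width bundles.

    An op may issue only after every op it depends on sits in an earlier bundle.
    Unused slots are filled with explicit "nop" entries.
    """
    done: Set[str] = set()
    remaining = list(ops)
    bundles: List[List[str]] = []
    while remaining:
        bundle = [op for op in remaining if deps.get(op, set()) <= done][:width]
        if not bundle:
            raise ValueError("cyclic or unsatisfiable dependences")
        remaining = [op for op in remaining if op not in bundle]
        done.update(bundle)
        bundles.append(bundle + ["nop"] * (width - len(bundle)))
    return bundles


# Hypothetical dependence chain: c needs a and b, d needs c.
ops = ["a", "b", "c", "d"]
deps = {"c": {"a", "b"}, "d": {"c"}}

bundles = pack_bundles(ops, deps, width=3)
nop_slots = sum(slot == "nop" for bundle in bundles for slot in bundle)
```

With this little parallelism, more than half of the issue slots in the 3-wide schedule end up as NOPs.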

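StaticILP.31 2/12/02 Predicated Execution & Conditional Moves
Convert control dependences to data dependences.
Example: if (a == 0) s = t;   with a in R1, s in R2, t in R3
With a branch:
    bnez  R1, L
    addu  R2, R3, R0
L:
With a conditional move:
    cmovz R2, R3, R1
Extending the above to all instruction types is called predication. Pros and cons (+/-)?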
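The conversion from a control dependence to a data dependence can be mimicked in Python. The two functions below are a sketch with hypothetical names: the first models the branching sequence, the second models a conditional move by computing the condition as an ordinary 0/1 value and selecting arithmetically, so no branch decides the result.

```python
def branch_version(r1: int, r2: int, r3: int) -> int:
    """Models: bnez R1, L ; addu R2, R3, R0 ; L:  (control dependence on R1)."""
    if r1 == 0:
        r2 = r3
    return r2


def cmovz(dest: int, src: int, cond: int) -> int:
    """Models: cmovz dest, src, cond  (dest = src if cond == 0, else unchanged)."""
    take = int(cond == 0)                   # the predicate is just data: 1 or 0
    return take * src + (1 - take) * dest   # select without branching on it


# Both forms should agree for any register contents.
results = [
    (branch_version(r1, r2, r3), cmovz(r2, r3, r1))
    for r1 in (0, 1, -5)
    for r2 in (10, 20)
    for r3 in (30, 40)
]
```

The conditional-move form always executes the same instruction stream; the cost is that the move occupies an issue slot even when the condition is false.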

StaticILP.32 2/12/02 Speculative Loads
A load speculatively bypasses earlier stores; repair code runs in case of misspeculation.
Use an address buffer:
1. Lookup table: updated with the address of the speculative load
2. Updated with the addresses of intervening stores
3. A check instruction verifies that no store conflicted and releases the entry
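The three steps above can be sketched as a small table keyed by destination register. The class and method names are hypothetical, memory is modeled as a plain dictionary, and the model ignores table capacity and partial address overlap.

```python
from typing import Dict


class AddressBuffer:
    """Toy lookup table tracking the addresses of outstanding speculative loads."""

    def __init__(self) -> None:
        self.entries: Dict[str, int] = {}  # destination register -> load address

    def speculative_load(self, reg: str, addr: int, memory: Dict[int, int]) -> int:
        # Step 1: record the address of the speculative load.
        self.entries[reg] = addr
        return memory.get(addr, 0)

    def store(self, addr: int, value: int, memory: Dict[int, int]) -> None:
        memory[addr] = value
        # Step 2: an intervening store to the same address invalidates the entry.
        for reg in [r for r, a in self.entries.items() if a == addr]:
            del self.entries[reg]

    def check(self, reg: str) -> bool:
        # Step 3: True if no store conflicted; the entry is released either way.
        return self.entries.pop(reg, None) is not None


memory = {100: 7, 200: 9}
buf = AddressBuffer()

# Load hoisted above a store to a different address: speculation succeeds.
r4 = buf.speculative_load("r4", 100, memory)
buf.store(200, 1, memory)
ok_no_conflict = buf.check("r4")

# Load hoisted above a store to the same address: the check fails.
r5 = buf.speculative_load("r5", 100, memory)
buf.store(100, 42, memory)
ok_conflict = buf.check("r5")
if not ok_conflict:
    r5 = memory[100]  # repair code: redo the load
```

In the second case the speculatively loaded value is stale, so the failed check triggers the repair path that reloads it.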

StaticILP.38 2/12/02 Let the compiler do the work
– All
– Most of it
– As long as it improves performance …

StaticILP.39 2/12/02 by Harsh Sharangpani and Ken Arora see web page

StaticILP.41 2/12/02 Idea
The compiler has a larger instruction window than the hardware.
Communicate to the hardware more of the information gleaned at compile time.

StaticILP.42 2/12/02
Six instructions wide and ten stages deep
Tries to minimize the latency of the most frequent operations
Hardware support for compile-time indeterminacies

StaticILP.43 2/12/02
Software-initiated prefetch (requests filtered by the instruction cache)
– prefetch must be issued 12 cycles before the branch to hide latency
– L2 -> streaming buffer -> instruction cache
Four-level branch predictor hierarchy to prevent a 9-cycle pipeline stall
Decoupling buffer holds up to 8 bundles of code (bundle?)

StaticILP.44 2/12/02 Conclusion/Future
The compiler can do a lot of the work but needs hardware assistance.
Currently in pursuit of the best of both worlds.
Future:
– How long will IA-32 last, and will IA-64 take over the IA-32 market?
– Will IA-64 be the only ISA in the world?