现代计算机体系结构 (Modern Computer Architecture). Instructor: 张钢, 天津大学计算机学院 (School of Computer Science, Tianjin University). 2009.


Limits on Instruction-Level Parallelism

Studies of the Limitations of ILP: The Hardware Model
– Ideal processor: all artificial constraints on ILP are removed
– Register renaming: there are infinitely many virtual registers available beyond the architecturally visible registers, so all WAW and WAR hazards are avoided and an unbounded number of instructions can begin execution simultaneously

Studies of the Limitations of ILP: The Hardware Model
– Branch prediction: perfect; all conditional branches are predicted exactly
– Jump prediction: all jumps are predicted perfectly, including jump-register instructions used for returns and computed jumps
– Together these give the processor an unbounded buffer of instructions available for execution

Studies of the Limitations of ILP: The Hardware Model
– Memory-address alias analysis: all memory addresses are known exactly, so a load can be moved before a store whenever the addresses are not identical
– The processor can issue an unlimited number of instructions at once
– All functional-unit latencies are assumed to be one cycle

Studies of the Limitations of ILP: The Hardware Model
– Perfect caches: all loads and stores always complete in one cycle (100% hit rate)
– Under these assumptions, ILP is limited only by the true data dependences

Studies of the Limitations of ILP: ILP Available in a Perfect Processor
– Figure: the average amount of parallelism available

Studies of the Limitations of ILP: What the Perfect Processor Must Do
– Look arbitrarily far ahead to find a set of instructions to issue, predicting all branches perfectly
– Rename all register uses to avoid WAR and WAW hazards
– Determine the data dependences among the instructions and, where needed, rename accordingly

Studies of the Limitations of ILP: What the Perfect Processor Must Do
– Determine the memory dependences and handle them appropriately
– Provide enough replicated functional units to allow all ready instructions to issue

Studies of the Limitations of ILP: Determining Data Dependences
– How many comparisons are needed for 3-instruction issue? For the RAW check alone: 2×2 + 2×1 = 6
– How many comparisons are needed for n-instruction issue? 2(n−1) + 2(n−2) + … + 2×1 = n² − n, which is 2450 for n = 50
– All of these comparisons must be made at the same time

Studies of the Limitations of ILP: Limitations on Window Size and Maximum Issue Count
– The instruction window: the set of instructions that are examined for simultaneous execution
– The window size limits the number of instructions that begin execution in a given cycle
– It is limited by the required storage, the number of comparisons, and a limited issue rate
– Practical window sizes are in the range of 32 to 126

Studies of the Limitations of ILP: Limitations on Window Size and Maximum Issue Count
– Real processors are more limited by the number of functional units, the number of buses, and the number of register access ports
– Hence large window sizes are impractical and inefficient

Studies of the Limitations of ILP: The Effects of Reducing the Size of the Window (figures)

Studies of the Limitations of ILP: The Effects of Realistic Branch and Jump Prediction
– Tournament predictor (figures)

Studies of the Limitations of ILP: The Effects of Finite Registers (figures)

Studies of the Limitations of ILP: The Effects of Imperfect Alias Analysis (figures)

Limitations on ILP for Realizable Processors: The Realizable Processor
– Up to 64 instruction issues per clock, despite the logic complexity this implies
– A tournament predictor with 1K entries and a 16-entry return predictor; the predictor is not a primary bottleneck

Limitations on ILP for Realizable Processors: The Realizable Processor
– Perfect disambiguation of memory references, performed dynamically through a memory-dependence predictor
– Register renaming with 64 additional integer and 64 additional FP registers

Limitations on ILP for Realizable Processors: Limitations of the Perfect-Processor Model
– WAR and WAW hazards through memory: these arise from the allocation of stack frames, because a called procedure reuses the memory locations of a previous procedure on the stack
– Unnecessary dependences: a loop such as for (i = 0; i < M; i++) { … } contains at least one dependence, on the induction variable, that cannot be eliminated dynamically
– Overcoming the data-flow limit: value prediction, i.e., predicting data values and speculating on the prediction

Limitations on ILP for Realizable Processors: Proposals for the Realizable Processor
– Address value prediction and speculation: predict memory address values and speculate by reordering loads and stores, e.g., the addresses written by for (i = 0; i < M; i++) { A[i] = … }; much of this benefit can be accomplished by simpler techniques
– Speculating on multiple paths: the cost of incorrect recovery is reduced, but this is practical only for a limited number of branches

Limitations on ILP for Realizable Processors (summary figure)