1
Modern Computer Architecture. Instructor: Zhang Gang, School of Computer Science, Tianjin University. gzhang@tju.edu.cn 2009
2
Limits on Instruction-Level Parallelism
3
Studies of the Limitations of ILP
The Hardware Model
– Ideal processor: all artificial constraints on ILP are removed.
– Register renaming: an infinite number of virtual registers is available beyond the architecturally visible registers, so all WAW and WAR hazards are avoided and an unbounded number of instructions can begin execution simultaneously.
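To make the renaming assumption concrete, here is a minimal sketch in Python. It assumes a hypothetical three-operand instruction format `(dest, src1, src2)` and an unbounded supply of virtual registers named `v0`, `v1`, and so on; neither the format nor the names come from the slides.

```python
from itertools import count

def rename(instructions):
    """Map architectural registers to fresh virtual registers.

    instructions: list of (dest, src1, src2, ...) tuples in program order.
    Returns the same program written with virtual register names.
    """
    fresh = count()      # unbounded supply of virtual register numbers
    mapping = {}         # architectural name -> virtual name holding its latest value
    renamed = []
    for dest, *srcs in instructions:
        new_srcs = []
        for s in srcs:
            # A register read before any write is a live-in; give it a name too.
            if s not in mapping:
                mapping[s] = f"v{next(fresh)}"
            new_srcs.append(mapping[s])
        # Every write gets a brand-new register, so no two instructions share
        # a destination (no WAW) and no write reuses a register that an
        # earlier instruction still reads (no WAR).
        mapping[dest] = f"v{next(fresh)}"
        renamed.append((mapping[dest], *new_srcs))
    return renamed

program = [
    ("r1", "r2", "r3"),
    ("r4", "r1", "r5"),   # RAW on r1: must be preserved
    ("r1", "r6", "r7"),   # WAW with instruction 0, WAR with instruction 1
    ("r8", "r1", "r4"),   # RAW on the second r1 and on r4
]
renamed = rename(program)
```

After renaming, only the true (RAW) dependences remain: the fourth instruction reads the destinations of the third and second instructions, and the two writes to `r1` land in different virtual registers.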
4
Studies of the Limitations of ILP
The Hardware Model
– Branch prediction: branch prediction is perfect; all conditional branches are predicted exactly.
– Jump prediction: all jumps are perfectly predicted, including jump-register instructions used for returns and computed jumps.
– Combined, perfect branch and jump prediction are equivalent to an unbounded buffer of instructions available for execution.
5
Studies of the Limitations of ILP
The Hardware Model
– Memory-address alias analysis: all memory addresses are known exactly, so a load can be moved before a store if the addresses are not identical.
– The processor can issue an unlimited number of instructions at once.
– All functional unit latencies are assumed to be one cycle.
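The alias-analysis rule can be illustrated with a small sketch. It assumes a hypothetical trace of `("load" | "store", address)` pairs whose addresses are all known exactly, as in the ideal model; the function name `earliest_slot` is made up for this example.

```python
def earliest_slot(ops, load_index):
    """Return the earliest position the load at load_index can be hoisted to.

    ops: list of (kind, address) pairs in program order, kind is "load" or "store".
    The load may pass any earlier store whose address differs from its own,
    and must stop just after the nearest earlier store to the same address.
    """
    kind, addr = ops[load_index]
    if kind != "load":
        raise ValueError("ops[load_index] must be a load")
    slot = load_index
    while slot > 0:
        prev_kind, prev_addr = ops[slot - 1]
        if prev_kind == "store" and prev_addr == addr:
            break            # true memory dependence: cannot move any further
        slot -= 1
    return slot

ops = [
    ("store", 0x100),
    ("store", 0x200),
    ("load", 0x100),   # depends on the first store
    ("load", 0x300),   # independent of both stores
]
```

Here `earliest_slot(ops, 2)` stops right after the store to `0x100`, while `earliest_slot(ops, 3)` can move all the way to the front of the trace.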
6
Studies of the Limitations of ILP
The Hardware Model
– Perfect caches: all loads and stores always complete in one cycle (100% hit rate).
– ILP is limited only by the true data dependences.
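Under these assumptions the available ILP of a trace is simply its length divided by the length of its longest chain of true dependences. The sketch below computes that ratio for a hypothetical trace format of `(dest, [sources])` pairs; the format and the function name `dataflow_ilp` are assumptions made for illustration, not part of the original study.

```python
def dataflow_ilp(trace):
    """Average instructions per cycle when only RAW dependences constrain issue.

    trace: list of (dest, sources) in program order; registers are strings.
    Assumes unit latency, unlimited issue width, and perfect renaming, so an
    instruction issues in the cycle after its last producer issues.
    """
    ready_cycle = {}   # register -> cycle from which its latest value is usable
    depth = 0          # length of the longest dependence chain seen so far
    for dest, sources in trace:
        issue = max((ready_cycle.get(s, 0) for s in sources), default=0)
        finish = issue + 1              # one-cycle functional unit latency
        ready_cycle[dest] = finish      # later readers see this newest value
        depth = max(depth, finish)
    return len(trace) / depth if depth else 0.0

mixed = [("a", []), ("b", []), ("c", ["a", "b"]), ("d", ["c"]), ("e", [])]
chain = [("a", []), ("b", ["a"]), ("c", ["b"]), ("d", ["c"])]
independent = [("a", []), ("b", []), ("c", []), ("d", [])]
```

A fully serial chain yields an ILP of 1, four independent instructions yield 4, and the mixed trace (five instructions, critical path of three) yields 5/3.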
7
Studies of the Limitations of ILP ILP available in a perfect processor –Average amount of parallelism available
8
Studies of the Limitations of ILP
What the perfect processor must do
– Look arbitrarily far ahead to find a set of instructions to issue, predicting all branches perfectly.
– Rename all register uses to avoid WAR and WAW hazards.
– Determine whether there are any data dependencies among the instructions being issued; if so, rename accordingly.
9
Studies of the Limitations of ILP
What the perfect processor must do
– Determine whether any memory dependences exist among the issuing instructions and handle them appropriately.
– Provide enough replicated functional units to allow all the ready instructions to issue.
10
Studies of the Limitations of ILP
Determine data dependencies
– How many comparisons are needed for 3-instruction issue? (RAW checks only): $2 \times 2 + 2 \times 1 = 6$
– How many comparisons are needed for n-instruction issue? $2(n-1) + 2(n-2) + \dots + 2 \times 1 = n^2 - n$, which is 2450 for $n = 50$
– All the comparisons are made at the same time.
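The count can be checked numerically. The sketch below assumes, as the slide does, two source operands per instruction and RAW checks only; the function name `raw_comparisons` is illustrative.

```python
def raw_comparisons(n, sources_per_instruction=2):
    """Comparisons needed to find RAW dependences among n issuing instructions.

    The i-th instruction (0-based) has i earlier instructions in the packet,
    and each of its sources is compared against each earlier destination.
    """
    return sum(sources_per_instruction * i for i in range(n))
```

For example, `raw_comparisons(3)` is 6 and `raw_comparisons(50)` is 2450, matching the closed form $n^2 - n$.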
11
Studies of the Limitations of ILP
Limitations on the Window Size and Maximum Issue Count
The instruction window
– The set of instructions that are examined for simultaneous execution; it limits the number of instructions that can begin execution in a given cycle.
– Its size is limited by the required storage, the number of comparisons, and a limited issue rate.
– Window sizes are in the range of 32 to 126.
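The effect of a finite window can be sketched with a toy simulator. This is a simplified model under stated assumptions, not the methodology of the original study: the window always holds the oldest `window_size` unissued instructions, every instruction has unit latency, any ready instruction in the window may issue, and the trace format `(dest, [sources])` and all names are hypothetical.

```python
def producers(trace):
    """For each instruction, list the indices of the instructions it reads from."""
    last_writer = {}
    deps = []
    for i, (dest, sources) in enumerate(trace):
        deps.append([last_writer[s] for s in sources if s in last_writer])
        last_writer[dest] = i
    return deps

def windowed_ilp(trace, window_size):
    """Average instructions per cycle when only a finite window is examined."""
    deps = producers(trace)
    issue_cycle = {}                       # instruction index -> cycle it issued in
    pending = list(range(len(trace)))      # unissued instructions, program order
    cycle = 0
    while pending:
        window = pending[:window_size]
        # Ready means every producer issued in an earlier cycle (unit latency).
        issuing = {i for i in window
                   if all(p in issue_cycle and issue_cycle[p] < cycle
                          for p in deps[i])}
        for i in issuing:
            issue_cycle[i] = cycle
        pending = [i for i in pending if i not in issuing]
        cycle += 1
    return len(trace) / cycle if cycle else 0.0

# Two independent four-instruction chains laid out one after the other.
two_chains = [
    ("a0", []), ("a1", ["a0"]), ("a2", ["a1"]), ("a3", ["a2"]),
    ("b0", []), ("b1", ["b0"]), ("b2", ["b1"]), ("b3", ["b2"]),
]
```

With a window of 8 both chains overlap fully and the trace runs at 2 instructions per cycle; with a window of 2 the second chain is not visible until the first has nearly finished, and throughput drops to 8/7.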
12
Studies of the Limitations of ILP
Limitations on the Window Size and Maximum Issue Count
Real processors are more limited by
– the number of functional units
– the number of buses
– the number of register access ports
Large window sizes are impractical and inefficient.
13
Studies of the Limitations of ILP The effects of reducing the size of the window.
14
Studies of the Limitations of ILP The effects of reducing the size of the window.
15
Studies of the Limitations of ILP The Effects of Realistic Branch and Jump Prediction Tournament predictor
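As a rough illustration of the idea behind a tournament predictor, the sketch below pairs a per-branch 2-bit counter with a global-history-indexed 2-bit counter and uses a third 2-bit counter per branch to choose between them. It is a deliberately simplified model: the table sizes, the indexing by `pc % entries`, and the class names are assumptions, and real designs differ in detail.

```python
class TwoBit:
    """Saturating 2-bit counter; values 2 and 3 mean 'taken' (or 'use global')."""
    def __init__(self, value=1):
        self.value = value
    def predict(self):
        return self.value >= 2
    def update(self, up):
        self.value = min(3, self.value + 1) if up else max(0, self.value - 1)

class Tournament:
    def __init__(self, entries=1024, history_bits=10):
        self.entries = entries
        self.mask = (1 << history_bits) - 1
        self.history = 0                                   # recent outcomes, 1 = taken
        self.local = [TwoBit() for _ in range(entries)]    # indexed by branch address
        self.global_ = [TwoBit() for _ in range(1 << history_bits)]
        self.chooser = [TwoBit() for _ in range(entries)]  # high = trust global

    def predict(self, pc):
        i = pc % self.entries
        if self.chooser[i].predict():
            return self.global_[self.history].predict()
        return self.local[i].predict()

    def update(self, pc, taken):
        i = pc % self.entries
        local_ok = self.local[i].predict() == taken
        global_ok = self.global_[self.history].predict() == taken
        # Train the chooser only when the two components disagree.
        if local_ok != global_ok:
            self.chooser[i].update(global_ok)
        self.local[i].update(taken)
        self.global_[self.history].update(taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

On a strictly alternating branch the local counter keeps mispredicting, the global component learns the pattern from the history, and the chooser migrates to the global side, which is the behavior the tournament arrangement is meant to provide.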
16
Studies of the Limitations of ILP The Effects of Realistic Branch and Jump Prediction
17
Studies of the Limitations of ILP The Effects of Finite Registers
18
Studies of the Limitations of ILP The Effects of Finite Registers
19
Studies of the Limitations of ILP The Effects of Imperfect Alias Analysis
20
Studies of the Limitations of ILP The Effects of Imperfect Alias Analysis
21
Limitations on ILP for Realizable Processors
Realizable Processors
– Up to 64 instructions issued per clock; the limit comes from logic complexity.
– A tournament predictor with 1K entries and a 16-entry return predictor; the predictor is not a primary bottleneck.
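A return predictor of this kind is commonly organized as a small stack of return addresses. The sketch below is one such arrangement with 16 entries; the class name, method names, and the choice to drop the oldest entry on overflow are assumptions for illustration.

```python
from collections import deque

class ReturnStack:
    """Fixed-depth stack of predicted return addresses."""
    def __init__(self, entries=16):
        # With maxlen set, appending to a full deque discards the oldest entry.
        self.stack = deque(maxlen=entries)

    def on_call(self, return_address):
        self.stack.append(return_address)

    def predict_return(self):
        # Returns None when the stack is empty, meaning no prediction is available.
        return self.stack.pop() if self.stack else None
```

With call chains deeper than 16, the innermost 16 returns are still predicted correctly and the remaining outer returns get no prediction.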
22
Limitations on ILP for Realizable Processors
Realizable Processors
– Perfect disambiguation of memory references, done dynamically through a memory dependence predictor.
– Register renaming with 64 additional integer and 64 additional FP registers.
23
Limitations on ILP for Realizable Processors
Limitations of the perfect processor
– WAR and WAW hazards through memory
  – Arise due to the allocation of stack frames: a called procedure reuses the memory locations of a previous procedure on the stack.
– Unnecessary dependencies
  – A loop contains at least one dependency, which cannot be eliminated dynamically.
– Overcoming the data flow limit
  – Value prediction: predicting data values and speculating on the prediction.
for (i = 0; i < M; i++) { }
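One simple form of value prediction is a stride predictor, which guesses that an instruction's next result equals its last result plus the last observed difference. The sketch below is an illustrative assumption, not a scheme named in the slides; it would, for instance, predict the successive values of a loop counter such as `i` in the loop above.

```python
class StridePredictor:
    """Predict the next value produced at a given instruction address."""
    def __init__(self):
        self.last = {}     # pc -> last observed value
        self.stride = {}   # pc -> last observed difference between values

    def predict(self, pc):
        if pc not in self.last:
            return None                      # nothing seen yet, no prediction
        return self.last[pc] + self.stride.get(pc, 0)

    def update(self, pc, value):
        if pc in self.last:
            self.stride[pc] = value - self.last[pc]
        self.last[pc] = value

def count_hits(values, pc=0x10):
    """Feed a value sequence through the predictor and count correct guesses."""
    predictor = StridePredictor()
    hits = 0
    for v in values:
        if predictor.predict(pc) == v:
            hits += 1
        predictor.update(pc, v)
    return hits
```

For the sequence 0, 1, ..., 9 the predictor needs two observations to learn the stride and then predicts the remaining eight values correctly. A processor speculating on such predictions would still need to verify each one and recover on a misprediction.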
24
Limitations on ILP for Realizable Processors
Proposals for the realizable processor
– Address value prediction and speculation
  – Predict memory address values and speculate by reordering loads and stores.
  – Can be accomplished by simpler techniques.
for (i = 0; i < M; i++) { A[i] = …
– Speculating on multiple paths
  – The cost of incorrect recovery is reduced.
  – Only feasible for a limited number of branches.
25
Limitations on ILP for Realizable Processors