Modern Computer Architecture. Lecturer: Zhang Gang, School of Computer Science, Tianjin University, 2009.


1 Modern Computer Architecture. Lecturer: Zhang Gang, School of Computer Science, Tianjin University, gzhang@tju.edu.cn, 2009

2 Limits on Instruction-Level Parallelism

3 Studies of the Limitations of ILP
The Hardware Model
– Ideal processor: all artificial constraints on ILP are removed.
– Register renaming: an infinite number of virtual registers is available beyond the architecturally visible registers, so all WAW and WAR hazards are avoided and an unbounded number of instructions can begin execution simultaneously.
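
The effect of renaming on WAW and WAR hazards can be illustrated with a minimal sketch. It assumes a toy three-operand instruction format `(dest, src1, src2)`; the function name `rename` and the virtual register names `v0, v1, ...` are hypothetical and do not come from the slides.

```python
from itertools import count


def rename(instructions):
    """Give every register write a fresh virtual register.

    `instructions` is a list of (dest, src1, src2) architectural register
    names (a toy format assumed for this sketch). Returns the same program
    expressed over virtual registers v0, v1, ...
    """
    fresh = count()   # stands in for the unbounded pool of virtual registers
    mapping = {}      # architectural name -> current virtual name
    renamed = []
    for dest, *srcs in instructions:
        new_srcs = []
        for s in srcs:
            # Sources read the latest mapping, which preserves true (RAW) dependences.
            if s not in mapping:
                mapping[s] = f"v{next(fresh)}"
            new_srcs.append(mapping[s])
        # Each write gets a brand-new register, which removes WAW and WAR hazards.
        mapping[dest] = f"v{next(fresh)}"
        renamed.append((mapping[dest], *new_srcs))
    return renamed


program = [
    ("r1", "r2", "r3"),  # r1 = r2 op r3
    ("r4", "r1", "r5"),  # RAW on r1 with instruction 0
    ("r1", "r6", "r7"),  # WAW on r1 with instruction 0, WAR on r1 with instruction 1
]
renamed = rename(program)
for instr in renamed:
    print(instr)
```

After renaming, instruction 2 writes a register that no earlier instruction reads or writes, so only the RAW dependence between instructions 0 and 1 remains.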

4 Studies of the Limitations of ILP
The Hardware Model
– Branch prediction: branch prediction is perfect; all conditional branches are predicted exactly.
– Jump prediction: all jumps, including the jump-register instructions used for returns and computed jumps, are perfectly predicted. Combined with perfect branch prediction, this gives an unbounded buffer of instructions available for execution.

5 Studies of the Limitations of ILP
The Hardware Model
– Memory-address alias analysis: all memory addresses are known exactly, so a load can be moved before a store provided that the addresses are not identical.
– The processor can issue an unlimited number of instructions at once.
– All functional unit latencies are assumed to be one cycle.
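
The alias rule on this slide (a load may move above a store only if the addresses differ) can be sketched as follows. The trace format and the function name `loads_free_to_hoist` are assumptions made for illustration; real hardware does not see addresses this early, which is exactly what the ideal model abstracts away.

```python
def loads_free_to_hoist(ops):
    """Return the indices of loads that conflict with no earlier store.

    `ops` is a list of (kind, address) pairs in program order, where kind is
    "load" or "store". Addresses are assumed to be known exactly, as in the
    ideal model.
    """
    stored = set()   # addresses written by stores seen so far
    free = []
    for i, (kind, addr) in enumerate(ops):
        if kind == "store":
            stored.add(addr)
        elif addr not in stored:
            # No earlier store touches this address, so the load may be
            # moved above all of the preceding stores.
            free.append(i)
    return free


trace = [
    ("store", 0x100),
    ("load", 0x200),   # different address from the store above
    ("store", 0x200),
    ("load", 0x100),   # same address as the first store: must stay behind it
    ("load", 0x300),
]
free = loads_free_to_hoist(trace)
print(free)
```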

6 Studies of the Limitations of ILP
The Hardware Model
– Perfect caches: all loads and stores complete in one cycle (100% hit rate).
– Under these assumptions, ILP is limited only by the data dependences.
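
Under the ideal model, the available ILP is the instruction count divided by the length of the longest dependence chain. The sketch below assumes a hypothetical trace representation in which `deps[i]` lists the earlier instructions that instruction `i` reads from; the name `dataflow_ilp` is likewise an assumption.

```python
def dataflow_ilp(deps):
    """Average instructions per cycle when only data dependences limit issue.

    deps[i] lists the indices of earlier instructions whose results
    instruction i consumes. Unit latency and unlimited issue width are
    assumed, matching the ideal hardware model.
    """
    if not deps:
        return 0.0
    cycle = []
    for producers in deps:
        # An instruction issues one cycle after its latest producer.
        cycle.append(1 + max((cycle[p] for p in producers), default=0))
    return len(deps) / max(cycle)


# Six instructions; the longest dependence chain is 0 (or 1) -> 2 -> 3.
deps = [[], [], [0, 1], [2], [], [4]]
ilp = dataflow_ilp(deps)
print(ilp)
```

Here the chain through instructions 2 and 3 takes three cycles, so six instructions give an ILP of 2.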

7 Studies of the Limitations of ILP ILP available in a perfect processor –Average amount of parallelism available

8 Studies of the Limitations of ILP
The perfect processor must:
– Look arbitrarily far ahead to find a set of instructions to issue, predicting all branches perfectly.
– Rename all register uses to avoid WAR and WAW hazards.
– Determine whether there are any data dependences among the instructions in the issue packet; if so, rename accordingly.

9 Studies of the Limitations of ILP
The perfect processor must:
– Determine whether any memory dependences exist among the issuing instructions and handle them appropriately.
– Provide enough replicated functional units to allow all the ready instructions to issue.

10 Studies of the Limitations of ILP
Determining data dependences
– How many comparisons are needed for 3-instruction issue? Checking only for RAW hazards: $2 \times 2 + 2 \times 1 = 6$.
– How many comparisons are needed for $n$-instruction issue? $2(n-1) + 2(n-2) + \dots + 2 \times 1 = n^2 - n$, which is 2450 for $n = 50$.
– All of the comparisons are made at the same time.
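
The comparison count above can be checked numerically. This sketch assumes two source operands per instruction, as the factor of 2 in the slide's sum implies; the function name `raw_comparisons` is hypothetical.

```python
def raw_comparisons(n, sources_per_instruction=2):
    """Count the register comparisons needed to detect RAW hazards among
    n instructions issued together.

    Each source operand of an instruction is compared against the
    destination of every earlier instruction in the same issue group.
    """
    total = 0
    for earlier in range(1, n):
        # The instruction at this position has `earlier` predecessors.
        total += sources_per_instruction * earlier
    return total


for n in (3, 50):
    print(n, raw_comparisons(n), n * n - n)
```

The loop sum and the closed form $n^2 - n$ agree for both cases on the slide (6 and 2450).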

11 Studies of the Limitations of ILP
Limitations on the Window Size and Maximum Issue Count
The instruction window
– The set of instructions that are examined for simultaneous execution; it limits the number of instructions that can begin execution in a given cycle.
– Its size is limited by the required storage, the comparisons, and a limited issue rate.
– Window sizes are in the range of 32 to 126.
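
How a finite window restricts ILP can be seen in a small simulation. This is a sketch under simplifying assumptions that are not in the slides: unit latency, a window that always holds the oldest not-yet-issued instructions, and a hypothetical trace format in which `deps[i]` lists the earlier instructions that instruction `i` depends on.

```python
def windowed_ilp(deps, window, issue_width=None):
    """Average instructions per cycle with a finite instruction window.

    Only the `window` oldest unissued instructions are examined each cycle.
    An instruction can issue once all of its producers issued in an earlier
    cycle. `issue_width=None` means unlimited issue.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    n = len(deps)
    if n == 0:
        return 0.0
    issued_in = [None] * n      # cycle in which each instruction issued
    pending = list(range(n))    # unissued instructions in program order
    cycle = 0
    while pending:
        cycle += 1
        issued = []
        for i in pending[:window]:
            # Producers issued this cycle are not yet recorded, so they
            # correctly count as not ready.
            if all(issued_in[p] is not None for p in deps[i]):
                issued.append(i)
                if issue_width is not None and len(issued) == issue_width:
                    break
        for i in issued:
            issued_in[i] = cycle
        pending = [i for i in pending if issued_in[i] is None]
    return n / cycle


# A four-instruction dependence chain followed by four independent instructions.
deps = [[], [0], [1], [2], [], [], [], []]
wide = windowed_ilp(deps, window=8)
narrow = windowed_ilp(deps, window=2)
print(wide, narrow)
```

With a window of 8, the independent instructions overlap with the chain and the program finishes in 4 cycles; with a window of 2, the chain blocks the window and the same program takes 6 cycles.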

12 Studies of the Limitations of ILP
Limitations on the Window Size and Maximum Issue Count
Real processors are more limited by:
– the number of functional units
– the number of buses
– the number of register access ports
As a result, large window sizes are impractical and inefficient.

13 Studies of the Limitations of ILP The effects of reducing the size of the window.

14 Studies of the Limitations of ILP The effects of reducing the size of the window.

15 Studies of the Limitations of ILP The Effects of Realistic Branch and Jump Prediction Tournament predictor

16 Studies of the Limitations of ILP The Effects of Realistic Branch and Jump Prediction

17 Studies of the Limitations of ILP The Effects of Finite Registers

18 Studies of the Limitations of ILP The Effects of Finite Registers

19 Studies of the Limitations of ILP The Effects of Imperfect Alias Analysis

20 Studies of the Limitations of ILP The Effects of Imperfect Alias Analysis

21 Limitations on ILP for Realizable Processors
Realizable Processors
– Up to 64 instruction issues per clock; logic complexity is a practical concern at this width.
– A tournament predictor with 1K entries and a 16-entry return predictor; the predictor is not a primary bottleneck.

22 Limitations on ILP for Realizable Processors
Realizable Processors
– Perfect disambiguation of memory references, done dynamically through a memory dependence predictor.
– Register renaming with 64 additional integer and 64 additional FP registers.

23 Limitations on ILP for Realizable Processors
Limitations of the perfect processor
– WAR and WAW hazards through memory: these arise from the allocation of stack frames, because a called procedure reuses the memory locations of a previous procedure on the stack.
– Unnecessary dependences: a loop contains at least one dependence, on its control variable, which cannot be eliminated dynamically. Example: for (i = 0; i < M; i++) { }
– Overcoming the data flow limit: value prediction, which predicts data values and speculates on the prediction.
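
Value prediction, as named on this slide, can be illustrated with a last-value predictor. The slides do not specify a scheme, so this particular one is an assumption chosen for its simplicity; the class name, the use of an instruction address (`pc`) as the table key, and the sample values are all hypothetical.

```python
class LastValuePredictor:
    """Predict that an instruction produces the same value as last time.

    A simple scheme assumed for illustration; it is not taken from the slides.
    """

    def __init__(self):
        self.table = {}    # instruction address -> last value produced
        self.correct = 0
        self.total = 0

    def predict(self, pc):
        # None means no prediction is available yet for this instruction.
        return self.table.get(pc)

    def update(self, pc, actual):
        # Score the prediction that would have been made, then learn the value.
        self.total += 1
        if self.predict(pc) == actual:
            self.correct += 1
        self.table[pc] = actual


predictor = LastValuePredictor()
for value in [7, 7, 7, 8, 8]:      # values produced by one hypothetical load
    predictor.update(0x40, value)
print(predictor.correct, predictor.total)
```

A correct prediction lets dependent instructions start before the producer finishes; a wrong one requires recovery, which is why the technique is paired with speculation.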

24 Limitations on ILP for Realizable Processors
Proposals for the realizable processor
– Address value prediction and speculation: predict memory address values and speculate by reordering loads and stores. This can be accomplished by simpler techniques. Example: for (i = 0; i < M; i++) { A[i] = … }
– Speculating on multiple paths: the cost of incorrect recovery is reduced, but this is practical only for a limited number of branches.

25 Limitations on ILP for Realizable Processors

