StaticILP.1 2/12/02
Static ILP: Static (Compiler-Based) Scheduling
Notes: UW-Madison
Reading: chapter 4 of the textbook, and the paper on Itanium on the course web page

Today's Theme and Contents
Let the compiler uncover the ILP
– Objective: more ILP, simpler hardware, faster clock, less power
How:
– Static scheduling
– Loop unrolling
– Software pipelining
– Static multiple issue: VLIW
» local and global scheduling
» static branch prediction
» software speculation: trace scheduling, superblocks
» nops, lockstep
» conditional moves, predication
» speculative loads
IA-64 and Itanium

Basic Idea
The compiler moves dependent instructions apart to avoid hazards
This means:
– independent instructions exist to place between them (if not, employ transformations to expose them)
– the compiler knows implementation details
» latency AND superscalarity (issue width)
What happens if the implementation changes?
Static ILP is applicable to both statically and dynamically scheduled processors
Statically scheduled processors: the compiler dictates which instructions can execute together (scheduling is done in software)

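As a rough illustration of moving dependent instructions apart, the sketch below runs a greedy list scheduler over a small block. The instruction names, latencies, and the single-issue in-order pipeline model are all assumptions made for the example, not something taken from the slides.

```python
# Hypothetical 4-instruction block: (name, latency in cycles, names it depends on).
BLOCK = [
    ("load_a", 2, []),
    ("add_b", 1, ["load_a"]),
    ("load_c", 2, []),
    ("add_d", 1, ["load_c"]),
]


def total_cycles(order, block=BLOCK):
    """Cycle at which the last result is ready on an assumed single-issue,
    in-order pipeline that stalls until every source operand is available."""
    info = {name: (lat, deps) for name, lat, deps in block}
    ready_at = {}  # name -> cycle when its result becomes available
    cycle = 0      # next free issue cycle
    for name in order:
        lat, deps = info[name]
        issue = max([cycle] + [ready_at[d] for d in deps])
        ready_at[name] = issue + lat
        cycle = issue + 1
    return max(ready_at.values())


def list_schedule(block=BLOCK):
    """Greedy list scheduling: each cycle, issue the first instruction (in
    program order) whose producers have all finished; otherwise idle."""
    info = {name: (lat, deps) for name, lat, deps in block}
    ready_at, order, cycle = {}, [], 0
    remaining = [name for name, _, _ in block]
    while remaining:
        for name in remaining:
            lat, deps = info[name]
            if all(d in ready_at and ready_at[d] <= cycle for d in deps):
                ready_at[name] = cycle + lat
                order.append(name)
                remaining.remove(name)
                break  # single issue: at most one instruction per cycle
        cycle += 1
    return order


original = [name for name, _, _ in BLOCK]
scheduled = list_schedule()
print(original, total_cycles(original))
print(scheduled, total_cycles(scheduled))
```

Under these assumed latencies the scheduler interleaves the two load/add pairs, so each add no longer waits directly behind its load. Note that the schedule is only valid for the latencies baked into `BLOCK`, which is exactly the concern raised by the question about what happens when the implementation changes.
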
(Local Scheduling)
(useful for large iteration counts)
Software Speculation / Global Scheduling
How? Static prediction, profile, frequency, path
Which is better: the above, or dynamic prediction?

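One way to explore the static-versus-dynamic question is to replay branch outcome traces through both kinds of predictor. The sketch below compares a profile-based static prediction (always predict the majority direction seen in a training run) against a 2-bit saturating counter. Both traces are made up for illustration, and the 2-bit counter is just one common dynamic scheme chosen as an assumption here.

```python
def static_profile_accuracy(train, test):
    """Predict every execution as the majority direction seen in the profile run."""
    predict_taken = sum(train) * 2 >= len(train)
    return sum(1 for taken in test if taken == predict_taken) / len(test)


def two_bit_accuracy(test):
    """Dynamic 2-bit saturating counter, starting at 1 (weakly not taken)."""
    counter, correct = 1, 0
    for taken in test:
        if (counter >= 2) == taken:
            correct += 1
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return correct / len(test)


# Hypothetical trace 1: a loop branch taken 9 times, then falling through, repeated.
loop_branch = ([True] * 9 + [False]) * 10
# Hypothetical trace 2: a branch whose behaviour changes halfway through the run.
phase_branch = [True] * 50 + [False] * 50

for name, trace in [("loop", loop_branch), ("phase", phase_branch)]:
    print(name, static_profile_accuracy(trace, trace), two_bit_accuracy(trace))
```

On these two made-up traces the static prediction is marginally ahead on the stable loop branch, while the dynamic counter is far ahead on the branch that changes behaviour. Real programs mix both kinds of branch, so this is not a general answer to the question.
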
Register pressure
Superblocking: overcomes some of the complexities of trace scheduling
Single entry (superblock) vs. multiple entries (trace)
Does not have
Pentium IV (3+ GHz) vs. Itanium (1 GHz)
Lockstep: any hazard stalls the whole machine / NOPs fill the empty slots if there is not enough parallelism

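The NOP cost can be made concrete with a small packing sketch. It places instructions into fixed-width long instruction words and pads unused slots with NOPs. The issue width of 3, the unit latency, and the instruction names are assumptions for illustration, and only true (read-after-write) dependences are modelled.

```python
ISSUE_WIDTH = 3  # hypothetical number of slots per long instruction word


def pack_bundles(instrs, width=ISSUE_WIDTH):
    """instrs: list of (name, deps) in program order, assuming unit latency.
    Returns a list of bundles, each a list of exactly `width` names or 'nop'."""
    bundle_of = {}  # name -> index of the bundle it was placed in
    bundles = []
    for name, deps in instrs:
        # A consumer must sit in a bundle strictly after all of its producers.
        earliest = max([bundle_of[d] + 1 for d in deps], default=0)
        b = earliest
        while b < len(bundles) and len(bundles[b]) == width:
            b += 1  # that bundle is full, try the next one
        while len(bundles) <= b:
            bundles.append([])
        bundles[b].append(name)
        bundle_of[name] = b
    # Lockstep issue: every bundle has all slots filled, so pad with NOPs.
    return [bundle + ["nop"] * (width - len(bundle)) for bundle in bundles]


# Hypothetical block: a three-instruction dependence chain plus one independent op.
block = [("i1", []), ("i2", ["i1"]), ("i3", ["i2"]), ("i4", [])]
packed = pack_bundles(block)
for bundle in packed:
    print(bundle)
nops = sum(slot == "nop" for bundle in packed for slot in bundle)
print(f"{nops} of {len(packed) * ISSUE_WIDTH} slots are NOPs")
```

With a dependence chain this long and only one independent instruction, more than half of the slots end up as NOPs, which is the code-size and utilisation cost the slide alludes to.
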
Predicated Execution & Conditional Moves
Convert control dependences to data dependences
if (a == 0) s = t;    (a in R1, s in R2, t in R3)
With a branch:
    bnez  R1, L
    addu  R2, R3, R0
L:
With a conditional move:
    cmovz R2, R3, R1
Applying the above to all instruction types is called predication… +/-?

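The equivalence of the two sequences can be checked with a small model. The sketch below is an illustration in Python, not real machine semantics: registers are plain integers, and the conditional move is written as an arithmetic select to stress that the result depends on R1 as data rather than through a change in control flow.

```python
def with_branch(r1, r2, r3):
    """Branch version: skip the copy when R1 is non-zero."""
    if r1 != 0:      # bnez R1, L: branch over the copy
        return r2    # L: R2 is left unchanged
    r2 = r3 + 0      # addu: R2 <- R3 + 0
    return r2


def with_cmovz(r1, r2, r3):
    """Conditional-move version: R2 <- R3 if R1 == 0, with no branch.
    Both candidate values are read; R1 only selects between them."""
    is_zero = int(r1 == 0)                # the predicate, as data (0 or 1)
    return is_zero * r3 + (1 - is_zero) * r2


# Try a few hypothetical register values, including the R1 == 0 case.
for r1 in (0, 1, 5):
    print(r1, with_branch(r1, 10, 20), with_cmovz(r1, 10, 20))
```

The conditional-move version always does the work of reading both values, even when the copy is not needed. That is one of the costs to weigh against removing a hard-to-predict branch.
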
Speculative Loads
Loads speculatively bypass earlier stores; repair code runs in case of misspeculation
Use an address buffer:
1. Lookup table: updated with the address of each speculative load
2. Updated by the addresses of intervening stores
3. A check instruction verifies that no store conflicted and releases the entry

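The three steps can be sketched as a toy software model. The class name, the method names, and the choice to key entries by destination register are assumptions for illustration. "Updated by intervening stores" is read here as "a store to a matching address removes the entry", which is one plausible interpretation.

```python
class AddressBuffer:
    """Hypothetical lookup table tracking speculative loads."""

    def __init__(self):
        self.entries = {}  # destination register -> address that was loaded

    def speculative_load(self, reg, addr, memory):
        self.entries[reg] = addr  # step 1: record the speculative load's address
        return memory[addr]

    def store(self, addr, value, memory):
        memory[addr] = value
        # Step 2: an intervening store to the same address invalidates the entry.
        for reg in [r for r, a in self.entries.items() if a == addr]:
            del self.entries[reg]

    def check(self, reg):
        """Step 3: True if no store conflicted; the entry is released either way."""
        return self.entries.pop(reg, None) is not None


def run(store_addr):
    """Load from address 100 hoisted above a store to `store_addr`."""
    memory = {100: 7, 200: 9}
    buf = AddressBuffer()
    value = buf.speculative_load("r4", 100, memory)  # load moved above the store
    buf.store(store_addr, 42, memory)                # the store it bypassed
    if not buf.check("r4"):
        value = memory[100]                          # repair code: redo the load
    return value


print(run(200))  # store to a different address: the speculation holds
print(run(100))  # store to the same address: the repair code reloads
```

In the non-conflicting case the loaded value is used as is. In the conflicting case the check fails and the repair path picks up the stored value, so the result matches what the original program order would have produced.
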
Let the compiler do the work
All
Most of it
As long as it improves performance …

by Harsh Sharangpani and Ken Arora (see the course web page)
Idea
The compiler has a larger instruction window than the hardware.
Communicate to the hardware more of the information gleaned at compile time.

Six instructions wide and ten stages deep
Tries to minimize the latency of the most frequent operations
Hardware support for compile-time indeterminacies

Software-initiated prefetch (requests filtered by the instruction cache)
– the prefetch must be issued 12 cycles before the branch to hide the latency
– L2 -> streaming buffer -> instruction cache
Four-level branch predictor hierarchy to prevent a 9-cycle pipeline stall
Decoupling buffer holds up to 8 bundles of code (bundle?)

Conclusion/Future
The compiler can do a lot of the work but needs hardware assistance
Currently in pursuit of the best of both worlds
Future:
– How long will IA-32 last, and will IA-64 take over the IA-32 market?
– Will IA-64 be the only ISA in the world?