OOE vs. EPIC
Hridesh Rajan, Zhendong Yu, Weilin Zhong
Outline
• Introduction
  ◦ EPIC
  ◦ OOE
• Comparison
  ◦ ILP, Power Consumption, Code Size, Performance, Compiler techniques
• Performance Evaluation
• Conclusion
Introduction - EPIC
• EPIC (Explicitly Parallel Instruction Computing)
  ◦ An evolution of VLIW
  ◦ Can be considered more a “philosophy” than an “architecture”
Introduction - EPIC (2)
• Instruction example (IA-64):
  ◦ A long instruction (bundle) contains multiple operations and a template specifying dependencies between instructions.

  | Op1 | Op2 | Op3 | template |
  | Op1 | Op2 | Op3 | template |
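The bundle layout above can be sketched in a few lines. This is a minimal illustration, assuming the IA-64 encoding of a 128-bit bundle with a 5-bit template in the low-order bits followed by three 41-bit instruction slots. The function names `pack_bundle` and `unpack_bundle` are hypothetical and the slot contents are opaque integers, not real encoded instructions.

```python
TEMPLATE_BITS = 5   # template field width (assumed IA-64 layout)
SLOT_BITS = 41      # width of each of the three instruction slots

def pack_bundle(template, slots):
    """Pack a 5-bit template and three 41-bit slots into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three slots")
    bundle = template
    for i, op in enumerate(slots):
        if not 0 <= op < (1 << SLOT_BITS):
            raise ValueError("each slot must fit in 41 bits")
        # Slot i sits above the template and above all lower-numbered slots.
        bundle |= op << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle

def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, [slot0, slot1, slot2])."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
             for i in range(3)]
    return template, slots
```

The field widths add up to 5 + 3 × 41 = 128 bits, which is why the decoder can read the template first and route the three slots without inspecting them.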
Introduction - OOE
• OOE (Out-of-Order Execution) superscalar
  ◦ Dependencies between instructions are not expressed explicitly in the code; the hardware must detect them at run time.
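As a rough illustration of what "detecting dependencies in hardware" means, the sketch below classifies the register hazards between two instructions. The `Instr` tuple and the `hazards` function are hypothetical names for this example; real hardware does the same comparisons with parallel comparators across a whole instruction window.

```python
from collections import namedtuple

# One destination register and a tuple of source registers (simplified model).
Instr = namedtuple("Instr", ["dest", "srcs"])

def hazards(earlier, later):
    """Return the set of register hazards that 'later' has on 'earlier'."""
    found = set()
    if earlier.dest in later.srcs:
        found.add("RAW")   # later reads what earlier writes (true dependency)
    if later.dest in earlier.srcs:
        found.add("WAR")   # later overwrites a register earlier still reads
    if earlier.dest == later.dest:
        found.add("WAW")   # both write the same register
    return found
```

For example, `hazards(Instr("r1", ("r2", "r3")), Instr("r4", ("r1", "r5")))` reports a RAW hazard on `r1`, so the second instruction cannot issue before the first produces its result.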
Comparison - Complexity
• OOE:
  ◦ Complexity = Complexity(Branch Prediction) + Complexity(Register Renaming) + Complexity(Dependency Checking) + Complexity(Alias Detection)
• EPIC:
  ◦ Complexity = Complexity(NaT) + Complexity(ALAT) + Complexity(CFM) + Complexity(RSE)
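To make one of the OOE terms concrete, here is a minimal sketch of register renaming. It assumes an unlimited pool of physical registers and no free list or reclamation, which real hardware cannot afford; the function name `rename` and the `(dest, srcs)` instruction format are hypothetical.

```python
def rename(program, num_arch_regs):
    """Rename architectural registers to physical registers.

    program: list of (dest, srcs) using architectural register numbers.
    Returns the same program expressed in physical register numbers.
    """
    # Initially architectural register r maps to physical register r.
    mapping = {r: r for r in range(num_arch_regs)}
    next_phys = num_arch_regs
    renamed = []
    for dest, srcs in program:
        # Sources are looked up before the destination is remapped.
        new_srcs = tuple(mapping[s] for s in srcs)
        # Every write gets a fresh physical register, removing WAR and WAW hazards.
        mapping[dest] = next_phys
        renamed.append((next_phys, new_srcs))
        next_phys += 1
    return renamed
```

After renaming, two writes to the same architectural register land in different physical registers, so only true (RAW) dependencies remain to constrain the schedule.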
Comparison - Power Consumption
• OOE:
  ◦ Less power consumption
• EPIC:
  ◦ More power consumption
Comparison - Code Size
• OOE:
  ◦ Compact code (more branches)
• EPIC:
  ◦ Sparse code (code bloat)
• It depends on the compiler
Comparison - ILP
• OOE (disadvantages):
  ◦ Parallelism is at the level of machine instructions: those that can be issued in a single cycle in a processor
  ◦ Limited ILP; ILP is not evenly distributed
  ◦ Data dependency, control dependency
  ◦ Resource dependency
    ▪ # of registers
    ▪ # of ports to registers and memory
    ▪ # of parallel instruction decoders
    ▪ # of function units
    ▪ # of data paths between various CPU components
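One way to see why data dependencies limit ILP is to compute the upper bound a dependency graph allows. The sketch below assumes unit latency for every instruction and unlimited hardware resources, so it ignores the resource limits listed above; `available_ilp` is a hypothetical helper name.

```python
def available_ilp(deps):
    """Upper bound on ILP for a straight-line instruction sequence.

    deps[i] lists the indices of earlier instructions that instruction i
    depends on. ILP = instruction count / length of the longest chain.
    """
    if not deps:
        return 0.0
    depth = []
    for preds in deps:
        # An instruction can start one step after its deepest predecessor.
        depth.append(1 + max((depth[p] for p in preds), default=0))
    return len(deps) / max(depth)
```

A fully serial chain gives an ILP of 1.0 no matter how wide the machine is, while fully independent instructions give an ILP equal to their count.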
Comparison - ILP (2)
• OOE (advantages):
  ◦ Follows the predicted path
  ◦ Dynamic adjustment of the instruction schedule based on the actual execution path and cache miss results
    ▪ It can deal with stalls smartly
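The benefit of dynamic scheduling around a stall can be shown with a toy model. The sketch below is a deliberately simplified simulator, assuming a single-issue machine, an unbounded instruction window, and a long latency standing in for a cache miss; the function `run` and its instruction format are hypothetical.

```python
def run(instrs, out_of_order):
    """Simulate issue of at most one instruction per cycle.

    instrs: list of (latency, deps), where deps are indices of earlier
    instructions. Returns the cycle at which the last result is ready.
    """
    done_at = {}                        # index -> cycle its result is ready
    pending = list(range(len(instrs)))  # not yet issued, in program order
    cycle = 0
    while pending:
        # In-order issue may only look at the oldest pending instruction.
        candidates = pending if out_of_order else pending[:1]
        for i in candidates:
            latency, deps = instrs[i]
            if all(d in done_at and done_at[d] <= cycle for d in deps):
                done_at[i] = cycle + latency
                pending.remove(i)
                break                   # one issue per cycle
        cycle += 1
    return max(done_at.values(), default=0)

# A load that misses (latency 10), one consumer, and two independent operations.
program = [(10, []), (1, [0]), (1, []), (1, [])]
in_order_cycles = run(program, out_of_order=False)
ooo_cycles = run(program, out_of_order=True)
```

In this model the in-order machine waits behind the consumer of the load, while the out-of-order machine issues the two independent operations during the miss and finishes sooner.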
Comparison - ILP (3)
• EPIC (disadvantages):
  ◦ Dynamic path tends to be longer
  ◦ Static decisions made by the compiler
    ▪ What if the program stalls?
  ◦ Recovery code
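Recovery code exists because the compiler hoists loads above the branches that guard them. The sketch below imitates that pattern in software, assuming a speculative load that returns a NaT-like marker instead of faulting and a later check that jumps to recovery code. The names `NAT`, `speculative_load`, and `compute` are hypothetical, and a Python dict stands in for memory.

```python
NAT = object()   # sentinel standing in for the NaT ("Not a Thing") bit

def speculative_load(memory, addr):
    """Speculative load: never faults; a bad address yields NAT instead."""
    return memory.get(addr, NAT)

def compute(memory, ptr):
    # The load and its use are hoisted above the null check by the "compiler".
    value = speculative_load(memory, ptr)
    doubled = NAT if value is NAT else value * 2   # NAT propagates through uses
    if ptr is None:                                # the original guarding branch
        return 0
    if doubled is NAT:                             # speculation check failed
        # Recovery code: redo the work non-speculatively, faulting if needed.
        value = memory[ptr]
        doubled = value * 2
    return doubled
```

The extra check and the duplicated work in the recovery path are what make the dynamic path longer and the code larger when speculation fails.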
Comparison - ILP (4)
• EPIC (advantages):
  ◦ Massive resources
    ▪ Larger register sets, more function units, etc.
  ◦ Predication reduces branch penalties
  ◦ Speculation reduces cache miss stalls
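Predication can be illustrated by if-converting a small branch. The sketch below is a software model only: the predicate names `p1` and `p2` and the tiny instruction list are assumptions made for the example, and the Python `if` inside the loop stands in for the hardware nullifying an instruction whose predicate is false.

```python
def abs_diff_branching(a, b):
    # Conventional code: one conditional branch, only one path executes.
    if a > b:
        return a - b
    return b - a

def abs_diff_predicated(a, b):
    # The compare writes a predicate and its complement.
    preds = {"p1": a > b, "p2": not (a > b)}
    regs = {"a": a, "b": b, "r": 0}
    # Both arms sit in one straight-line stream, each guarded by a predicate.
    program = [
        ("p1", "r", lambda r: r["a"] - r["b"]),
        ("p2", "r", lambda r: r["b"] - r["a"]),
    ]
    for pred, dest, op in program:
        if preds[pred]:              # a false predicate nullifies the instruction
            regs[dest] = op(regs)
    return regs["r"]
```

Both versions compute the same result; the predicated form has no branch to mispredict, at the cost of fetching instructions from both arms.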
Role of Compiler vs. Hardware
• OOE:
  ◦ Parallelism detection and scheduling: Hardware
  ◦ More powerful hardware, less powerful compiler
• EPIC:
  ◦ Parallelism detection and scheduling: Compiler/Hardware
  ◦ More powerful compiler, less powerful hardware
Comparison - Frequency
• OOE:
  ◦ High frequency
• EPIC:
  ◦ Low frequency due to:
    ▪ Focus on CPI
    ▪ Performs compares and dependent branches in the same cycle
    ▪ Predicated execution
    ▪ Power consumption
Performance
• Methodologies in performance comparison
  ◦ CPI, CPU frequency, and the tradeoffs
• However, Itanium does not show great improvement over Alpha or Pentium 4.
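The CPI versus frequency tradeoff follows from the standard execution-time relation. The worked example below uses made-up numbers for two hypothetical machines A and B, not measurements of any real processor, and assumes both execute the same instruction count N, which does not hold exactly across different instruction sets.

```latex
\begin{align*}
T_{\text{exec}} &= \frac{N_{\text{instr}} \times \text{CPI}}{f} \\
T_A &= \frac{N \times 0.5}{0.8\ \text{GHz}} = 0.625\,N\ \text{ns}
  && \text{(low CPI, low frequency)} \\
T_B &= \frac{N \times 1.0}{2.0\ \text{GHz}} = 0.5\,N\ \text{ns}
  && \text{(higher CPI, high frequency)}
\end{align*}
```

Under these assumed numbers machine B finishes first even though it needs twice as many cycles per instruction, which is why a lower CPI alone does not guarantee better performance.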
Conclusion
• EPIC seems to be a good alternative to OOE (can OOE use EPIC techniques?)
• But there is no clear evidence of a performance gain.
• Tradeoffs are always there. It depends on what kind of processor behavior we need.
• Time will tell.
References
• A Critical Look at IA-64, M. Hopkins
• Is Out-of-Order Out of Date?, W. S. Worley, J. Huck
• EPIC: An Architecture for Instruction-Level Parallel Processors, M. S. Schlansker, B. R. Rau
Thank you! Questions?