Published by Christine Blankenship. Modified over 9 years ago.
1
OOE vs. EPIC Emily Evans Prashant Nagaraddi Lin Gu
2
Objective Our objective is to evaluate the claims and counterclaims about OOE and EPIC made in: – “Is Out-of-Order Out of Date?” by William S. Worley and Jerry Huck – “A Critical Look at IA-64” by Martin Hopkins
3
Outline Analysis of ILP Analysis of Code Size Analysis of Hardware Complexity Analysis of Compiler Complexity Analysis of Power Consumption Comparison Methodology Conclusion
4
What is EPIC? “One of our goals for EPIC was to retain VLIW's philosophy of statically constructing the POE, but to augment it with features, akin to those in a superscalar processor, that would permit it to better cope with these dynamic factors. The EPIC philosophy has the following key aspects to it.” “Providing the ability to design the desired POE at compile-time.” “Providing features that permit the compiler to "play" the statistics.” “Providing the ability to communicate the POE to the hardware.” *From “EPIC: An architecture for instruction-level parallel processors” by Michael S. Schlansker and B. Ramakrishna Rau.
5
Analysis of ILP MH: Hardware provides good ILP because it dynamically adjusts the instruction schedule based on the actual execution path and cache misses, with the use of: – Large reorder buffers – Register renaming – Branch prediction – Alias detection WW & JH: Compiler can exploit ILP more effectively with the use of: – Massive resources -- large register set, more function units – Predication – Speculation
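The predication item above can be illustrated with a minimal sketch of if-conversion. Python has no predicate registers, so this only models the idea: both arms of a conditional are computed and a predicate selects the result, which removes the branch. The function names are hypothetical and are not taken from either paper.

```python
def abs_diff_branchy(a: int, b: int) -> int:
    """Control-flow version: a branch decides which subtraction runs."""
    if a > b:
        return a - b
    return b - a


def abs_diff_predicated(a: int, b: int) -> int:
    """If-converted version: both arms execute and a predicate picks one."""
    p = a > b            # the compare sets a predicate (and its complement)
    t_true = a - b       # on a predicated machine this would be guarded by p
    t_false = b - a      # and this by the complement of p
    # Select without branching: exactly one term survives the multiply.
    return p * t_true + (not p) * t_false
```

Both subtractions are executed in the predicated version, so the branch disappears at the cost of extra issue slots. This is one way to see why predication goes together with the larger pool of function units that the EPIC side relies on.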
6
Analysis of ILP (cont.) Our observation: – From the H&P book: The SPECint benchmark shows that the Alpha 21264 and Pentium 4 considerably outperform the Itanium. The SPECfp benchmark shows that the Itanium slightly outperforms the Alpha 21264 and Pentium 4. – These diagrams are not an absolute measure of OOE versus EPIC performance. Different implementations of each architecture may perform differently, and as EPIC compilers improve over time, these performance figures will change.
7
Analysis of Code Size MH: Code size for IA-64 could be as much as 4 times that of x86 to perform the same work. WW & JH: Code size will be larger, but the instruction stream will contain fewer branches. Also, there are mechanisms to efficiently deliver instructions to the processor.
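A rough way to reason about the code-size claims is to count bundles. The sketch below assumes the IA-64 encoding of 128-bit bundles, each holding three 41-bit instruction slots plus a 5-bit template. The `nop_fraction` parameter is a hypothetical knob for slots the compiler cannot fill; it is not a measured value, and the sketch does not attempt to reproduce the 4x figure quoted above.

```python
import math

BUNDLE_BYTES = 16      # one IA-64 bundle is 128 bits
SLOTS_PER_BUNDLE = 3   # three 41-bit slots plus a 5-bit template


def ia64_code_bytes(useful_instructions: int, nop_fraction: float = 0.0) -> int:
    """Bytes of bundles needed when nop_fraction of all slots hold nops.

    nop_fraction is an illustrative assumption, not data from either paper.
    """
    if not 0.0 <= nop_fraction < 1.0:
        raise ValueError("nop_fraction must be in [0, 1)")
    # Total slots grow as the share of useful slots shrinks.
    total_slots = math.ceil(useful_instructions / (1.0 - nop_fraction))
    bundles = math.ceil(total_slots / SLOTS_PER_BUNDLE)
    return bundles * BUNDLE_BYTES
```

With perfectly packed bundles, each instruction costs 16/3 bytes, a little over 5 bytes. Any padding with nops raises that figure, which is why the quality of the compiler's bundling matters to this debate.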
8
Analysis of Code Size (cont.) Our observation: – Both sides agree that code size increases overall; however, they disagree on the extent to which it affects performance. – EPIC code size will expand dramatically in some cases. – EPIC code size can also be smaller than OOE code size in some cases. – We expect that a mature optimizing compiler will be able to deliver code of reasonable size. Moreover, code size does not necessarily translate linearly into performance loss.
9
Analysis of Hardware Complexity MH: To support features for greater ILP, EPIC hardware will be quite complex. – Predication requires more functional units – NaT bits to allow deferring exceptions – ALAT to allow loads before stores WW & JH: IA-64 makes the hardware less complex because it is not responsible for detecting and scheduling the parallelism. – No need for a reorder buffer, register renaming, etc.
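The ALAT item on this slide can be made concrete with a toy model. The sketch below assumes a simplified table keyed by register name with exact-address matching. The class and method names are hypothetical, and real hardware tracks access sizes and has limited capacity, which this model ignores.

```python
from typing import Dict


class ToyALAT:
    """Toy advanced-load address table: register name -> load address."""

    def __init__(self) -> None:
        self._entries: Dict[str, int] = {}

    def advanced_load(self, reg: str, addr: int, memory: Dict[int, int]) -> int:
        """Load early (hoisted above later stores) and record the address."""
        self._entries[reg] = addr
        return memory.get(addr, 0)

    def store(self, addr: int, value: int, memory: Dict[int, int]) -> None:
        """Perform a store and drop every entry whose address it aliases."""
        memory[addr] = value
        for reg in [r for r, a in self._entries.items() if a == addr]:
            del self._entries[reg]

    def check_load(self, reg: str, addr: int, value: int,
                   memory: Dict[int, int]) -> int:
        """Keep the speculative value if the entry survived, else reload."""
        if self._entries.get(reg) == addr:
            return value
        return memory.get(addr, 0)
```

A short usage example: a load hoisted above two stores keeps its value after a store to an unrelated address and is redone after a store to the same address.

```python
memory = {100: 1}
alat = ToyALAT()
early = alat.advanced_load("r1", 100, memory)
alat.store(200, 5, memory)                         # no alias, entry survives
kept = alat.check_load("r1", 100, early, memory)   # speculative value kept
alat.store(100, 9, memory)                         # alias, entry is removed
redone = alat.check_load("r1", 100, early, memory) # value reloaded from memory
```

The table is extra hardware state that every store must search, which is the kind of added complexity the skeptical side points to.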
10
Analysis of Hardware Complexity (cont.) Our observation: – Is an EPIC processor more complex than an OOE processor? Example: the Alpha 21264 has two fewer pipeline stages (but more stages don't necessarily mean more complexity) – As mentioned in the H&P book, good techniques from the ‘enemy camp’ are often borrowed. EPIC processors are expected to be simple. However, to extract more ILP, they also rely on hardware support, which makes them more complex than expected.
11
Analysis of Compiler Complexity MH: It is very difficult to write a good EPIC compiler. Profiling is also a burden: – Not welcomed by programmers – Hard to get and maintain a test suite – Formidable task for large programs WW & JH: OOE compilers are difficult to write as well. – OOE processors still need good compilers to ensure performance gains. – OOE compiler writers must understand the limitations of the hardware and figure out how to work around them. – Code profiling is only “slightly” more important for EPIC processors.
12
Analysis of Compiler Complexity (cont.) Our observation: – An optimizing compiler can help performance for both OOE and EPIC processors. – Profiling, which is a non-trivial task, adds complexity to the compiler. – An EPIC compiler has much more responsibility than an OOE compiler, so it is likely to be more complex. – The EPIC philosophy aims to trade compiler complexity for hardware simplicity. Whether this is a critical disadvantage must be considered in the context of overall system complexity and performance.
13
Analysis of Power Consumption MH: Massive resources consume lots of power. – “Thus, IA-64 gambles that, in the future, power will not be the critical limitation, …” WW & JH: They do not address this issue, perhaps because they do not consider it a big problem.
14
Analysis of Power Consumption (cont.) Our observation: – The use of massive resources is likely to consume more power. – Whether or not this will be a problem depends on the target application area of the EPIC technology. For servers and high-end workstations, power consumption is not as important. For embedded systems, power consumption is likely to be a critical issue. – For EPIC to be a truly ‘general purpose’ technology, power consumption must be kept under control.
15
Comparison Methodology MH: Accumulating “facts” supporting a skeptical view of EPIC. – Example: EPIC stalls when OOE proceeds WW & JH: Accumulating “facts” supporting an optimistic view of EPIC. – Example: Dynamic translation Architecture design is a balance of CPI, frequency, instruction count, application limitation, and cost. There are always cases and countercases for every solution. They need to be considered in an integrated context.
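The balance of CPI, frequency, and instruction count mentioned on this slide follows the standard relation: execution time = instruction count × CPI / clock frequency. The sketch below is a minimal helper for that relation. The two design points in the usage example are made-up numbers chosen only to show a trade-off, not measurements of any real processor.

```python
def execution_time(instruction_count: float, cpi: float,
                   frequency_hz: float) -> float:
    """Seconds to run a program: instructions * cycles each / cycles per second."""
    if frequency_hz <= 0:
        raise ValueError("frequency_hz must be positive")
    return instruction_count * cpi / frequency_hz


# Hypothetical design points (illustrative values only).
design_a = execution_time(instruction_count=1e9, cpi=1.0, frequency_hz=1e9)
design_b = execution_time(instruction_count=2e9, cpi=0.5, frequency_hz=1e9)
```

Here the second design executes twice as many instructions but at half the CPI, so the two execution times are equal. A single "fact" about instruction count or CPI in isolation therefore says little about which design is faster.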
16
Comparison Methodology (cont.) EPIC stalls when OOE proceeds – This will happen in some cases. – But we must determine how much this case actually hurts performance. A cache miss is not the common case. Speculation makes this case even less common. During a cache miss, OOE is also not expected to proceed very far.
17
Comparison Methodology (cont.) Dynamic translation – It rarely gives much performance gain with highly optimized code. – Dynamo example:
18
Conclusion – Both papers make claims about the EPIC architecture without providing any quantitative evidence. – Quantitative evidence is necessary to conclude that one architecture is superior to another. – EPIC is a useful effort in the exploration of higher ILP. When evaluating it, we need to separate the usefulness of the architectural approach from any single implementation of it. The idea behind EPIC is good, but more time, effort, and calm calculation are needed to know whether it works.