Published by Coral French. Modified over 9 years ago.
1
Is Out-Of-Order Out Of Date?
IA-64’s parallel architecture will improve processor performance
William S. Worley Jr., HP Labs
Jerry Huck, IA-64 Architecture Lab
Slides by Selvin, Pascal, Pavel
2
The prelude to the IA-64
• The need for greater processing power is increasing
  - New innovative computing technologies
  - Traditional computing has increasing problem sizes
• Architecture design from the ground up to support ILP
  - Enables the compiler to express more parallelism (EPIC)
  - Reduces hardware cost of scheduling parallel instructions
• Current approaches
  - Legacy architectures were not designed primarily for high ILP
  - Non-architectural, principally OOO dynamic superscalar hardware
• IA-64
  - Growing market for high-performance 64-bit architecture
  - No existing Intel 64-bit binaries
3
Not just for ILP
• Better building block for high performance systems
  - Multi-programming gives limited improvements
  - Parallelism has to be improved at all levels in the system
  - Solely hardware-based multithreading cannot compensate for lack of parallelism in the basic processing element
• SMT, CMP apply equally to RISC, CISC and EPIC
  - Integrated hardware multithreading is orthogonal to EPIC
  - Inter-thread interference in SMT processors
• Hardware Resource Utilization vs. Complexity
  - Transistors: PA-8000 re-order buffer = PA-7200
  - Complexity scales quadratically for 1.5x or 2x increase in issue-width
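As a rough worked example of quadratic scaling, assume (hypothetically) that scheduling complexity C grows exactly with the square of the issue width w. The constant k is an illustrative placeholder, not a figure from the slides:

```latex
C(w) = k\,w^{2}
\quad\Rightarrow\quad
\frac{C(1.5\,w)}{C(w)} = 1.5^{2} = 2.25,
\qquad
\frac{C(2\,w)}{C(w)} = 2^{2} = 4
```

Under that assumption, a machine that issues 1.5x as many instructions per cycle needs about 2.25x the scheduling logic, and a 2x wider machine about 4x.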
4
Architecture vs. Implementation
• Speed of functional units is architecture independent
• Memory and data-cache hierarchy
  - Largely independent of the architecture
  - OOO RISC designs achieve better utilization
  - With additional cost, it is possible to realise better designs
  - The IA-64 memory system balances cost and performance
• Cycle time of IA-64
  - IC process, number of registers, register ports, bypass network, number of cache ports
  - Critical path is found in functional units and bypass networks
  - IA-64 has higher utilization of this fundamental structure
5
IA-64 Parallelism Capabilities
• Predication:
  - fewer branches encountered
  - fewer mispredicted branches
  - more parallelism
• Larger register set:
  - new coding strategies (impossible with RISC)
  - more efficient than register renaming (RISC)
  - less data loss in the event of an interruption
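The effect of predication can be sketched in C through if-conversion: the comparison produces a predicate, and the predicate selects a result without a jump, so there is no branch left to mispredict. This is only an analogy under stated assumptions. On IA-64 the compiler guards instructions with predicate registers rather than hand-written masks, and both function names below are hypothetical.

```c
#include <stdint.h>

/* Branching form: the processor has to predict which arm executes. */
int32_t select_branchy(int32_t a, int32_t b, int32_t x, int32_t y)
{
    if (a < b) {
        return x;
    }
    return y;
}

/* If-converted form: the comparison yields a predicate (0 or 1) that
 * selects one of the two values with masks instead of a jump.
 * Hypothetical helper for illustration only. */
int32_t select_predicated(int32_t a, int32_t b, int32_t x, int32_t y)
{
    uint32_t p = (uint32_t)(a < b);   /* predicate: 1 if a < b, else 0 */
    uint32_t mask = 0u - p;           /* all ones if p == 1, zero if p == 0 */
    uint32_t r = ((uint32_t)x & mask) | ((uint32_t)y & ~mask);
    return (int32_t)r;
}
```

Both functions return the same value for the same inputs. A compiler is free to turn either form into a branch or a conditional move, so the sketch shows the idea rather than a guaranteed instruction sequence.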
6
IA-64 Parallelism Capabilities (2)
• Features to deal with memory latency:
  - earlier access to variables
  - not restricted to fixed hardware algorithms for:
    - correctly predicting execution path
    - triggering memory fetches
  - heuristics to identify speculative load candidates
    - compiler involved
    - control of the degree of speculation by the programmer
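The idea of software triggering memory fetches early, instead of relying on a fixed hardware algorithm, can be sketched with a prefetch hint. The sketch assumes a GCC or Clang compiler, since `__builtin_prefetch` is an extension of those compilers. The function name and the value of `PREFETCH_DISTANCE` are hypothetical and would need tuning. It illustrates prefetching only, not IA-64 speculative loads.

```c
#include <stddef.h>

/* Assumed distance, in elements, between the current access and the hint. */
#define PREFETCH_DISTANCE 16

/* Sum an array while asking the memory system to start fetching
 * elements that will be needed a few iterations from now. */
long sum_with_prefetch(const long *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Arguments: address, 0 = read access, 1 = low temporal locality. */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 1);
        }
        total += data[i];
    }
    return total;
}
```

The hint does not change the result. It only affects when data arrives in the cache, so the program or compiler controls how far ahead to fetch.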
7
IA-64 Parallelism Capabilities (3)
• Register Stack Engine (RSE):
  - increases the utilization of the register file
  - reduces the cost of procedure calls and returns
  - especially valuable for object-oriented code
  - straightforward hardware design
• Mechanisms to deliver instructions to the processor
  - eliminate effects of increased code size
  - modest design costs
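Why a register stack reduces call and return cost can be illustrated with a toy accounting model: each call allocates a frame in the register file, and registers move to memory only when the file overflows. This is a simplified sketch, not the actual RSE design. The type `rse_model`, its functions, and the constants are hypothetical, and the overlap between caller and callee frames is ignored.

```c
#include <stddef.h>

#define PHYS_REGS 96   /* assumed size of the stacked physical register file */
#define MAX_DEPTH 64   /* assumed maximum call depth for this toy model */

typedef struct {
    size_t frame[MAX_DEPTH]; /* registers allocated by each active call */
    size_t depth;            /* number of active frames */
    size_t resident;         /* registers currently held in the register file */
    size_t spilled;          /* registers currently saved in memory */
    size_t spill_traffic;    /* total registers written to memory so far */
    size_t fill_traffic;     /* total registers read back from memory so far */
} rse_model;

void rse_init(rse_model *m)
{
    m->depth = 0;
    m->resident = 0;
    m->spilled = 0;
    m->spill_traffic = 0;
    m->fill_traffic = 0;
}

/* Enter a procedure that needs n registers.
 * Returns 0 on success, -1 if the request cannot be modelled. */
int rse_call(rse_model *m, size_t n)
{
    if (n > PHYS_REGS || m->depth == MAX_DEPTH) {
        return -1;
    }
    m->frame[m->depth++] = n;
    m->resident += n;
    if (m->resident > PHYS_REGS) {
        /* Overflow: the oldest registers are written to memory. */
        size_t excess = m->resident - PHYS_REGS;
        m->resident = PHYS_REGS;
        m->spilled += excess;
        m->spill_traffic += excess;
    }
    return 0;
}

/* Leave the current procedure.
 * Returns 0 on success, -1 if no frame is active. */
int rse_return(rse_model *m)
{
    if (m->depth == 0) {
        return -1;
    }
    m->resident -= m->frame[--m->depth];
    if (m->depth > 0 && m->resident < m->frame[m->depth - 1]) {
        /* Underflow: reload the spilled part of the caller's frame. */
        size_t missing = m->frame[m->depth - 1] - m->resident;
        m->resident += missing;
        m->spilled -= missing;
        m->fill_traffic += missing;
    }
    return 0;
}
```

In this model, shallow call chains that fit in the register file generate no memory traffic at all, whereas a convention that saves and restores registers on every call would pay on each one.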
8
Results
• Comparison between PA-RISC and IA-64
  - 15 codes (encryption, decryption and keying for five AES algorithms)
  - 8/15 IA-64 codes used more than 32 registers
  - 6/15 IA-64 codes smaller
  - 2/15 IA-64 codes 4 times smaller
  - overall code size 27% larger (could have been reduced to 10%)
9
Compilers and IA-64
IA-64 uses existing compiler techniques to exploit parallelism:
• data prefetch
• branch hints
• loop unrolling
• profile-based path instructions
• other
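Of the techniques listed, loop unrolling is the easiest to show in source form. The sketch below unrolls a summation by four with independent accumulators, which exposes four additions per iteration that do not depend on each other. The function names and the unroll factor are illustrative assumptions. A compiler would normally perform this transformation itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Straightforward loop: every addition depends on the previous one. */
int64_t sum_rolled(const int32_t *v, size_t n)
{
    int64_t s = 0;
    for (size_t i = 0; i < n; i++) {
        s += v[i];
    }
    return s;
}

/* Unrolled by four: the four accumulators form independent
 * dependence chains that a wide machine can execute in parallel. */
int64_t sum_unrolled4(const int32_t *v, size_t n)
{
    int64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++) {   /* remainder when n is not a multiple of 4 */
        s0 += v[i];
    }
    return s0 + s1 + s2 + s3;
}
```

The unrolled version uses more registers and more code, which is one reason a larger register set and efficient instruction delivery matter for this style of code.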
10
Need for compiler support
IA-64 does require well-prepared code (profiled, with branch hints, etc.) to achieve high performance, but this is also true for out-of-order processors. Lack of code profiling is equally harmful for IA-64 and OOO architectures. With profiled code, IA-64 is superior to OOO, as proven by benchmark tests (specFP64).
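A static branch hint of the kind that profiling produces can be sketched in C with `__builtin_expect`, a GCC and Clang extension. The `LIKELY` and `UNLIKELY` macro names are a common convention rather than part of any standard, and `sum_samples` is a hypothetical example function. Which branch is actually rare is an assumption that a profile run would have to confirm.

```c
#include <stddef.h>

/* Wrappers around the compiler extension; names are conventional only. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Sum non-negative samples; a negative sample is treated as a rare error.
 * Returns 0 and stores the sum in *out on success, or returns -1 on the
 * first negative sample and leaves *out untouched. */
int sum_samples(const int *samples, size_t n, long *out)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(samples[i] < 0)) {
            return -1;   /* cold path, hinted as rarely taken */
        }
        total += samples[i];
    }
    *out = total;
    return 0;
}
```

The hint changes only how the compiler lays out and schedules the code, not the result, so a wrong hint costs performance but not correctness.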
11
Critical path instructions (e.g. long latency operations)
• OOO compilers don’t distinguish them, so such instructions often have a high execution cost
• IA-64 compilers must detect such instructions and make sure they start first (*)
* Cost of mispredicts is minimized by prefetches issued by the compiler
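One common way for a compiler to make long-latency instructions start first is to rank each instruction by the longest latency chain that depends on it, then issue the highest-ranked ready instruction. The sketch below shows that ranking under simplifying assumptions. It is a generic list-scheduling heuristic, not the method of any particular IA-64 compiler, and the `instr` type and function names are hypothetical.

```c
#include <stddef.h>

#define MAX_SUCC 4   /* assumed limit on dependents per instruction */

typedef struct {
    int latency;            /* cycles until the result is available */
    size_t succ[MAX_SUCC];  /* indices of dependent instructions */
    size_t n_succ;          /* number of valid entries in succ */
} instr;

/* Fill prio[i] with the length of the longest latency chain that starts
 * at instruction i. Assumes every dependent has a higher index than the
 * instruction it depends on (topological order). */
void critical_path_priority(const instr *code, size_t n, int *prio)
{
    for (size_t k = n; k-- > 0; ) {
        int longest = 0;
        for (size_t s = 0; s < code[k].n_succ; s++) {
            int p = prio[code[k].succ[s]];
            if (p > longest) {
                longest = p;
            }
        }
        prio[k] = code[k].latency + longest;
    }
}

/* Return the ready instruction with the highest priority,
 * or n if no instruction is ready. */
size_t pick_next(const int *prio, const int *ready, size_t n)
{
    size_t best = n;
    for (size_t i = 0; i < n; i++) {
        if (ready[i] && (best == n || prio[i] > prio[best])) {
            best = i;
        }
    }
    return best;
}
```

For a 6-cycle load and a 1-cycle add that both feed a final 1-cycle add, the load gets priority 7 and the first add priority 2, so the load is issued first.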
12
Dealing with cache misses
Compiler contribution:
• static code generation (i.e. fewer branches)
• branch hints
Hardware mechanisms:
• sample instructions on timer ticks to get information about actual program flow (HP)
• feed information on cache misses back to the program (Intel Itanium)
13
Dynamic prediction mechanisms (2)
IA-64 has hint fields in most branch and memory instructions, which allow the program to collect flow information and pass it to the processor. These features allow software to improve performance at run time, without recompilation.
14
Current and Future IA-64 implementations
• The initial implementation (as always) focuses on the most important architectural elements only.
• It uses the ideas of EPIC while providing compatibility with IA-32 and PA-RISC processors.
• Future implementations will deliver even more ILP.
• The creators state that the IA-64 architecture will not remain fixed.