Published by Arsène Marin Paquette. Modified over 6 years ago.
1
IA-64 Microarchitecture --- Itanium Processor
Jun Feng Jun Xie Huafeng Lü
2
Outline
  Introduction
  Pipeline
  Issue
  Performance Comparison
  Summary
3
Itanium Processor
  First implementation of IA-64
  Compiler-based exploitation of ILP
  Also has many features of a superscalar processor
6
10-stage Pipeline
  Front-end
  Instruction delivery
  Operand delivery
  Execution
8
Front-end: IPG, Fetch, Rotate
  Prefetches up to 32 bytes per cycle (2 bundles) into a prefetch buffer that holds up to 8 bundles
  Branch prediction is done using a multilevel adaptive predictor
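The fetch figures on this slide can be checked with a little arithmetic. The sketch below relies on one fact not stated on the slide: an IA-64 bundle is 128 bits (16 bytes) and carries three instruction slots. The `simulate` function is a hypothetical toy model of a decoupling buffer, not the actual Itanium front-end logic.

```python
from collections import deque

BUNDLE_BYTES = 16           # one IA-64 bundle is 128 bits
SLOTS_PER_BUNDLE = 3        # three instruction slots per bundle
FETCH_BYTES_PER_CYCLE = 32  # from the slide
BUFFER_BUNDLES = 8          # from the slide

bundles_per_cycle = FETCH_BYTES_PER_CYCLE // BUNDLE_BYTES
instructions_per_cycle = bundles_per_cycle * SLOTS_PER_BUNDLE
buffered_instructions = BUFFER_BUNDLES * SLOTS_PER_BUNDLE


def simulate(cycles, drain_per_cycle):
    """Toy model: buffer occupancy (in bundles) at the end of each cycle."""
    buffer = deque()
    occupancy = []
    for _ in range(cycles):
        # The back end removes bundles first (hypothetical ordering).
        for _ in range(min(drain_per_cycle, len(buffer))):
            buffer.popleft()
        # The front end then fills free space, up to the fetch width.
        free = BUFFER_BUNDLES - len(buffer)
        for _ in range(min(bundles_per_cycle, free)):
            buffer.append("bundle")
        occupancy.append(len(buffer))
    return occupancy
```

With these constants, 32 bytes per cycle is 2 bundles or 6 instructions, and a full buffer holds 24 instructions. If the back end drains only one bundle per cycle, the toy buffer fills and the front end runs ahead of the consumer, which is the purpose of decoupling the two.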
10
Instruction delivery: EXP and REN
  Distributes up to 6 instructions to the 9 functional units
  Implements register renaming for both rotation and register stacking
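Renaming for register rotation can be pictured as adding a rotating offset to the logical register number. The following is a simplified sketch, not the Itanium REN-stage implementation: it ignores register stacking, and the region size `ROT_SIZE` and the helper names are assumptions chosen for illustration. It does use the IA-64 convention that the rotating general registers start at r32 and that the rename base is decremented on each rotation.

```python
ROT_BASE = 32  # first rotating general register in IA-64
ROT_SIZE = 8   # illustrative rotating-region size (assumption)


def physical_register(logical, rrb):
    """Map a logical register number to a physical one for a given rename base."""
    if logical < ROT_BASE or logical >= ROT_BASE + ROT_SIZE:
        return logical  # registers outside the rotating region are not renamed
    return ROT_BASE + (logical - ROT_BASE + rrb) % ROT_SIZE


def rotate(rrb):
    """Decrement the rename base, wrapping within the rotating region."""
    return (rrb - 1) % ROT_SIZE
```

After one rotation, the physical register that was visible as r32 becomes visible as r33, so a value written in one loop iteration appears under the next register name in the following iteration without any copy instruction.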
12
Operand delivery: WLD and REG
  Accesses the register file
  Performs register bypassing
  Accesses and updates a register scoreboard
  Checks predicate dependences
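A register scoreboard, in its simplest form, records which destination registers still have results outstanding and holds back any instruction that reads one of them. The class below is a minimal generic sketch under that assumption; the class and method names are hypothetical, and it does not model bypassing or predicate checks.

```python
class Scoreboard:
    """Tracks registers whose results have not been produced yet."""

    def __init__(self):
        self.pending = set()

    def can_issue(self, sources):
        # An instruction may proceed only if none of its sources is pending.
        return not (set(sources) & self.pending)

    def issue(self, dest):
        # Mark the destination as outstanding when the producer issues.
        self.pending.add(dest)

    def complete(self, dest):
        # Clear the entry once the result is available.
        self.pending.discard(dest)
```

For example, after `issue("r4")` a consumer reading r4 has to wait, while an instruction that reads only r6 can go ahead; once `complete("r4")` is called the consumer is released.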
14
Execution: EXE, DET and WRB
  Executes instructions through ALUs and load/store units
  Detects exceptions and posts NaTs
  Retires instructions and performs write-back
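Posting a NaT ("Not a Thing") defers the exception of a speculative load: the faulting load marks its result instead of trapping, the mark propagates through dependent operations, and a later check decides whether recovery is needed. The sketch below models only that idea; the `Reg` type, the dictionary used as memory, and the function names are assumptions for illustration, not IA-64 instructions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reg:
    value: int = 0
    nat: bool = False  # True means the value is the result of a deferred exception


def speculative_load(memory, address):
    """Return a NaT-marked register instead of raising when the load would fault."""
    if address not in memory:
        return Reg(nat=True)
    return Reg(value=memory[address])


def add(a, b):
    """NaT propagates: any NaT operand makes the result NaT."""
    if a.nat or b.nat:
        return Reg(nat=True)
    return Reg(value=a.value + b.value)


def needs_recovery(reg):
    """Plays the role of a speculation check before the value is committed."""
    return reg.nat
```

The design point is that the cost of a faulting speculative load is paid only if the result is actually used, at the point where the check is performed.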
17
Integer Performance
SPECint benchmark: considerably slower
  Itanium is considerably slower than Alpha and Pentium 4: only 60% of P4 and 68% of Alpha
  Itanium: HP rx4610, 800MHz, 4MB off-chip L3 cache
  Alpha 21264: Compaq GS320, 1GHz, on-chip L2 cache
  Pentium 4: Compaq Precision 330, 2GHz, 256KB on-chip L2 cache
18
Floating Point Performance
SPECfp benchmarks: a different story
  Itanium is quicker than Alpha and Pentium 4: 108% of P4 and 120% of Alpha
  Itanium: HP rx4610, 800MHz, 4MB off-chip L3 cache
  Alpha 21264: Compaq GS320, 1GHz, on-chip L2 cache
  Pentium 4: Compaq Precision 330, 2GHz, on-chip L2 cache
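The two comparison slides give Itanium relative to each competitor, which also fixes how Alpha and Pentium 4 compare with each other. The sketch below derives that ratio, assuming both percentages on a slide refer to the same benchmark aggregate. The derived values are inferences and are not stated in the slides.

```python
def implied_ratio(itanium_vs_a, itanium_vs_b):
    """Performance of system B relative to system A, given Itanium relative to each."""
    return itanium_vs_a / itanium_vs_b


# Itanium performance as a fraction of each competitor (from the slides).
specint = {"P4": 0.60, "Alpha": 0.68}
specfp = {"P4": 1.08, "Alpha": 1.20}

alpha_vs_p4_int = implied_ratio(specint["P4"], specint["Alpha"])
alpha_vs_p4_fp = implied_ratio(specfp["P4"], specfp["Alpha"])
```

Under that assumption Alpha lands at roughly 0.88 of Pentium 4 on SPECint and 0.90 on SPECfp, so the spread between the two competitors is much smaller than the swing in Itanium's own standing between integer and floating-point code.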
19
Discussion on SPECfp
Floating point applications: competitive
  Higher degrees of ILP
  Aggressive memory system
  Art benchmark: 4 times the performance of Pentium 4
  Alpha: outperforms when tuned
In terms of power: worse than P4
  56% of P4's floating point performance per watt
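The performance-per-watt figure can be combined with the SPECfp ratio to estimate relative power draw. This is a back-of-the-envelope sketch: it assumes the 56% figure is Itanium's SPECfp performance per watt relative to Pentium 4 and that the 108% SPECfp figure from the earlier slide applies to the same measurement. The variable names are illustrative.

```python
fp_perf_vs_p4 = 1.08           # Itanium SPECfp relative to Pentium 4 (from the slides)
fp_perf_per_watt_vs_p4 = 0.56  # Itanium performance per watt relative to Pentium 4

# perf_per_watt = perf / power, so power = perf / perf_per_watt.
implied_power_vs_p4 = fp_perf_vs_p4 / fp_perf_per_watt_vs_p4
```

Under these assumptions the Itanium system would draw a little under twice the power of the Pentium 4 system (about 1.93 times) for an 8% gain in floating-point performance.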
20
Summary by Us
  Good floating point performance
  Poor integer performance
  Overall: not as good as Intel has advertised
21
Conclusion
  Large code size
  Only static instruction-level parallelism
  Cannot manage cache misses/hits flexibly
  Lack of applications