Published by Nickolas Small. Modified over 9 years ago.
1
Future Superscalar Processors Based on Instruction Compounding
J. E. Smith
Stamatis Vassiliadis Symposium, Sept. 28, 2007
2
Future Microprocessors
Instruction Compounding (Fusing)
Instruction compounding, or “fusing,” has become a key idea in high-performance microprocessors.
“A compound instruction reflects the parallel issue of instructions; it comprises some number of independent instructions or interlocked instructions”
“Instructions composing a compound instruction need not be consecutive.”
-- S. Vassiliadis et al., IBM Journal of Research and Development, Jan. 1994
3
The Future Processor: Three Key Aspects
– Instruction compounding or fusing
  – Based on the work of S. Vassiliadis
  – Employs compounding and a 3-input ALU
– Co-designed VM for dynamic translation/fusing
  – Concealed from all software
  – Optimized (fused) instructions held in a code cache
– Dual-decoder front end for fast startup
  – Hardware front-end decoder for fast startup
  – Software translator for sustained high performance
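The interplay between the hardware decoder, the software translator, and the code cache can be sketched as a dispatch loop. This is a minimal sketch, not the design from the talk: the class name `DualModeFrontEnd`, the per-block execution counter, and the `hot_threshold` value are hypothetical, and the translator is passed in as a plain function.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


class DualModeFrontEnd:
    """Hypothetical model of a co-designed front end with a code cache."""

    def __init__(self, translate: Callable[[List[str]], List[str]], hot_threshold: int = 4):
        self.translate = translate            # stands in for the software translator
        self.hot_threshold = hot_threshold    # assumed hotspot-detection threshold
        self.code_cache: Dict[int, List[str]] = {}          # block address -> fused code
        self.exec_counts: Dict[int, int] = defaultdict(int)  # executions via HW decoder

    def fetch(self, addr: int, x86_block: List[str]) -> Tuple[str, List[str]]:
        # Sustained performance: reuse optimized (fused) code from the code cache.
        if addr in self.code_cache:
            return "code_cache", self.code_cache[addr]
        # Fast startup: the hardware decoder handles the block directly
        # (its cracking into RISC-ops is not modeled here).
        self.exec_counts[addr] += 1
        if self.exec_counts[addr] >= self.hot_threshold:
            # Hotspot found: translate once; later fetches hit the code cache.
            self.code_cache[addr] = self.translate(x86_block)
        return "hw_decoder", x86_block
```

Cold code never pays the translation cost; only blocks that cross the threshold are translated, and every later fetch of that block is served from the code cache.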
4
Processor Microarchitecture
5
Fusible Instruction Set
– RISC-ops with unique features:
  – A fusible bit per instruction fuses two dependent instructions
  – Dense instruction encoding: a 16/32-bit ISA design
– Special features to support the x86 ISA:
  – Condition codes
  – Addressing modes
  – Aware of long immediate and displacement values
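One way to picture the fusible bit is a decoder that walks a stream of 16-bit halfwords and pairs instructions. The bit positions here are purely hypothetical (bit 15 as the fusible bit, bit 14 marking a 32-bit instruction), and so is the convention that the bit is set on the head of a pair; the slide only states that there is one fusible bit per instruction and a 16/32-bit encoding.

```python
from typing import List, Tuple

FUSIBLE_BIT = 1 << 15  # hypothetical position of the fusible bit
LONG_BIT = 1 << 14     # hypothetical: set when the instruction is 32 bits (two halfwords)

Insn = Tuple[int, ...]


def decode(halfwords: List[int]) -> List[Insn]:
    """Split a stream of 16-bit halfwords into 16- or 32-bit instructions."""
    insns, i = [], 0
    while i < len(halfwords):
        n = 2 if halfwords[i] & LONG_BIT else 1
        insns.append(tuple(halfwords[i:i + n]))
        i += n
    return insns


def group_macro_ops(insns: List[Insn]) -> List[Tuple[Insn, ...]]:
    """Pair each instruction whose fusible bit is set with the one that follows it."""
    groups, i = [], 0
    while i < len(insns):
        if insns[i][0] & FUSIBLE_BIT and i + 1 < len(insns):
            groups.append((insns[i], insns[i + 1]))  # head :: tail
            i += 2
        else:
            groups.append((insns[i],))               # stand-alone RISC-op
            i += 1
    return groups
```

Because the pairing is marked in the encoding itself, the hardware decoder does not need to rediscover dependences; it only reads one bit per instruction.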
6
Microarchitecture: Macro-op Execution
– Enhanced out-of-order (OOO) superscalar microarchitecture
  – Processes and executes fused macro-ops as single instructions throughout the entire pipeline
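Treating a fused pair as one instruction mainly changes what the scheduler waits for: the tail's dependence on the head is satisfied inside the macro-op, so only the pair's external sources matter. The sketch below models that readiness test plus a functional (not timing) view of the 3-input ALU for ALU-ALU pairs; the `RiscOp`/`MacroOp` types and `collapsed_alu` are hypothetical names, and the register names come from the fusing example in this deck.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class RiscOp:
    name: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


@dataclass(frozen=True)
class MacroOp:
    head: RiscOp
    tail: RiscOp

    def external_srcs(self) -> Set[str]:
        """Registers the scheduler must wait for; head -> tail is forwarded internally."""
        tail_srcs = {r for r in self.tail.srcs if r != self.head.dest}
        return set(self.head.srcs) | tail_srcs

    def is_ready(self, ready_regs: Iterable[str]) -> bool:
        # One scheduler entry, woken up like a single instruction.
        return self.external_srcs() <= set(ready_regs)


def collapsed_alu(a: int, b: int, c: int,
                  head_fn: Callable[[int, int], int],
                  tail_fn: Callable[[int, int], int]) -> int:
    """Functional view of a 3-input ALU: tail(head(a, b), c) as one operation."""
    return tail_fn(head_fn(a, b), c)
```

For the pair `ADD R18, Redi, 1 :: AND Reax, R18, 007f`, only `Redi` must be ready before the macro-op can issue.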
7
Macro-op Fusing Algorithm
– Objectives:
  – Maximize fused dependent pairs
  – Simple and fast
– Heuristics:
  – Pipelined scheduler: only single-cycle ALU ops can be a head; minimize non-fused single-cycle ALU ops
  – Criticality: fuse instructions that are “close” in the original sequence; ALU-op criticality is easier to estimate
  – Simplicity: 2 or fewer distinct register operands per fused pair
– Solution: a two-pass fusing algorithm
  – The 1st pass, a forward scan, prioritizes ALU ops: for each ALU-op tail candidate, look backward in the scan for its head
  – The 2nd pass considers all kinds of RISC-ops as tail candidates
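The two passes can be sketched in Python. This is an illustrative reading of the slide, not the published algorithm. The look-back `WINDOW`, the rule that the head must be the nearest producer of a tail source, and the hoisting check (no intervening op may write the tail's sources or read or write its destination) are all assumptions. "2 or fewer distinct register operands" is read as at most two external source registers, and register renaming by the translator is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

WINDOW = 8  # hypothetical look-back limit: only fuse ops that are "close" (criticality)


@dataclass(frozen=True)
class RiscOp:
    kind: str             # "alu" = single-cycle ALU op; others e.g. "ld", "st", "br"
    dest: Optional[str]
    srcs: Tuple[str, ...]


def _nearest_producer(ops: List[RiscOp], tail: int, reg: str) -> Optional[int]:
    """Index of the closest earlier op (within WINDOW) that writes reg, or None."""
    for j in range(tail - 1, max(-1, tail - 1 - WINDOW), -1):
        if ops[j].dest == reg:
            return j
    return None


def _can_fuse(ops: List[RiscOp], head: int, tail: int) -> bool:
    h, t = ops[head], ops[tail]
    # Simplicity: at most 2 distinct external source registers for the pair.
    ext = set(h.srcs) | {r for r in t.srcs if r != h.dest}
    if len(ext) > 2:
        return False
    # The tail is hoisted up to the head, so intervening ops must not
    # write its sources, nor read or write its destination.
    for op in ops[head + 1:tail]:
        if op.dest is not None and (op.dest in t.srcs or op.dest == t.dest):
            return False
        if t.dest is not None and t.dest in op.srcs:
            return False
    return True


def fuse(ops: List[RiscOp]) -> Dict[int, int]:
    """Two-pass fusing; returns {head_index: tail_index}."""
    fused, pairs = set(), {}
    for alu_tails_only in (True, False):        # pass 1: ALU tails; pass 2: any tail
        for t, op in enumerate(ops):            # forward scan over tail candidates
            if t in fused or (alu_tails_only and op.kind != "alu"):
                continue
            for reg in op.srcs:                 # look backward for a head
                h = _nearest_producer(ops, t, reg)
                if (h is not None and h not in fused
                        and ops[h].kind == "alu"  # only single-cycle ALU ops can be heads
                        and _can_fuse(ops, h, t)):
                    pairs[h] = t
                    fused.update((h, t))
                    break
    return pairs
```

The example slide's RISC-ops use the register names from its "After fusing" listing (`ADD R18, Redi, 1`, `ST R18, ...`, `AND Reax, R18, ...`). With those names as input, `fuse` returns `{0: 3, 4: 5}`. These are 0-based indices for the pairs 1::4 and 5::6, the same two macro-ops the slide reports. Pass 1 fuses the ADD/AND pair, and pass 2 picks up the ADD/LD pair because its tail is a load.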
8
Fusing Algorithm: Example
x86 asm:
  1. lea eax, DS:[edi + 01]
  2. mov [DS:080b8658], eax
  3. movzx ebx, SS:[ebp + ecx << 1]
  4. and eax, 0000007f
  5. mov edx, DS:[eax + esi << 0 + 0x7c]
RISC-ops:
  1. ADD Reax, Redi, 1
  2. ST Reax, mem[R22]
  3. LD.zx Rebx, mem[Rebp + Recx << 1]
  4. AND Reax, 0000007f
  5. ADD R17, Reax, Resi
  6. LD Redx, mem[R17 + 0x7c]
After fusing (macro-ops):
  1. ADD R18, Redi, 1 :: AND Reax, R18, 007f
  2. ST R18, mem[R22]
  3. LD.zx Rebx, mem[Rebp + Recx << 1]
  4. ADD R17, Reax, Resi :: LD Redx, mem[R17 + 0x7c]
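The example also shows how an x86 memory operand is cracked. `[ebp + ecx << 1]` stays a single load, while `[eax + esi << 0 + 0x7c]` becomes an address ADD followed by a load with displacement. The sketch below encodes that rule as inferred from this one example, not from an ISA specification. `crack_load` and its `tmp` register parameter are hypothetical, and zero-extension (`LD.zx`) and segment prefixes are not modeled.

```python
from typing import List, Optional


def crack_load(dest: str, base: str, index: Optional[str] = None,
               scale: int = 0, disp: int = 0, tmp: str = "R17") -> List[str]:
    """Crack an x86 load from [base + index << scale + disp] into RISC-op text.

    Assumed rule (inferred from the slide's example): a RISC-op load takes
    either [base + index << scale] or [base + disp], but not all three.
    """
    idx = None
    if index is not None:
        idx = index if scale == 0 else f"{index} << {scale}"
    if idx is not None and disp != 0:
        # Three-part address: compute base + index first, then load with displacement.
        return [f"ADD {tmp}, {base}, {idx}", f"LD {dest}, mem[{tmp} + {disp:#x}]"]
    if idx is not None:
        return [f"LD {dest}, mem[{base} + {idx}]"]
    if disp != 0:
        return [f"LD {dest}, mem[{base} + {disp:#x}]"]
    return [f"LD {dest}, mem[{base}]"]
```

For instance, `crack_load("Redx", "Reax", "Resi", 0, 0x7c)` yields the ADD/LD pair that appears as RISC-ops 5 and 6. That is exactly the dependent pair the fuser recombines into macro-op 4.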
9
Instruction Fusing Profile
– 55+% of RISC-ops are fused, which increases effective ILP by a factor of 1.4
– Only 6% single-cycle ALU ops are left unfused
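The 1.4 factor follows directly from the fusing rate under one reading of the metric. The assumption is that effective ILP counts RISC-ops per scheduled entity, with the pipeline handling the same number of entities per cycle with or without fusing. If a fraction f of N RISC-ops is fused into pairs, each pair removes one entity:

```latex
\[
E = N - \frac{fN}{2} = N\left(1 - \frac{f}{2}\right),
\qquad
\frac{N}{E} = \frac{1}{1 - f/2}
\]
\[
f = 0.55:\quad \frac{1}{1 - 0.275} = \frac{1}{0.725} \approx 1.38 \approx 1.4
\]
```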
10
Other DBT (Dynamic Binary Translation) Software Profile
– Of all fused macro-ops:
  – 50% are ALU-ALU pairs
  – 30% are fused condition-test and conditional-branch pairs
  – Others are mostly ALU-MEM op pairs
– Of all fused macro-ops:
  – 70+% are inter-x86-instruction fusions
  – 46% access two distinct source registers
  – Only 15% (6% of all instruction entities) write two distinct destination registers
– Translation overhead profile:
  – About 1000 instructions per translated hotspot instruction
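The two percentages for two-destination macro-ops are consistent with the 55% fusing rate in the instruction fusing profile. Assume f = 0.55 of N RISC-ops are fused into pairs. Macro-ops then number fN/2 out of N(1 - f/2) scheduled entities. Their share of all entities, scaled by the 15%, gives the 6% figure:

```latex
\[
\frac{\text{macro-ops}}{\text{entities}} = \frac{f/2}{1 - f/2} = \frac{0.275}{0.725} \approx 0.38,
\qquad
0.15 \times 0.38 \approx 0.057 \approx 6\%
\]
```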
11
Co-designed x86 Processor Performance
12
Dual-Decoder Front End
13
Evaluation: Startup Performance
14
Activity of HW Assists
15
Important Research Issues
– Profiling
  – Probe insertion via the software translator is not feasible
– Multi-core
  – Shared code cache
  – SMT designs
– Memory consistency
  – Stores can be done in order
  – Re-scheduled loads may be important for performance
– Precise traps
  – Potential HW assist?