1
Emulating Unimplemented Instructions in an SMT
Suan Yong & Brian Forney CS/ECE 752 Spring 2000
2
Motivation
Simultaneous multithreaded (SMT) processors are promising and likely to be embraced by industry.
They exploit thread-level parallelism.
Compaq’s Alpha has planned SMT support.
3
The question
What if SMT support is needed but cost, complexity, or power consumption is an issue?
One solution is to remove functional units and emulate the instructions they implemented.
Can anything be done to speed up the emulation of those instructions?
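As an illustration of what emulating a removed functional unit can involve, the sketch below assumes the missing unit is the integer multiplier and computes a 64-bit product using only shifts, adds, and bit tests. The function name `emulate_mul64` and the constant `MASK64` are hypothetical and are not taken from the slides or from any simulator; this is one textbook way to do it, not necessarily the handler the authors used.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit registers, results wrap modulo 2**64


def emulate_mul64(src1: int, src2: int) -> int:
    """Return the low 64 bits of src1 * src2 without using a multiply."""
    a = src1 & MASK64      # multiplicand, shifted left each iteration
    b = src2 & MASK64      # multiplier, consumed one bit at a time
    result = 0
    while b:
        if b & 1:          # current multiplier bit set: add the shifted multiplicand
            result = (result + a) & MASK64
        a = (a << 1) & MASK64
        b >>= 1
    return result


if __name__ == "__main__":
    print(emulate_mul64(6, 7))
```

The loop runs once per significant bit of the second operand, so a single multiply turns into dozens of simple integer operations. That expansion is the emulation cost the question above is asking how to reduce.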
4
Related work
“The Use of Multithreading for Exception Handling,” Zilles et al., MICRO-32, November 1999.
“Simultaneous Subordinate Microthreading (SSMT),” Chappell et al., 26th Annual ISCA, May 1999.
5
Exception Handling
[Figure: pipeline timing diagram of exception handling on a standard SMT, showing instructions 3 to 10 and instructions labeled A and B.]
6
Simultaneous Multithreading (SMT)
[Figure: SMT pipeline block diagram with branch predictor, fetch, I$, decode, per-thread PC and registers R1 to Rn, functional units (FP, I-ALU, I-MUL, R/W), and D$; instructions from threads 1, 2, and 3 share the pipeline.]
7
Emulating SMT approach
[Figure: SMT pipeline block diagram with branch predictor, fetch, I$, decode, per-thread PC and registers R1 to Rn, FP, I-ALU, I-MUL, R/W, and D$, plus T-STRT and T-RET units.]
8
[Figure: emulating SMT pipeline diagram, animation frame. Instructions 3 to 7 are in flight, instructions A and B appear in the second thread, src1 and src2 occupy that thread's registers R1 and R2, and queue slots are labeled [1] to [7].]
9
[Figure: emulating SMT pipeline diagram, animation frame. Fetch is marked "Z Z Z", instructions A, B, and C appear in the second thread, src1 and src2 occupy its registers R1 and R2, and instructions 3 to 7 are in flight.]
10
[Figure: emulating SMT pipeline diagram, animation frame. Fetch is marked "Z Z Z", instruction C is in the pipeline, and queue slots [1] to [7] hold A, B, C, and instructions 3 to 7.]
11
[Figure: emulating SMT pipeline diagram, animation frame. Same state as the previous frame, with "?" marks added near D$ and queue slot [2].]
12
Methodology
Modified Zilles’s sim-multi Compaq Alpha SMT simulator:
  added exception thread support
  added a multiply thread
Ran representative execution traces of benchmarks from SPEC CPU2000 and MediaBench.
13
Simulator modes
15
Conclusions
ESMT usually minimizes the performance cost of emulation.
The “ooo” (non-pausing) mode works best.
The “squash” mode is occasionally better because of resource contention. Would partial squashing help?
Some of the hardware is already needed and could be useful for other purposes.