Dynamically Sizing the TAGE Branch Predictor
Stephen Pruett, Siavash Zangeneh, Ali Fakhrzadehgan, Ben Lin, and Yale N. Patt
HPS Research Group, The University of Texas at Austin
6/18/16
Problem
Storage efficiency is key (a fundamental tradeoff with conditional probability).
The size of the TAGE tables must be decided at design time.
The number of tables (and their history lengths) must also be decided at design time.
Most benchmarks require large amounts of storage in the low (short-history) tables.
This makes it impossible for designers to justify long (expensive) histories or to add storage to the longer-history tables.
Designers must consider what is best for all benchmarks, not what is best for each benchmark.
Our Contributions
A reconfigurable architecture that can reallocate storage at run time.
Algorithms that determine how much storage the running application needs.
A victim cache.
Outline
Architecture: Tables, Tiles, and Configuration Vectors; Scoring Unit; Reconfigurable Interconnect; Victim Cache; Limitations
Results
Questions
Architecture
M geometrically increasing history registers, as in TAGE.
Cascading MUXes, as in TAGE.
N tiles instead of tables.
Two reconfigurable interconnects.
Scoring unit: collects run-time information and determines the size of each table.
Configuration vector (CV): specified by the scoring unit and fed into the reconfigurable interconnect. Each element gives the number of tiles assigned to one table, e.g.: 2,0,0,0,0,0,0,0,4,0,2,0,4,4,8,8
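The following Python is a minimal sketch of how a configuration vector might be checked and turned into table sizes. It makes three assumptions:
- Each element is the number of tiles given to one table.
- Every tile is assigned.
- A tile holds 512 entries. The configuration slide lists a tile size of 512 without units.

The power-of-two check mirrors the values seen in the example vectors and the power-of-2 restriction under Limitations. The function names are hypothetical.

```python
TILE_ENTRIES = 512  # assumption: entries per tile (the slides give 512 without units)


def is_pow2_or_zero(k: int) -> bool:
    """True if k is 0 or a power of two (1, 2, 4, 8, ...)."""
    return k == 0 or (k > 0 and (k & (k - 1)) == 0)


def table_sizes(cv: list[int], num_tiles: int,
                tile_entries: int = TILE_ENTRIES) -> list[int]:
    """Return the number of entries in each table for configuration vector cv.

    cv[i] is the number of tiles assigned to the table that uses history i.
    Raises ValueError if cv is not a valid configuration.
    """
    if sum(cv) != num_tiles:
        raise ValueError(f"cv assigns {sum(cv)} tiles, expected {num_tiles}")
    if not all(is_pow2_or_zero(k) for k in cv):
        raise ValueError("each table must get 0 or a power-of-two number of tiles")
    return [k * tile_entries for k in cv]


cv = [2, 0, 0, 0, 0, 0, 0, 0, 4, 0, 2, 0, 4, 4, 8, 8]
print(table_sizes(cv, num_tiles=32))
# [1024, 0, 0, 0, 0, 0, 0, 0, 2048, 0, 1024, 0, 2048, 2048, 4096, 4096]
```

The 16 elements of the example vector account for 32 tiles, which is the tile count listed for the 8KB configuration.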
One Possible Configuration
# of histories: 6; # of tiles: 16; geometric series: 2^n; configuration vector: 4,4,2,2,2,2
[Figure: histories h[1:2^1], h[1:2^2], ..., h[1:2^6] are each hashed (H) to index Tables 1 to 6, which are built from the 16 tiles.]
Another Possible Configuration
# of histories: 6; # of tiles: 16; geometric series: 2^n; configuration vector: 1,1,2,4,4,4
[Figure: histories h[1:2^1], h[1:2^2], ..., h[1:2^6] are each hashed (H) to index Tables 1 to 6, which are built from the 16 tiles.]
And Another…
# of histories: 6; # of tiles: 16; geometric series: 2^n; configuration vector: 4,8,2,2,0,0
[Figure: histories h[1:2^1], h[1:2^2], ..., h[1:2^6] are each hashed (H) to index Tables 1 to 6, which are built from the 16 tiles; Tables 5 and 6 receive no tiles.]
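The sketch below makes the three example configurations concrete: for each one it lists every table's history length and the tiles that table would own. Purely for illustration, it assumes that tiles are handed out contiguously in table order. The actual choice of physical tiles is made by the reconfigurable interconnect and is not specified here. The function names are hypothetical.

```python
def history_lengths(num_histories: int, base: int = 2) -> list[int]:
    """Geometric history lengths base**n for n = 1..num_histories (h[1:2^n])."""
    return [base ** n for n in range(1, num_histories + 1)]


def assign_tiles(cv: list[int]) -> dict[int, list[int]]:
    """Map each table number (1-based) to the tile indices it owns.

    Assumption for illustration: tiles are handed out contiguously in table order.
    """
    assignment = {}
    next_tile = 0
    for table, count in enumerate(cv, start=1):
        assignment[table] = list(range(next_tile, next_tile + count))
        next_tile += count
    return assignment


lengths = history_lengths(6)  # [2, 4, 8, 16, 32, 64]
for cv in ([4, 4, 2, 2, 2, 2], [1, 1, 2, 4, 4, 4], [4, 8, 2, 2, 0, 0]):
    print("Configuration vector", cv)
    tiles = assign_tiles(cv)
    for table, length in zip(sorted(tiles), lengths):
        print(f"  Table {table} (h[1:{length}]): tiles {tiles[table]}")
```

In the third configuration, Tables 5 and 6 get empty tile lists, so their histories go unused.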
Quanta and Runtime Phases
[Figure: timeline of the currently running program from beginning to end, divided into quanta; a learning phase is followed by an adaptive phase.]
Scoring Unit
Learning phase: collects run-time information and produces new configuration vectors.
Adaptive phase: selects the best configuration, dynamically switching between the configurations produced in the learning phase.
Scoring Unit: Learning Phase
[Figure: runtime statistics kept for each of Tables 1 to 6 (mispredictions, conflicts, and attempts), updated with information from the predictor.]
Scoring Unit: Learning Phase (cont.)
Misprediction counter: incremented when the table mispredicts.
Conflict counter: incremented when an attempted allocation in the table conflicts (i.e., the entry cannot be allocated).
Attempt (attempted allocation) counter: incremented whenever an allocation is attempted in the table.
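Below is a minimal sketch of these three counters. It rests on two assumptions:
- The table charged with a misprediction is the one that provided the prediction.
- A conflict is an allocation attempt that finds no replaceable entry.

The class and method names are hypothetical, and tables are indexed from 0.

```python
from dataclasses import dataclass, field


@dataclass
class TableStats:
    """Learning-phase counters kept for one table."""
    mispredictions: int = 0
    conflicts: int = 0
    attempts: int = 0


@dataclass
class LearningStats:
    """Counters for all tables; table indices are 0-based here."""
    num_tables: int
    tables: list[TableStats] = field(init=False)

    def __post_init__(self) -> None:
        self.tables = [TableStats() for _ in range(self.num_tables)]

    def on_prediction(self, provider: int, mispredicted: bool) -> None:
        # Assumption: the table that provided the prediction is charged.
        if mispredicted:
            self.tables[provider].mispredictions += 1

    def on_allocation(self, table: int, allocated: bool) -> None:
        # Every attempted allocation is counted; one that cannot be
        # allocated (no replaceable entry) also counts as a conflict.
        self.tables[table].attempts += 1
        if not allocated:
            self.tables[table].conflicts += 1


stats = LearningStats(num_tables=6)
stats.on_prediction(provider=2, mispredicted=True)
stats.on_allocation(table=3, allocated=False)
stats.on_allocation(table=3, allocated=True)
print(stats.tables[3])  # TableStats(mispredictions=0, conflicts=1, attempts=2)
```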
Scoring Unit: Learning Phase (cont.)
[Figure: the same per-table runtime statistics (mispredictions, conflicts, and attempts for Tables 1 to 6), now holding example counter values updated from the predictor.]
When does a table need more storage?
Highly congested tables usually have many conflicts.
Symptom: a high number of conflicts.
Some tables are so congested that new allocations overwrite entries before they are ever used.
Symptom: a high number of attempted allocations.
This case is difficult to distinguish from a useless table.
Algorithms
Step 1: Reclaim storage. Reduce each table to the smallest power of 2 (in tiles) that can still hold all of its entries. This is a very aggressive strategy.
Step 2: Distribute storage, using one of three algorithms:
Conflict: add storage to tables that have an above-average number of conflicts.
Attempt: add storage to tables that have an above-average number of attempted allocations; additionally, give higher priority to tables that come after the max table.
Hybrid: use the conflict policy if the MPKI is high; otherwise, use the attempt policy.
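Here is a minimal sketch of the two steps under several stated assumptions. All function names and the example numbers are hypothetical.
- A table with no live entries gives up all of its tiles.
- Freed tiles are handed out by doubling a table's tile count, with 0 growing to 1, so every table stays a power of two.
- Above-average tables are served first, highest score first. Any leftover tiles then go to the remaining tables in score order.
- The 5.0 MPKI threshold for the hybrid policy is arbitrary.
- The attempt policy's extra priority for tables after the max table is omitted.

```python
def reclaim(entries_needed: int, tile_entries: int = 512) -> int:
    """Step 1: smallest power-of-two tile count that holds entries_needed."""
    if entries_needed == 0:
        return 0  # assumption: a table with no live entries gives up all tiles
    tiles = 1
    while tiles * tile_entries < entries_needed:
        tiles *= 2
    return tiles


def _grow(cv: list[int], order: list[int], free: int) -> int:
    """Double tables in `order` (0 -> 1 tile) while tiles remain; return leftover."""
    grew = True
    while free > 0 and grew:
        grew = False
        for t in order:
            extra = cv[t] if cv[t] > 0 else 1  # doubling keeps a power of two
            if extra <= free:
                cv[t] += extra
                free -= extra
                grew = True
    return free


def distribute(cv: list[int], scores: list[int], num_tiles: int) -> list[int]:
    """Step 2: give freed tiles to tables with above-average scores first."""
    cv = list(cv)
    avg = sum(scores) / len(scores)
    order = sorted(range(len(cv)), key=lambda t: scores[t], reverse=True)
    above_avg = [t for t in order if scores[t] > avg]
    free = _grow(cv, above_avg, num_tiles - sum(cv))
    _grow(cv, order, free)  # assumption: any leftover goes to the remaining tables
    return cv


def hybrid_scores(conflicts, attempts, mpki, mpki_threshold=5.0):
    """Hybrid policy: conflict scores if MPKI is high, else attempt scores."""
    return conflicts if mpki > mpki_threshold else attempts


needed = [1500, 900, 300, 200, 0, 0]          # hypothetical live entries per table
cv = [reclaim(n) for n in needed]             # [4, 2, 1, 1, 0, 0]
conflicts = [10, 40, 35, 5, 0, 0]             # hypothetical counter values
attempts = [20, 30, 60, 10, 0, 0]
scores = hybrid_scores(conflicts, attempts, mpki=8.0)  # high MPKI -> conflicts
print(distribute(cv, scores, num_tiles=16))   # [4, 8, 2, 2, 0, 0]
```

Doubling rather than adding single tiles is one simple way to respect the power-of-2 restriction. A real scoring unit could weigh the scores differently.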
Scoring Unit: Adaptive Phase
[Figure: misprediction vector holding the total mispredictions for each of Config 0 to Config 6, updated with information from the predictor.]
Scoring Unit: Adaptive Phase
Adapted from T. Juan, S. Sanjeevan, and J. Navarro, “Dynamic History Length Fitting: A Third Level of Adaptivity for Branch Prediction.”
[Figure: misprediction vector for Config 0 to Config 6 across the learning and adaptive phases; each configuration is marked active in turn and accumulates its total mispredictions, and the configuration with the minimum total is selected.]
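The following is a simplified sketch of the selection step. It is not the Dynamic History Length Fitting algorithm itself or the authors' exact policy. It assumes the following:
- In the learning phase, each of the seven configurations (Config 0 to Config 6) is active for one quantum, in order.
- Afterwards, the configuration with the fewest recorded mispredictions is active.
- The active configuration's count is refreshed every quantum, so a degrading configuration can lose its place.

The class name and the misprediction counts are hypothetical.

```python
class AdaptiveSelector:
    """Keeps a misprediction vector and picks the configuration with the minimum.

    Assumptions: in the learning phase each configuration is active for one
    quantum, in order; afterwards the configuration with the fewest recorded
    mispredictions is active, and its count is refreshed every quantum.
    """

    def __init__(self, num_configs: int):
        self.mispredicts = [None] * num_configs  # one slot per configuration
        self.quantum = 0

    def active_config(self) -> int:
        if self.quantum < len(self.mispredicts):
            return self.quantum  # learning phase: try each configuration in turn
        return self.best()       # adaptive phase: use the current minimum

    def end_quantum(self, mispredicts: int) -> None:
        """Record the total mispredictions of the configuration that was active."""
        self.mispredicts[self.active_config()] = mispredicts
        self.quantum += 1

    def best(self) -> int:
        return min(range(len(self.mispredicts)), key=lambda i: self.mispredicts[i])


sel = AdaptiveSelector(num_configs=7)          # Config 0 .. Config 6
for count in [50, 42, 61, 39, 70, 55, 48]:     # hypothetical learning-phase totals
    sel.end_quantum(count)
print(sel.active_config())  # 3
sel.end_quantum(80)         # Config 3 degrades in the next quantum
print(sel.active_config())  # 1
```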
Reconfigurable Interconnect
Two reconfigurable interconnects: one connects histories to tiles, and the other connects tiles to the MUXes.
Each X in the interconnect is a switch, enabled by the CV.
Tiles are organized in a direct-mapped fashion; a fully associative organization would limit the maximum size of a table.
The hashing function always produces enough bits to index the largest possible table.
This is convenient because the low bits of the hash (used to index within a tile) do not change after remapping; the high bits are compared to the TileID.
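The sketch below is one way to express the indexing scheme just described: low hash bits select a row inside a tile, and the next bits are matched against the tiles' TileIDs. It makes three assumptions:
- A tile holds 512 entries.
- A table owns a power-of-two number of tiles.
- The per-tile TileID comparison, which hardware would perform in parallel, is modeled as a list lookup.

`lookup` and the example hash value are hypothetical.

```python
TILE_ENTRIES = 512                        # assumption: entries per tile
ROW_BITS = TILE_ENTRIES.bit_length() - 1  # 9 low hash bits select a row in a tile


def lookup(hash_value: int, table_tiles: list[int]) -> tuple[int, int]:
    """Return (physical tile, row) for a direct-mapped table built from tiles.

    table_tiles[j] is the physical tile whose TileID within the table is j;
    the table must own a power-of-two number of tiles. The hash is assumed to
    be wide enough for the largest possible table (all tiles in one table).
    """
    k = len(table_tiles)
    if k == 0 or k & (k - 1):
        raise ValueError("a table must own a power-of-two number of tiles")
    row = hash_value & (TILE_ENTRIES - 1)         # low bits: unchanged by remapping
    tile_id = (hash_value >> ROW_BITS) & (k - 1)  # high bits: compared to TileID
    return table_tiles[tile_id], row


h = (3 << ROW_BITS) | 5             # hypothetical hash: high bits 0b11, row 5
print(lookup(h, [10, 11]))          # (11, 5): table of 2 tiles
print(lookup(h, [10, 11, 12, 13]))  # (13, 5): same row after growing to 4 tiles
```

Growing the table from 2 to 4 tiles changes only which tile answers. The row inside the tile stays the same, which is the property the slide highlights.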
Victim Cache
Boosts the most heavily loaded table.
Goal: increase the reuse distance for entries that were never used, i.e., in the table with the minimum number of conflicts and an above-average number of attempts.
If there were too many conflicts, the victim cache would not be able to restore an entry.
Takes advantage of unused bit combinations in each entry.
Organized as a Bloom filter: trades off capacity for correctness.
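The slides do not specify what the victim cache hashes or how a matching entry is restored. Purely as an illustration of the capacity-for-correctness tradeoff, here is a generic Bloom filter over hypothetical keys of evicted-but-unused entries. It is not the authors' design. The filter uses a fixed number of bits however many keys are added. The price is occasional false positives, but it never reports a false negative. The sizes and key layout are assumptions.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: never a false negative, sometimes a false positive."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array packed into a single Python int

    def _positions(self, key: int):
        # Derive num_hashes bit positions from hashes salted with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, key: int) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: int) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


victims = BloomFilter(num_bits=256, num_hashes=3)
evicted_key = (5 << 16) | 0x2A3   # hypothetical key: table 5, index 0x2A3
victims.add(evicted_key)
print(victims.might_contain(evicted_key))  # True
```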
Limitations
The number of tiles must be a power of 2.
Otherwise it is possible to create invalid combinations; the restriction also simplifies the logic.
The problem gets worse as the overall predictor size increases.
Tag size: we assume the worst case and use 15-bit tags.
We attempted treating each entry as two entries in the lower tables.
Configuration parameters (8KB / 64KB)
# of tiles (N): 32 / 64
Size of tile: 512
Tag size: 15 / 10
Quantum: 100K / 50K
Quanta in learning phase: 7
Results: 8KB
Average MPKI: 5.370
Trace | Diff | Improvement | Configuration vector
SS53 | 2.32 | 6.59% | 1,8,8,4,4,4,1,1,1,0,0,0,0,0,0,0
SS56 | 2.49 | 6.17% |
SS57 | 2.61 | 7.57% | 1,1,1,1,1,1,4,8,4,4,4,1,1,0,0,0
SM2 | 2.88 | 65.58% |
SM43 | 5.07 | 435.97% | 0,0,0,0,1,1,2,1,1,8,4,4,4,4,1,1
Results: 64KB
Average MPKI: 4.265
Trace | Diff | Improvement | Configuration vector
SS57 | 1.10 | 3.85% | 4,4,4,8,8,8,4,4,4,4,2,2,2,2,2,2
SS53 | 1.22 | 4.26% | 4,4,8,8,8,8,4,4,2,2,2,2,2,2,2,2
SM42 | 1.71 | 151.15% | 2,1,1,4,4,4,8,8,8,8,4,4,2,2,2,2
SM41 | 1.99 | 15.35% | 1,1,1,1,4,4,4,4,8,8,8,4,4,4,4,4
SM58 | 2.45 | 25.52% | 2,2,1,1,1,1,4,8,8,8,8,4,4,4,4,4
Questions?
Thank you!