Page 1: Maximizing Branch Behavior Coverage for a Limited Simulation Budget
Maximilien Breughe, Samsung Austin R&D Center (SARC)
06/18/2016, Championship Branch Prediction (CBP-5), in conjunction with ISCA 2016
Page 2: Reduce thousands of workloads into a small training set
Thousands of workloads from popular benchmark suites (Suite A, Suite B, Suite C) are reduced to a representative set of 200 small (100M) + 23 big (1B) workloads.
Page 3: How to select workloads?
– Characterization through statistics: characterize branch behavior through 16 statistics.
– PCA: Principal Component Analysis to reduce correlation between the 16 statistics.
– K-means clustering: classify workloads based on distance to neighbors.
– Select representative workloads per cluster.
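A minimal sketch of this selection pipeline in Python with scikit-learn, assuming the 16 statistics have already been collected into an N x 16 matrix. The random `stats` matrix and the workload names are placeholders; the choices of 5 principal components and K = 200 clusters follow the later slides.

```python
# Sketch of the workload-selection pipeline: 16 statistics -> PCA -> K-means
# -> one representative workload per cluster. `stats` and `names` are dummy
# stand-ins for the real per-workload statistics.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
stats = rng.normal(size=(1000, 16))              # placeholder for the 16 branch statistics
names = [f"workload_{i}" for i in range(len(stats))]

scaled = StandardScaler().fit_transform(stats)   # put all 16 statistics on the same scale
pcs = PCA(n_components=5).fit_transform(scaled)  # reduce to 5 principal components

kmeans = KMeans(n_clusters=200, n_init=10, random_state=0).fit(pcs)

# Traditional selection: the workload closest to each cluster center.
representatives = []
for c in range(kmeans.n_clusters):
    members = np.flatnonzero(kmeans.labels_ == c)
    dists = np.linalg.norm(pcs[members] - kmeans.cluster_centers_[c], axis=1)
    representatives.append(names[members[np.argmin(dists)]])
```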
Page 4: How to characterize branch behavior?
– Various metrics exist, e.g., MPKI, branch miss rate, branch predictability, etc.
– Branch Entropy [De Pestel et al., 2015]: microarchitecture independent, predicts MPKI.
– "For a given workload and n history bits, what is the overall complexity to predict a branch?"
Example (per branch address and history pattern):
Branch Address | History Pattern | # Not Taken | # Taken | Entropy
0xf2c48 | 0000000 | 0 | 200 | 0 (always taken for this pattern)
0xf2c48 | 0010101 | 200 | 0 | 0 (never taken for this pattern)
0xfde82 | 0001111 | 100 | 100 | 1 (taken half of the time)
0xfde82 | 0101010 | 40 | 160 | 0.5 (4x as much taken as not taken)
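A small sketch of how the entropy column of the example table can be computed, using Shannon entropy of the taken probability as one common formulation. The exact entropy definition of [De Pestel et al., 2015] may differ, which would change the value for the biased 40/160 row.

```python
# Per-(branch address, history pattern) entropy from taken / not-taken counts.
# Shannon entropy of the taken probability is used here as one common choice;
# the paper's exact definition may differ for biased-but-not-fully-biased rows.
import math

def pattern_entropy(n_not_taken: int, n_taken: int) -> float:
    total = n_not_taken + n_taken
    if total == 0:
        return 0.0
    p = n_taken / total
    if p in (0.0, 1.0):           # always or never taken: perfectly predictable
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

print(pattern_entropy(0, 200))    # always taken      -> 0.0
print(pattern_entropy(200, 0))    # never taken       -> 0.0
print(pattern_entropy(100, 100))  # half of the times -> 1.0
print(pattern_entropy(40, 160))   # 4x taken vs not   -> ~0.72 (slide lists 0.5)
```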
Page 5: Branch Behavior Vector
BBV = (MPKI_1, MPKI_2, MPKI_3, IPC_1, IPC_2, IPC_3, MR, E_L32, E_L64, E_G32, E_G64, E_T32, E_T64, SBC, DBC, ILP)
– MPKI_1..3: MPKI for three sizes of branch predictors
– IPC_1..3: IPC for three sizes of branch predictors
– MR: miss rate for one branch predictor
– E_L32, E_L64: local branch entropy (two sizes)
– E_G32, E_G64: global branch entropy (two sizes)
– E_T32, E_T64: tournament branch entropy (two sizes)
– SBC, DBC: static and dynamic branch count
– ILP: instruction-level parallelism, inversely proportional to the misprediction penalty [Eyerman et al., 2006]
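For concreteness, a sketch of this vector as a Python structure; the field names are illustrative abbreviations of the slide's components, not an API from the paper.

```python
# The 16-dimensional branch behavior vector, one per workload.
# Field names follow the slide's abbreviations; the ordering is illustrative.
from typing import NamedTuple

class BranchBehaviorVector(NamedTuple):
    mpki_1: float   # MPKI for branch predictor size 1
    mpki_2: float   # MPKI for branch predictor size 2
    mpki_3: float   # MPKI for branch predictor size 3
    ipc_1: float    # IPC for branch predictor size 1
    ipc_2: float    # IPC for branch predictor size 2
    ipc_3: float    # IPC for branch predictor size 3
    mr: float       # miss rate for one branch predictor
    e_l32: float    # local branch entropy, size 32
    e_l64: float    # local branch entropy, size 64
    e_g32: float    # global branch entropy, size 32
    e_g64: float    # global branch entropy, size 64
    e_t32: float    # tournament branch entropy, size 32
    e_t64: float    # tournament branch entropy, size 64
    sbc: float      # static branch count
    dbc: float      # dynamic branch count
    ilp: float      # instruction-level parallelism
```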
Page 6: Branch Behavior Space
One vector per workload: workload k is described by
BBV(k) = (MPKI_1(k), MPKI_2(k), MPKI_3(k), IPC_1(k), IPC_2(k), IPC_3(k), MR(k), E_L32(k), E_L64(k), E_G32(k), E_G64(k), E_T32(k), E_T64(k), SBC(k), DBC(k), ILP(k)), for k = 1, ..., N.
This gives N data points in 16 dimensions, which are likely correlated.
Page 7: How to select workloads? (outline revisited)
– Characterization through statistics: characterize branch behavior through 16 statistics.
– PCA: Principal Component Analysis to reduce correlation between the 16 statistics.
– K-means clustering: classify workloads based on distance to neighbors.
– Select representative workloads per cluster.
Page 8: Principal Component Analysis example
In the 2D example (axes PC 1 and PC 2), PC 1 captures 99% of the variance of the data. We can remove PC 2 and reduce the space to one dimension without significant loss of information.
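A toy reproduction of this example, assuming two strongly correlated features so that PC 1 captures nearly all of the variance; the data is synthetic.

```python
# Toy version of the slide's PCA example: two strongly correlated features,
# so PC 1 captures ~99% of the variance and PC 2 can be dropped.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = x + rng.normal(scale=0.1, size=1000)        # almost a linear function of x
data = np.column_stack([x, y])

pca = PCA(n_components=2).fit(data)
print(pca.explained_variance_ratio_)            # e.g. [0.997..., 0.002...]

reduced = PCA(n_components=1).fit_transform(data)   # keep only PC 1
print(reduced.shape)                                 # (1000, 1)
```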
Page 9: PCA reduces the 16D space to a 5D space
– 92% of the information is captured by the first 5 PCs.
– Graphical view of how the first two PCs are composed; the first two PCs capture 65% of the variance.
– Annotations from the figure: avg entropy + avg MPKI – avg IPC; 2x (avg IPC) + avg local/tournament entropy; average MPKI – average 63-bit entropy; projection of non-entropy but microarchitecture-independent stats.
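A sketch of how the number of retained components can be chosen from the cumulative explained variance, mirroring the 16D to 5D reduction; the 92% threshold comes from the slide, while the input matrix here is a random stand-in for the real statistics.

```python
# Choose the number of principal components needed to capture ~92% of the
# variance, mirroring the slide's 16D -> 5D reduction. With the real data the
# answer is 5; the random placeholder below will give a different count.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
stats = rng.normal(size=(1000, 16))            # placeholder for the 16 statistics
scaled = StandardScaler().fit_transform(stats)

pca = PCA().fit(scaled)                        # keep all 16 components for inspection
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_components = int(np.searchsorted(cumulative, 0.92)) + 1
print(n_components, cumulative[n_components - 1])
```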
Page 10: How to select workloads? (outline revisited)
– Characterization through statistics: characterize branch behavior through 16 statistics.
– PCA: Principal Component Analysis to reduce correlation between the 16 statistics.
– K-means clustering: classify workloads based on distance to neighbors.
– Select representative workloads per cluster.
Page 11: K-means clustering
– Illustrated for K = 12 and 2 dimensions; we set K = 200 and use the 5 PCs as dimensions.
– Traditional K-means: select the data point closest to each cluster center.
– Accuracy improvement: select the longest workload closest to the cluster center.
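A sketch of the size-preferred selection, assuming a simple rule: among the few workloads nearest to each cluster center, pick the one with the largest instruction count. The slide only states the idea, so the candidate count and the exact preference rule are assumptions.

```python
# "Size Preferred" representative selection: instead of the single point
# closest to each cluster center, look at the few nearest candidates and pick
# the longest workload among them. The candidate count (here 5) is an
# illustrative assumption.
import numpy as np
from sklearn.cluster import KMeans

def select_representatives(pcs, lengths, names, k=200, candidates=5, seed=0):
    """pcs: (N, d) PCA scores; lengths: (N,) array of instruction counts;
    names: list of N workload names. Returns one representative per cluster."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(pcs)
    reps = []
    for c in range(k):
        members = np.flatnonzero(km.labels_ == c)
        dists = np.linalg.norm(pcs[members] - km.cluster_centers_[c], axis=1)
        nearest = members[np.argsort(dists)[:candidates]]         # closest few workloads
        reps.append(names[nearest[np.argmax(lengths[nearest])]])  # longest of those
    return reps
```

With candidates=1 this reduces to the traditional closest-to-center selection.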
Page 12: MPKI and IPC Prediction Results
Method | MPKI Error | IPC Error
Traditional K-means | < 2% | < 1%
Size Preferred (adjustment for workload size) | < 1% | < 0.1%
Size Preferred increases accuracy.
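The slide reports only the resulting errors. One plausible way to obtain them, sketched below, is to predict the suite-average metric as a cluster-size-weighted average over the representatives and compare it with the true average over all workloads; the weighting scheme is an assumption, not something stated on the slide.

```python
# Predict the suite-average MPKI (or IPC) from the representatives, weighting
# each representative by its cluster's size, and compute the relative error
# against the true average over all workloads.
import numpy as np

def weighted_prediction(values, labels, rep_indices):
    """values: (N,) per-workload metric; labels: (N,) cluster id per workload;
    rep_indices: representative workload index for cluster 0, 1, ..."""
    weights = np.bincount(labels, minlength=len(rep_indices))   # workloads per cluster
    return np.average(values[np.asarray(rep_indices)], weights=weights)

def relative_error(values, labels, rep_indices):
    predicted = weighted_prediction(values, labels, rep_indices)
    actual = values.mean()
    return abs(predicted - actual) / actual
```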
Page 13: Statistics Collection Speed (cf. Native Execution)
– Microarchitecture-independent statistics (local, global, and tournament branch entropy at different sizes; static and dynamic branch count; instruction-level parallelism): 1 functional simulation, 3 orders of magnitude slower than native execution (e.g., gem5: 3 MIPS).
– Microarchitecture-dependent statistics (MPKI and IPC for three sizes of branch predictors; miss rate for one branch predictor): 3 detailed simulations, 5 orders of magnitude slower (e.g., gem5: 40 KIPS).
– What if we removed the microarchitecture-dependent statistics?
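The orders-of-magnitude figures follow directly from the quoted simulation rates once a native execution rate is assumed; the 3 GIPS native rate below is an assumption for illustration.

```python
# Back-of-the-envelope check of the slowdown factors. The gem5 rates are the
# ones quoted on the slide; the native rate is an assumed ballpark figure.
NATIVE_IPS = 3e9          # assumed native rate: ~3 billion instructions/second
FUNCTIONAL_IPS = 3e6      # gem5 functional simulation: 3 MIPS (slide)
DETAILED_IPS = 40e3       # gem5 detailed simulation: 40 KIPS (slide)

print(NATIVE_IPS / FUNCTIONAL_IPS)   # 1e3   -> ~3 orders of magnitude slower
print(NATIVE_IPS / DETAILED_IPS)     # 7.5e4 -> ~5 orders of magnitude slower
```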
Page 14: PCA on Microarchitecture-Independent Statistics
– 89.1% of the information is captured by the first 5 PCs.
– Annotations from the figure: average entropy; projection of non-entropy stats; f1(non-entropy stats) + global entropy – local entropy; f2(non-entropy stats) – global entropy + local entropy.
Page 15: MPKI and IPC Prediction Results
Method | MPKI Error | IPC Error | Simulation Time (cf. native execution)
Traditional K-means | < 2% | < 1% | 3N x 10^5
Size Preferred (adjustment for workload size) | < 1% | < 0.1% | 3N x 10^5
Microarchitecture independent | < 8.2% | < 2.5% | N x 10^3
Accuracy vs. simulation overhead trade-off.
Page 16: Conclusion
– PCA and K-means clustering to reduce the number of workloads:
  – Training set of 200 small workloads; evaluation set of 400 small workloads.
  – MPKI and IPC prediction with less than 1% and 0.1% error, respectively.
  – Statistics collection: 3 detailed simulations for all original workloads.
– Microarchitecture-independent statistics to reduce the collection overhead:
  – Training set of 23 big workloads; evaluation set of 40 big workloads.
  – MPKI and IPC prediction with less than 8% and 2.5% error, respectively.
  – Statistics collection: 1 functional simulation for all original workloads.
Page 17: References
[De Pestel et al., 2015] Sander De Pestel, Stijn Eyerman, and Lieven Eeckhout, "Micro-Architecture Independent Branch Behavior Characterization," IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 135-144, March 2015.
[Eyerman et al., 2006] Stijn Eyerman, James E. Smith, and Lieven Eeckhout, "Characterizing the Branch Misprediction Penalty," IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 48-58, March 2006.
[Joshi et al., 2006] Ajay M. Joshi, Aashish Phansalkar, Lieven Eeckhout, and Lizy K. John, "Measuring Program Similarity Using Inherent Program Characteristics," IEEE Transactions on Computers, Vol. 55, No. 6, pp. 769-782, 2006.
Page 18: Backup slides
Page 19: How to characterize branch behavior? (backup)
– Various metrics exist, e.g., MPKI, branch miss rate, branch predictability, etc.
– Branch Entropy [De Pestel et al., 2015]: microarchitecture independent, predicts MPKI.
– Build a table indexed by branch address i and n-bit history pattern j, from (0x000000, 00...0) up to (0xFFFFFF, 11...1), recording N_0(i,j) (# not taken) and N_1(i,j) (# taken).
– From these counts, estimate Prob[dir = taken | addr = i, pattern = j].
– Calculate the entropy E_L(i,j) for all i and j.
– Compute the weighted average entropy E: the branch entropy with n history bits for this workload.
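A sketch of the whole computation described on this slide, from a branch trace to the weighted-average entropy. Local (per-address) history is used here to match the E_L notation; global or tournament variants would track history differently, and Shannon entropy again stands in for the paper's exact per-entry definition.

```python
# Weighted-average branch entropy over all (branch address i, history pattern j)
# entries: compute E_L(i, j) per entry and weight it by that entry's dynamic
# execution count, as described on the slide.
import math
from collections import defaultdict

def branch_entropy(branch_trace, n_history_bits=8):
    """branch_trace: iterable of (branch_address, taken) in execution order."""
    counts = defaultdict(lambda: [0, 0])   # (addr, pattern) -> [# not taken, # taken]
    histories = defaultdict(int)           # per-address (local) history register
    mask = (1 << n_history_bits) - 1
    for addr, taken in branch_trace:
        h = histories[addr]
        counts[(addr, h)][int(taken)] += 1
        histories[addr] = ((h << 1) | int(taken)) & mask
    total = 0
    weighted = 0.0
    for n0, n1 in counts.values():
        n = n0 + n1
        p = n1 / n
        e = 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
        weighted += n * e
        total += n
    return weighted / total if total else 0.0
```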
Page 20: Entropy calculation
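The formula shown on this slide did not survive extraction; the following is a plausible LaTeX reconstruction based on the description on Page 19, with Shannon entropy standing in for the paper's exact per-entry definition.

```latex
% Plausible reconstruction of the entropy calculation described on Page 19.
% p(i,j) is the taken probability for branch address i and history pattern j;
% N_0 and N_1 are the not-taken / taken counts. The Shannon form of E_L is one
% common choice; De Pestel et al. may define the per-entry entropy differently.
\begin{align}
  p(i,j)   &= \frac{N_1(i,j)}{N_0(i,j) + N_1(i,j)} \\
  E_L(i,j) &= -\,p(i,j)\log_2 p(i,j) - \bigl(1 - p(i,j)\bigr)\log_2\bigl(1 - p(i,j)\bigr) \\
  E        &= \frac{\sum_{i,j}\bigl(N_0(i,j) + N_1(i,j)\bigr)\, E_L(i,j)}{\sum_{i,j}\bigl(N_0(i,j) + N_1(i,j)\bigr)}
\end{align}
```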
Page 21: Reducing to 5 dimensions retains 92% of the information
The first five PCs explain 50.6 + 16.5 + 12.3 + 7.2 + 5.7 = 92.3 percent of the variance.