Raced Profiles: Efficient Selection of Competing Compiler Optimizations. Hugh Leather, Bruce Worton, Michael O'Boyle, Institute for Computing Systems Architecture.

Presentation transcript:

Raced Profiles: Efficient Selection of Competing Compiler Optimizations. Hugh Leather, Bruce Worton, Michael O'Boyle. Institute for Computing Systems Architecture, University of Edinburgh, UK

Overview: The problem, Profile Races, Results, Conclusion

The problem
Iterative compilation finds the best way to compile a program, but it is very expensive: it compares 10s or 1000s of program versions, and machine learning approaches are an extreme case. We must ensure clean data, so each version is run many times to overcome noise. The risks: too few runs → bad data; too many runs → wasted effort. How do we run just enough times?

The problem
Computers are non-deterministic and noisy: other processes, OS interaction, CPU temperature. We cannot tell ahead of time how noisy a program will be.

The problem
[Figure slides: measured runtimes and the true mean μ; pretty sure the true mean is somewhere in here]

The problem
Verdict: version 'c' is best [figure: runtimes of each version]

The problem
Verdict: umm, help? [figure: runtimes of each version]

The problem
True means might look quite different [figure: runtimes of each version]

The problem
Many versions are already 'worse', but there is no clear winner.

The problem
Unroll factor 8 wins.

The problem
Does this happen in practice? Find the best unroll factor [0-16] for each loop, then see how often small samples choose a different unroll factor than a big sample (size = 1000). Count it as a failure if the small-sample choice is >0.5% worse than the big-sample choice, as in the sketch below.
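
As a concrete illustration of that failure check, here is a minimal sketch, not the paper's actual harness: it assumes a dictionary mapping each unroll factor to its 1000 measured runtimes, picks a winner from only the first few runs, and flags a failure if that choice is more than 0.5% slower than the full-sample best. All names are illustrative.

```python
def small_sample_fails(runs_by_factor, small_n, tolerance=0.005):
    """runs_by_factor: {unroll_factor: [1000 measured runtimes]} for one loop.

    Pick the 'best' factor using only the first small_n runs of each factor,
    then check whether its full-sample mean is more than `tolerance` (0.5%)
    worse than the full-sample best.  Hypothetical helper, for illustration.
    """
    full_means = {f: sum(r) / len(r) for f, r in runs_by_factor.items()}
    best_full = min(full_means.values())
    small_pick = min(runs_by_factor,
                     key=lambda f: sum(runs_by_factor[f][:small_n]) / small_n)
    return full_means[small_pick] > best_full * (1 + tolerance)
```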

The problem
Sometimes the number of samples has to be huge.

Overview: The problem, Profile Races, Results, Conclusion

Profile Races
A simple adaptive method: run each program version until either some other version is provably better, or all the remaining program versions are provably equivalent.

Profile Races
Take small samples of each program version produced by compilation (versions 1, 2, …, n).

Profile Races
Take small samples of each version. All equal? Yes → stop.

Equality testing
Standard statistical tests only say that two means are different; equality testing says that two means are the same. The researcher defines an 'indifference region' [-θ, +θ] around 0.

Equality testing
If the two confidence intervals are completely inside the indifference region, the versions are sufficiently equal.

Equality testing
If either interval falls outside the region, there is not enough information yet. Two parameters: θ, the indifference region, and α_EQ, the confidence level.
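
A minimal sketch of this equality check, under one common formulation that may differ from the paper's exact test: compute a normal-approximation confidence interval for the difference of the two means and accept equality only if it lies entirely inside [-θ, +θ]. α_EQ enters through the z value; the function names are illustrative.

```python
import math

def mean_and_sem(samples):
    """Sample mean and standard error of the mean."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return mean, math.sqrt(var / n)

def sufficiently_equal(a, b, theta, z=1.96):
    """True if the CI of mean(a) - mean(b) lies inside [-theta, +theta].

    z = 1.96 gives a 95% normal-approximation interval, i.e. an
    alpha_EQ of 0.05 in this sketch.
    """
    mean_a, sem_a = mean_and_sem(a)
    mean_b, sem_b = mean_and_sem(b)
    diff = mean_a - mean_b
    half = z * math.sqrt(sem_a ** 2 + sem_b ** 2)
    return -theta <= diff - half and diff + half <= theta
```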

Profile Races
Take small samples of each version. All equal? Yes → stop. Otherwise, remove the losers.

Removing Losers
Confidence intervals visualise statistical significance. Non-overlapping → significant: the mean of A is lower than the mean of B.

Removing Losers
Confidence intervals visualise statistical significance. Non-overlapping → significant: the mean of A is lower than the mean of B. Overlapping → NOT significant: we can say nothing about the means of A and B.

Removing Losers
Confidence intervals visualise statistical significance. Non-overlapping → significant: the mean of A is lower than the mean of B. Overlapping → NOT significant: we can say nothing about the means of A and B. Significance tests formalise this, e.g. Student's t-test, with parameter α_LT, the confidence level.
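
A sketch of the corresponding loser test, using a one-sided Welch t-test from SciPy; α_LT's default and the helper name are illustrative, and the paper may use a different formulation.

```python
from scipy import stats

def provably_worse(candidate, rival, alpha_lt=0.05):
    """True if the rival's mean runtime is provably lower than the
    candidate's, in which case the candidate can be dropped from the race."""
    t, p_two_sided = stats.ttest_ind(rival, candidate, equal_var=False)
    # One-sided p-value for the alternative "mean(rival) < mean(candidate)".
    p = p_two_sided / 2 if t < 0 else 1 - p_two_sided / 2
    return p < alpha_lt
```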

Profile Races
Take small samples of each version. All equal? Yes → stop. Remove the losers. Increase the survivors' sample sizes.

Profile Races
Take small samples of each version. All equal? Yes → stop. Remove any version that some other version provably beats. Increase the survivors' sample sizes. Repeat.
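
Putting the pieces together, a sketch of the race loop under the assumptions above. `run_version`, the two predicates (the equality and loser tests sketched earlier), and all thresholds are illustrative stand-ins, not the paper's exact algorithm or settings.

```python
from itertools import combinations

def profile_race(version_ids, run_version, sufficiently_equal, provably_worse,
                 theta, initial_runs=5, batch=5, max_runs=1000):
    """Race competing program versions, dropping provable losers as we go.

    run_version(v) executes version v once and returns its runtime.
    """
    samples = {v: [run_version(v) for _ in range(initial_runs)]
               for v in version_ids}
    survivors = set(version_ids)
    while len(survivors) > 1:
        # Stop when every pair of surviving versions is statistically equivalent.
        if all(sufficiently_equal(samples[a], samples[b], theta)
               for a, b in combinations(survivors, 2)):
            break
        # Drop every version that some other survivor provably beats.
        losers = {a for a in survivors
                  if any(provably_worse(samples[a], samples[b])
                         for b in survivors if b != a)}
        survivors -= losers
        if len(survivors) <= 1:
            break
        # Give up growing samples once the per-version budget is exhausted.
        if all(len(samples[v]) >= max_runs for v in survivors):
            break
        # Otherwise take more runs of each survivor and test again.
        for v in survivors:
            samples[v].extend(run_version(v) for _ in range(batch))
    return survivors, samples
```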

Overview: The problem, Profile Races, Results, Conclusion

Results – Unrolling
Loop unrolling: find the best unroll factor [0-16] for each loop. 22 benchmarks from UTDSP and MediaBench. Core Duo, 2.8GHz, 2GB RAM, unloaded and headless. Cycle counts from 1000 runs.

Results – Unrolling – Easy case
Low noise; most versions are clearly worse, so effort is spent only on the possible winners.

Results – Unrolling – Hard case
High noise; no clear winners, so more samples are needed to combat the noise.

Results – Unrolling – Comparison
Compared against: a constant sampling plan, and JavaSTATS, which runs each program version until the ratio of confidence-interval width to mean is sufficiently small. Each version is considered independently, so losers are not weeded out.
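
For contrast, a sketch of that kind of independent, per-version stopping rule (run until the confidence-interval half-width is a small fraction of the mean); the 2% threshold and other constants are illustrative, not JavaSTATS' actual defaults.

```python
import math

def run_until_ci_tight(run_version, rel_width=0.02, z=1.96,
                       min_runs=5, max_runs=1000):
    """Sample one version in isolation until the 95% CI half-width
    (normal approximation) is within rel_width of the sample mean."""
    samples = [run_version() for _ in range(min_runs)]
    while len(samples) < max_runs:
        n = len(samples)
        mean = sum(samples) / n
        var = sum((x - mean) ** 2 for x in samples) / (n - 1)
        if z * math.sqrt(var / n) <= rel_width * mean:
            break
        samples.append(run_version())
    return samples
```

Because each version is measured in isolation, clear losers consume just as many runs as the real contenders, which is exactly the cost a profile race avoids.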

Results – Unrolling – Comparison
Profile races are an order of magnitude better.

Results – Compiler flags
Find the best compiler flags for each benchmark. 57 benchmarks from UTDSP, MediaBench and MiBench. Core Duo, 2.8GHz, 2GB RAM, unloaded and headless. Cycle counts from 100 runs.

Results – Compiler flags
Profile races are an order of magnitude better.

Overview: The problem, Profile Races, Results, Conclusion

Profile races
Produce statistically sound data. Reduce the cost of iterative compilation (~10x). Parameters are easy to select.

Results – Parameter contours

Confidence intervals
Statisticians have measures for sample quality.

Confidence intervals
A confidence interval is a region where the true mean is likely to be, symmetric around the sample mean. The confidence level says how sure we are that the true mean is in the region (e.g. a 95% CI).

Confidence intervals
As you require less certainty, the interval shrinks (e.g. a 10% CI).

Confidence intervals
Complete certainty gives an infinite interval: a 100% CI spans (-∞, +∞).

Confidence intervals
More data means we can be more sure where the true mean is: the 95% CI shrinks.
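
To make these backup slides concrete, a small sketch of a t-based confidence interval: lowering the confidence level or adding samples both narrow the interval. The function name is illustrative.

```python
import math
from scipy import stats

def t_confidence_interval(samples, confidence=0.95):
    """Two-sided t-based confidence interval for the mean of `samples`."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    half = stats.t.ppf(0.5 + confidence / 2, df=n - 1) * math.sqrt(var / n)
    return mean - half, mean + half

# Example: a lower confidence level or more data both give a narrower interval.
# t_confidence_interval(runtimes, 0.95)   vs   t_confidence_interval(runtimes, 0.10)
# t_confidence_interval(runtimes[:10])    vs   t_confidence_interval(runtimes)
```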