Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization A. Epshteyn 1, M. Garzaran 1, G. DeJong 1, D. Padua 1, G. Ren 1, X. Li 1, K. Yotov 2, K. Pingali 2 1 University of Illinois at Urbana-Champaign 2 Cornell University

Two approaches to code optimization:
Models
– E.g., calculate the best tile size for MM as a function of cache size.
– Fast
– May be inaccurate
– No verification through feedback
Empirical Search
– E.g., execute and measure different versions of MM code with different tile sizes.
– Slow
– Accurate because of feedback

Hybrid Approach
– Faster than empirical search
– More accurate than the model
– Use the model as a prior
– Use active sampling to minimize the amount of searching

Why is Speed Important?
– Adaptation may have to be applied at runtime, where running time is critical.
– Adaptation may have to be applied at compile time (e.g., with feedback from a fast simulator).
– Library routines can be used as a benchmark to evaluate alternative machine designs.

Problem: Matrix Multiplication
– Tiling improves the locality of references.
– Cache blocking (NB): the matrix is decomposed into smaller subblocks of size NB x NB.
– Matrix multiplication is an illustrative example for testing the hybrid approach.
– Ultimate goal: a learning compiler that specializes itself to its installation environment, user profile, etc.
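The cache-blocking transformation can be sketched as the loop nest below. This is an illustrative version, not the tuned kernel from the paper; NB is the blocking factor that the model and the empirical search must choose.

```python
# Sketch of cache-blocked (tiled) matrix multiplication.
# NB is the tile size being tuned; the min() guards handle the case
# where NB does not evenly divide the matrix dimension n.

def tiled_matmul(A, B, n, NB):
    """Multiply two n x n matrices (lists of lists) using NB x NB tiles."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, NB):              # iterate over tiles of C
        for jj in range(0, n, NB):
            for kk in range(0, n, NB):      # tile data is reused while in cache
                for i in range(ii, min(ii + NB, n)):
                    for k in range(kk, min(kk + NB, n)):
                        a = A[i][k]
                        for j in range(jj, min(jj + NB, n)):
                            C[i][j] += a * B[k][j]
    return C
```

The payoff of the transformation is that each NB x NB subblock is touched repeatedly while it is still resident in cache, instead of streaming whole rows and columns through memory.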

Empirical Search: ATLAS
– Try values of the tiling parameter NB over a range in steps of 4.
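The ATLAS-style search can be sketched as a simple sweep. The endpoints 16 and 80 here are illustrative assumptions (the exact range is not given on the slide), and `time_version` stands in for "compile, run, and time the tiled kernel with this NB":

```python
# Sketch of an ATLAS-style empirical search over the tile size NB.
# The range endpoints are illustrative assumptions; time_version is a
# stand-in for measuring the real tiled kernel.

def empirical_search(time_version, lo=16, hi=80, step=4):
    best_nb, best_time = None, float("inf")
    for nb in range(lo, hi + 1, step):
        t = time_version(nb)        # measured feedback: slow but accurate
        if t < best_time:
            best_nb, best_time = nb, t
    return best_nb

# Example with a synthetic timing function whose minimum is at NB = 40:
best = empirical_search(lambda nb: (nb - 40) ** 2)
```

The accuracy of this approach comes entirely from the measurements, which is also why it is slow: every candidate NB costs a full compile-and-run cycle.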

Model (Yotov et al.)
– Compute the NB that optimizes use of the L1 cache.
– Constructed by analyzing the memory access trace of the matrix multiplication code.
– Formula:
– Has been extended to optimize use of the L2 cache.
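As a hedged sketch of what such a formula computes (the exact expression on the slide may differ), one published form of the L1 constraint from the Yotov et al. line of work picks the largest NB whose working set fits in the cache, NB² + NB + 1 ≤ capacity in matrix elements:

```python
# Hedged sketch of a model-based choice of NB: pick the largest tile
# size whose working set fits in the L1 cache.  The constraint
# NB*NB + NB + 1 <= capacity (in elements) is one published form of
# the model; the slide's exact formula is not reproduced here.

def model_nb(l1_bytes, element_bytes=8):
    capacity = l1_bytes // element_bytes        # cache size in elements
    nb = 1
    while (nb + 1) ** 2 + (nb + 1) + 1 <= capacity:
        nb += 1
    return nb

# e.g., a 32 KB L1 holding 8-byte doubles gives a capacity of 4096 elements
```

Note how cheap this is compared with the search: one closed-form calculation instead of dozens of timed runs, which is exactly the speed/accuracy trade-off the hybrid approach targets.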

Model in Action
– Performance curve, with vertical lines marking the model-predicted L1 and L2 blocking factors.
– Whether to tile for the L1 or the L2 cache depends on the architecture and the application.

Hybrid Approach: Regression
– Model performance with a family of regression curves.
– Nonparametric regression: minimize the average error.
– Maximum-likelihood regression: define a distribution over regression curves and pick the most likely curve.

Regression (Bayesian)
– Prior distribution π(curve) over regression curves
– Makes regression curves with model-predicted maxima more likely.
– Posterior distribution given the data (Bayes rule):
– P(curve|data) = P(data|curve) π(curve) / P(data)
– Pick the maximum a posteriori (MAP) curve
– Picks curves with peaks in model-predicted locations when the data sample is small.
– Picks curves that fit the data best when the sample is large.
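The MAP selection can be sketched as follows. The curve family, prior width, and noise level below are illustrative assumptions, not the paper's actual regression family; the point is only to show the prior dominating when data is scarce and the likelihood dominating when data is plentiful:

```python
# Minimal sketch of MAP curve selection over a toy family of unimodal
# performance curves, each parameterized by its peak location.
# The prior favors the model-predicted peak; the Gaussian likelihood
# rewards agreement with the measured (nb, mflops) samples.

def map_peak(samples, model_peak, candidates, prior_sigma=8.0, noise=1.0):
    """samples: list of (nb, mflops); candidates: candidate peak locations."""
    def curve(peak, x):                     # illustrative curve family
        return 100.0 - 0.05 * (x - peak) ** 2

    def log_posterior(peak):
        log_prior = -((peak - model_peak) ** 2) / (2 * prior_sigma ** 2)
        log_like = sum(-((y - curve(peak, x)) ** 2) / (2 * noise ** 2)
                       for x, y in samples)
        return log_prior + log_like         # Bayes rule, up to a constant

    return max(candidates, key=log_posterior)
```

With no samples the MAP choice is the model's prediction; once the measured points clearly favor a different peak, the likelihood term overrides the prior.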

Active Sampling
Objectives:
1) Sample at smaller tile sizes, which takes less time.
2) Explore: don't oversample in the same region.
3) Get information about the dominant peak.

Solution: Potential Fields (objectives 1 and 2)
– Positive charge at the origin
– Negative charges at previously sampled points
– Sample at the point that minimizes the field
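The sampling rule for the first two objectives can be sketched as below. The charge magnitudes and field shapes are illustrative assumptions: the origin charge pulls samples toward small (cheap) tile sizes, while charges at past samples push new samples into unexplored regions:

```python
# Sketch of the potential-field sampling rule for objectives 1 and 2.
# A positive charge at the origin attracts samples toward small tile
# sizes; negative charges at past samples repel new ones.  Field
# shapes and charge magnitudes are illustrative assumptions.

def next_sample(candidates, sampled, origin_charge=1.0, sample_charge=1.0):
    def field(x):
        f = origin_charge * x                           # attraction to the origin
        for s in sampled:
            f += sample_charge / (abs(x - s) + 1e-9)    # repulsion from past samples
        return f
    unseen = [x for x in candidates if x not in sampled]
    return min(unseen, key=field)
```

Each new measurement adds a repelling charge, so repeated calls spread the samples out instead of clustering them, while still biasing toward the inexpensive small-NB end of the range.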

Potential Fields (objective 3)
– Positive charge in the region of the dominant peak.
– How do we know which peak dominates? The distribution over regression curves lets us compute P(peak1 is located at x), P(peak2 is located at x), P(peak1 is of height h), and P(peak2 is of height h); hence we can compute P(peak1 dominates peak2).
– Impose a positive charge in the region of each peak, proportional to its probability of domination.
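The domination probability can be estimated by Monte Carlo over the posterior. In this sketch the posterior over curves is stood in for by a list of sampled curves, each reduced to its two peak heights (an illustrative simplification of the computation described above):

```python
# Sketch of objective 3: estimate P(peak1 dominates peak2) by counting
# posterior curve draws in which peak1 is the taller peak.  Each draw
# is reduced to a (height_of_peak1, height_of_peak2) pair here, which
# is an illustrative simplification.

def prob_dominates(curve_draws):
    """curve_draws: list of (height_of_peak1, height_of_peak2) samples."""
    wins = sum(1 for h1, h2 in curve_draws if h1 > h2)
    return wins / len(curve_draws)
```

The resulting probability then scales the positive charge placed in each peak's region of the potential field, steering samples toward the peak most likely to matter.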

Results I – Regression Curves

Results II – Time, Performance ModelHybridATLAS Sparc SGI ModelHybridATLAS Sparc0:003:128:59 SGI0:0014:0259:00 Performance (MFLOPS) Time (mins) Sparc – actual improvement due to the hybrid search for NB: ~10% SGI – improvement over both the model and ATLAS due to choosing to tile for the L2 cache

Results III – Library Performance

Conclusion
– The approach incorporates the analytic model as a prior.
– Active sampling picks the most informative region to sample next.
– The result decreases the search time of empirical search and improves on the model's performance.