Offline Adaptation Using Automatically Generated Heuristics Frédéric de Mesmay, Yevgen Voronenko, and Markus Püschel Department of Electrical and Computer.

Presentation transcript:

Offline Adaptation Using Automatically Generated Heuristics
Frédéric de Mesmay, Yevgen Voronenko, and Markus Püschel
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA
Marius Fehr, December 7th

Online vs. Offline Adaptive Libraries
Online adaptive: d = dft(n) (search), then d(X,Y)
– possibly unbounded initialization time
– impractical for constantly changing problem specifications
Offline adaptive: machine learning, generated at installation
+ bounded initialization time
+ no search if input specifications change

Outline: Motivation, Background, Adaptation Process, Results

Statistical Classifier – C4.5
Figure: decision tree over features
˃Based on entropy, the measure of uncertainty
˃Feature with smallest entropy becomes root
Source: Offline Adaptation Using Automatically Generated Heuristics, CMU
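The root-selection rule on this slide can be sketched as follows. This is a minimal illustration, not the modified C4.5 used in the paper: "entropy of a feature" is read here as the weighted entropy of the class labels after splitting on that feature, the full C4.5 algorithm additionally normalizes by a gain ratio, and the feature names and rows are invented for the example.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def conditional_entropy(rows, feature, target):
    """Weighted entropy of `target` after splitting `rows` on `feature`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[target])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

def pick_root(rows, features, target):
    """The feature that leaves the least uncertainty becomes the root."""
    return min(features, key=lambda f: conditional_entropy(rows, f, target))

# Toy training rows (invented): the class label is "base_case".
rows = [
    {"size_class": "small", "strided": "no",  "base_case": "yes"},
    {"size_class": "small", "strided": "yes", "base_case": "yes"},
    {"size_class": "large", "strided": "no",  "base_case": "no"},
    {"size_class": "large", "strided": "yes", "base_case": "no"},
]
root = pick_root(rows, ["size_class", "strided"], "base_case")
```

In this toy data, splitting on `size_class` leaves no uncertainty about the label, while splitting on `strided` leaves one full bit, so `size_class` is chosen as the root.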

DFT – Discrete Fourier Transforms
˃The Fourier Transform is a Linear Transform: $Y = \mathrm{DFT}_n \cdot X$
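To make the "linear transform" statement concrete, the sketch below builds the n × n DFT matrix explicitly and applies it as a matrix-vector product. The sign convention exp(-2πi/n) is an assumption (the slide does not state one), and the function names are chosen for this example only.

```python
import cmath

def dft_matrix(n):
    """n x n matrix with entries w^(k*l), where w = exp(-2*pi*i/n)."""
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (k * l) for l in range(n)] for k in range(n)]

def dft(x):
    """Compute Y = DFT_n * X as a plain matrix-vector product, O(n^2)."""
    n = len(x)
    m = dft_matrix(n)
    return [sum(m[k][l] * x[l] for l in range(n)) for k in range(n)]

# A unit impulse transforms to a constant vector of ones.
y = dft([1, 0, 0, 0])
```

Because the transform is a matrix product, linearity follows directly: `dft` of a sum of two inputs equals the sum of their individual transforms, up to rounding.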

FFT – Fast Fourier Transforms
˃Divide and Conquer Algorithms
Source: Lecture Slides: “How to Write Fast Numerical Code” (ETH, CS), Markus Püschel
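As one instance of the divide-and-conquer idea, here is a textbook recursive radix-2 FFT for power-of-two lengths, checked against the direct O(n²) definition. It is an illustrative sketch only and is unrelated to the optimized code that FFTW or Spiral produce.

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])  # DFT of the even-indexed samples
    odd = fft(x[1::2])   # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def naive_dft(x):
    """Direct O(n^2) evaluation of the DFT definition, used as a reference."""
    n = len(x)
    return [sum(x[l] * cmath.exp(-2j * cmath.pi * k * l / n) for l in range(n))
            for k in range(n)]
```

Each level halves the problem and combines the two halves in linear time, which gives the familiar O(n log n) cost.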

FFTW – Search space
Figure: decomposition tree for dft(128) with node labels Radix 8, dft_strided(16, 8), dft_scaled(8, 16), Radix 4, No
These decisions can be optimized!
˃Features: Problem size and stride
˃Decisions: Use of base cases and choice of radix
Source: Offline Adaptation Using Automatically Generated Heuristics, CMU
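The two kinds of decisions named on this slide can be made explicit in a small mixed-radix Cooley-Tukey sketch. The hooks `default_use_base_case` and `default_choose_radix` are hypothetical placeholders for whatever a search or a learned heuristic would supply; they are not FFTW's planner API. The stride feature is not modeled, since sub-problems are formed by list slicing.

```python
import cmath

def naive_dft(x):
    """Base case: direct O(n^2) DFT."""
    n = len(x)
    return [sum(x[l] * cmath.exp(-2j * cmath.pi * k * l / n) for l in range(n))
            for k in range(n)]

def default_use_base_case(n):
    """Hypothetical decision: solve small sizes directly."""
    return n <= 8

def default_choose_radix(n):
    """Hypothetical decision: first listed radix that properly divides n."""
    for r in (8, 4, 2, 3, 5):
        if n > r and n % r == 0:
            return r
    return None  # no usable radix: fall back to the base case

def dft(x, use_base_case=default_use_base_case, choose_radix=default_choose_radix):
    """Mixed-radix decimation-in-time DFT driven by two decision hooks."""
    n = len(x)
    r = None if use_base_case(n) else choose_radix(n)
    if r is None:
        return naive_dft(x)
    m = n // r
    # r sub-transforms of size m on the interleaved subsequences x[j::r]
    subs = [dft(x[j::r], use_base_case, choose_radix) for j in range(r)]
    # combine: Y[k] = sum_j w_n^(j*k) * subs[j][k mod m]
    return [sum(cmath.exp(-2j * cmath.pi * j * k / n) * subs[j][k % m]
                for j in range(r))
            for k in range(n)]
```

Any pair of hooks that returns a proper divisor of n yields the same result; only the running time changes, and that is the quantity the search or the heuristic tries to minimize.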

FFTW – Advanced Implementations
Lots of decisions!
Source: Offline Adaptation Using Automatically Generated Heuristics, CMU

Outline: Motivation, Background, Adaptation Process, Results

Offline Adaptation Process
Search → training set → C4.5 → heuristic

Training set:
features        decisions
size   stride   use base case   radix
128    -        no              8
16     -        no              4
4      -        yes             -
16     8        no              4
…      …        …               …

Offline Adaptation Process
Online Adaptive Library → Exploration → Statistical Classification → Verification → Combination → Offline Adaptive Library
Source: Offline Adaptation Using Automatically Generated Heuristics, CMU

Exploration
˃“Creates a table for the statistical classifier to work with”

Search produces the training set:
features        decisions
size   stride   use base case   radix
128    -        no              8
16     -        no              4
4      -        yes             -
16     8        no              4
…      …        …               …
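A minimal sketch of how such a table could be produced: for every (size, stride) problem, evaluate each admissible decision and record the cheapest one. The candidate generator and the cost model `toy_cost` are invented for this example and stand in for running the online library's search with real timings. The cost model was chosen so that the toy output resembles the example rows above.

```python
def explore(problems, candidates, cost):
    """For each (size, stride) problem, record the cheapest admissible decision."""
    table = []
    for size, stride in problems:
        best = min(candidates(size), key=lambda d: cost(size, stride, d))
        table.append({"size": size, "stride": stride,
                      "use_base_case": best[0], "radix": best[1]})
    return table

def candidates(size):
    """Decisions are (use_base_case, radix) pairs; the limits are hypothetical."""
    opts = [(False, r) for r in (2, 4, 8) if size % r == 0 and size > r]
    if size <= 16:
        opts.append((True, None))  # assume base cases exist only for small sizes
    return opts

def toy_cost(size, stride, decision):
    """Invented cost model standing in for measured runtime."""
    use_base, radix = decision
    penalty = 0 if stride is None else stride
    if use_base:
        return 0.75 * size * size + penalty
    sub = size // radix
    return size * radix + radix * sub * sub + penalty

training_set = explore([(128, None), (16, None), (4, None), (16, 8)],
                       candidates, toy_cost)
```

In a real setting the `cost` argument would time the library on the target machine, so the resulting table (and every heuristic learned from it) is specific to that platform.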

Statistical Classification
˃Computes decision trees
˃Uses a modified version of C4.5
˃Provides “hints” based on library functionality
Source: Offline Adaptation Using Automatically Generated Heuristics, CMU

Hinting
˃Problem: Sometimes C4.5 does not have enough information to make wise decisions
˃Choice of radix:
˃Performance depends strongly on the prime factorization
˃Multiples of 2 and 3 show very different behavior but are heavily interleaved
˃Solution: Provide additional features to C4.5
Figure: decision tree that splits only on thresholds of n (e.g. 58, 59) and cannot determine the radix

Hinting
Figure: decision tree splitting on nfactor(2, n), nfactor(3, n), and n, with leaves Radix 12, Radix 3, Radix 18, Radix 6, Radix 4
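The hint features named on the slide can be sketched as below. The slides do not define `nfactor`; this example assumes that nfactor(p, n) is the number of times the prime p divides n. The `features` helper and its keys are likewise hypothetical.

```python
def nfactor(p, n):
    """Multiplicity of p in n (assumed meaning of the slide's nfactor(p, n))."""
    if n < 1 or p < 2:
        raise ValueError("expected n >= 1 and p >= 2")
    count = 0
    while n % p == 0:
        n //= p
        count += 1
    return count

def features(n):
    """Feature row for the classifier: raw size plus the two hint features."""
    return {"n": n, "nfactor2": nfactor(2, n), "nfactor3": nfactor(3, n)}

# Neighbouring sizes have very different factorizations.
rows = [features(n) for n in (54, 56, 58, 59, 60)]
```

Under this reading, sizes 54 through 60 interleave values such as 54 = 2 · 3³, 56 = 2³ · 7, and the prime 59, so a threshold on n alone cannot separate them, while a threshold on `nfactor2` or `nfactor3` can.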

Figure slide (no text recovered)
Source: Offline Adaptation Using Automatically Generated Heuristics, CMU

Verification
24 = 2 * 2 * 2 * 3
Source: Offline Adaptation Using Automatically Generated Heuristics, CMU

Combination
˃Inserts decision trees into the library as heuristics
Figure: Online Adaptive (d = dft(n), Search, d(X,Y)) and Offline Adaptive
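One way to picture this step is a small generator that turns a decision tree into source code for a heuristic function. The nested-dict tree format, the toy tree, and the name `choose_radix` are assumptions for illustration; the paper's actual tree representation and target language are not shown in these slides.

```python
def emit(tree, indent=1):
    """Emit nested if/else source for a decision tree; leaves are decisions."""
    pad = "    " * indent
    if not isinstance(tree, dict):
        return f"{pad}return {tree!r}\n"
    return (f"{pad}if {tree['feature']} <= {tree['threshold']}:\n"
            + emit(tree["le"], indent + 1)
            + f"{pad}else:\n"
            + emit(tree["gt"], indent + 1))

def generate_heuristic(name, args, tree):
    """Return the source code of a function that evaluates the tree."""
    return f"def {name}({', '.join(args)}):\n" + emit(tree)

# Toy tree (invented): inner nodes test a feature, leaves hold a radix.
tree = {"feature": "n", "threshold": 16,
        "le": 4,
        "gt": {"feature": "nfactor2", "threshold": 2, "le": 4, "gt": 8}}

source = generate_heuristic("choose_radix", ["n", "nfactor2"], tree)
namespace = {}
exec(source, namespace)  # compile the generated heuristic
choose_radix = namespace["choose_radix"]
```

Once generated, the heuristic is ordinary code with a fixed, small evaluation cost, which is what bounds the initialization time of the offline adaptive library.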

Outline: Motivation, Background, Adaptation Process, Results

2-Powers – Competitiveness
Platform: 2 x dual core 3GHz Intel Xeon 5160
Source: Offline Adaptation Using Automatically Generated Heuristics, CMU

Learning and Generating Heuristics
Source: Offline Adaptation Using Automatically Generated Heuristics, CMU

Mixed Sizes – Competitiveness
Source: Offline Adaptation Using Automatically Generated Heuristics, CMU

Pro vs. Contra
˃Pro
˃Improvement of usability
˃Only small performance penalty (for DFT)
˃Entirely automatic method
˃Applicable to other problem domains / libraries
˃Computer generation of offline adaptive library, directly from algorithm specification (together with Spiral)
˃Contra
˃Performance depends strongly on choice of training set
Figure: Algorithm Specification, SPIRAL, this paper, Adaptive Offline Library

Questions?


Online vs. Offline Adaptive Libraries

Library type         | Non-adaptive | Online adaptive             | Offline adaptive
Prototype            | IPP          | FFTW                        | this paper
Interface            |              |                             |
Initialization cost  |              |                             |
Computation cost     |              |                             |
Adaptation mechanism | -            | online (planner at runtime) | offline (at installation time)
Problem changes      | -            | rerun planner               | -
Platform changes     | rebuy        | rerun planner               | reinstall

Source: Offline Adaptation Using Automatically Generated Heuristics, CMU

Math behind C4.5