Presented by: Divya Muppaneni

Presentation transcript:

Portable Compiler Optimisation Across Embedded Programs and Micro-architectures using Machine Learning. Authors: Christophe Dubach, Grigori Fursin, Michael F.P. O’Boyle, Timothy M. Jones, Edwin V. Bonilla (University of Edinburgh; INRIA Saclay). Presented by: Divya Muppaneni, Dept of Computer & Information Sciences, University of Delaware

Motivation: Compiler optimization is the process of tuning the output of a compiler to minimize or maximize some attribute of an executable computer program. Difficulties in building an optimizing compiler

Portable Compiler: Addressing the problem: developing a portable optimizing compiler. Approach: machine learning.

Model: Generating the training data

Model (Contd): Building the model. To learn the model, we need to fit a probability distribution over good optimization passes to each training program/micro-architecture pair. Input: architecture description (M1) and performance counters (P1). Output: probability distribution over optimizations.
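The slide does not say how the distribution is fitted or how it is predicted for a new input, so the following is only a minimal sketch under stated assumptions: each optimization is treated as an independent on/off flag, "good" means the top fraction of sampled settings by speedup, and prediction is a nearest-neighbour lookup on the feature vector. The names `fit_flag_distribution`, `predict_distribution`, and `top_fraction` are hypothetical.

```python
import math


def fit_flag_distribution(samples, top_fraction=0.05):
    """Estimate P(flag is on) from the best-performing flag settings.

    samples: list of (flags, speedup), where flags is a tuple of 0/1 values.
    Assumption: flags are independent, and "good" is the top_fraction of
    samples ranked by speedup (at least one sample is always kept).
    """
    ranked = sorted(samples, key=lambda s: s[1], reverse=True)
    keep = max(1, int(len(ranked) * top_fraction))
    good = [flags for flags, _ in ranked[:keep]]
    n_flags = len(good[0])
    return [sum(f[i] for f in good) / len(good) for i in range(n_flags)]


def predict_distribution(features, training):
    """Return the distribution of the closest training pair.

    training: list of (feature_vector, distribution), one entry per
    training program/micro-architecture pair. Assumption: features are
    numeric (architecture description plus performance counters) and
    Euclidean distance is a reasonable similarity measure.
    """
    nearest = min(training, key=lambda t: math.dist(features, t[0]))
    return nearest[1]
```

In this sketch a new program/micro-architecture pair is described by its feature vector only; no optimization settings need to be run on it before a distribution is returned.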

Experimental Setup: Benchmarks: 35 MiBench programs. Microarchitecture space: XScale processor, 200 micro-architectural configurations. Compiler optimisation space: 1000 different optimizations.

Characterising the compiler space: Distribution of the maximum speedup available across all microarchitectures on a per-program basis.

Evaluation Methodology: Cross validation: leave-one-out cross validation. Best performance achievable.

Evaluation Methodology: Program/Microarchitecture Optimisation Space

Evaluation Methodology: Evaluation Across Programs

Evaluation Methodology: Evaluation Across Microarchitectures
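Leave-one-out cross validation across programs and across microarchitectures can be sketched as a split over (program, microarchitecture) pairs. This is an illustration only: the function name `split_holding_out` and the pair representation are assumptions, and the slides do not describe the exact splitting procedure used.

```python
def split_holding_out(pairs, program=None, arch=None):
    """Split (program, arch) pairs into training and test sets.

    Hold out a program to evaluate across programs, an arch to evaluate
    across microarchitectures, or both so that neither the test program
    nor the test microarchitecture appears in the training set.
    """
    if program is None and arch is None:
        raise ValueError("hold out a program, a microarchitecture, or both")
    # Test set: pairs that match every held-out value.
    test = [
        (p, a) for p, a in pairs
        if (program is None or p == program) and (arch is None or a == arch)
    ]
    # Training set: pairs that share no held-out value.
    train = [
        (p, a) for p, a in pairs
        if (program is None or p != program) and (arch is None or a != arch)
    ]
    return train, test


def leave_one_program_out(pairs):
    """Yield one (held_out_program, train, test) split per program."""
    for program in sorted({p for p, _ in pairs}):
        train, test = split_holding_out(pairs, program=program)
        yield program, train, test
```

Excluding every pair that shares the held-out program or microarchitecture keeps the model from seeing any measurement taken on the item it is later tested on.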

Results: Program Impact on Optimisations

Results (Contd): Microarchitecture Impact on Optimisations

Results (Contd): Extending the Microarchitectural Space

Conclusion: An average speedup of 1.16x over the highest default optimization level was achieved across the 200 micro-architectural configurations. Future work: reduce the training cost.

THANK YOU