Adapting Convergent Scheduling Using Machine Learning Diego Puppin*, Mark Stephenson †, Una-May O’Reilly †, Martin Martin †, and Saman Amarasinghe † * Institute for Information Science and Technologies, Italy † Massachusetts Institute of Technology, USA
Outline
This talk shows how one can apply machine learning techniques to find good phase orderings for an instruction scheduler.
First, I'll introduce the scheduler that we are interested in improving.
Then, I'll discuss genetic programming.
Finally, I'll present experimental results.
Clustered Architectures
Memory and registers are separated into clusters (as in RAW and clustered VLIWs). Each cluster contains an R4000-like processor core, and clusters are connected by an operand network. When scheduling, we try to co-locate data with computation.
Convergent Scheduling
Convergent scheduling passes are symmetric: each pass takes a preference map as input and outputs a preference map. Passes are modular and can be applied in any order.
Convergent Scheduling: Preference Maps
A preference map is indexed by instructions, clusters, and time. Each entry is a weight; the weights correspond to the “confidence” of a space-time assignment for a given instruction.
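As a concrete illustration of the data structure, a preference map can be sketched as a 3-D weight array, with each pass mapping one map to another. This is a hypothetical sketch, not the talk's implementation: the sizes, the normalization step, and the toy `boost_cluster` pass are all assumptions.

```python
# Hypothetical sketch of a preference map: pmap[i][c][t] is the
# scheduler's confidence that instruction i belongs on cluster c at
# time slot t. Sizes and normalization are illustrative assumptions.
N_INSTRS, N_CLUSTERS, N_SLOTS = 6, 4, 8

def uniform_map():
    """Start with no preference: equal weight everywhere."""
    w = 1.0 / (N_CLUSTERS * N_SLOTS)
    return [[[w] * N_SLOTS for _ in range(N_CLUSTERS)]
            for _ in range(N_INSTRS)]

def normalize(pmap):
    """Rescale each instruction's weights so they sum to 1."""
    for instr in pmap:
        total = sum(sum(row) for row in instr)
        for row in instr:
            for t in range(N_SLOTS):
                row[t] /= total
    return pmap

def boost_cluster(pmap, instr, cluster, factor=4.0):
    """A toy 'pass': raise confidence that instr goes on cluster.
    Like the real passes, it maps a preference map to a preference map,
    so such passes compose in any order."""
    for t in range(N_SLOTS):
        pmap[instr][cluster][t] *= factor
    return normalize(pmap)

pmap = boost_cluster(uniform_map(), instr=0, cluster=2)
```

Because a pass only scales weights rather than making an absolute assignment, a later pass can still move instruction 0 elsewhere by outweighing this boost.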
Example: Dependence Graph (figure: four clusters; shading indicates high vs. low confidence)
The animation steps through the passes, each refining the preference map: Placement Propagation, Critical Path Strengthening, Path Propagation, Parallelism Distribution, Path Propagation, Communication Reduction, Path Propagation, and finally the Final Schedule.
Convergent Scheduling
“Classical” scheduling passes make absolute decisions that can't be undone. Convergent scheduling passes make soft decisions in the form of preferences, so mistakes made early on can be undone. Passes don't impose an order!
Double-Edged Sword
The good news: convergent scheduling does not constrain phase order, and a nice interface makes writing and integrating passes easy.
The bad news: convergent scheduling does not constrain phase order, so there is a limitless number of phase orders to consider, some of which are much better than others.
Our Proposal
Use genetic programming to automatically search for a phase ordering that's tailored to a given architecture and compiler. Our inspiration comes from Cooper's work [Cooper et al., LCTES 1999].
Genetic Programming
A search algorithm analogous to Darwinian evolution. Maintain a population of expressions, e.g.:
(sequence INITTIME (sequence PLACE (if imbalanced LOAD COMM)))
Genetic Programming
A search algorithm analogous to Darwinian evolution; maintain a population of expressions.
Selection: the fittest expressions in the population are more likely to reproduce.
Reproduction: crossing over subexpressions of two expressions.
Mutation: random changes to an expression.
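Crossover on these expressions can be sketched as swapping subtrees between two parents. This is a minimal illustration assuming expressions are nested tuples like the example above; the pass names and the helper functions are mine, not from the talk's system.

```python
import random

def subtrees(expr, path=()):
    """Enumerate every (path, subtree) pair in an expression tree."""
    yield path, expr
    if isinstance(expr, tuple):
        for i, child in enumerate(expr[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(expr, path, new):
    """Return a copy of expr with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return expr[:i] + (replace(expr[i], path[1:], new),) + expr[i + 1:]

def crossover(parent_a, parent_b, rng=random):
    """Graft a random subtree of parent_b onto a random point in parent_a."""
    path_a, _ = rng.choice(list(subtrees(parent_a)))
    _, sub_b = rng.choice(list(subtrees(parent_b)))
    return replace(parent_a, path_a, sub_b)

a = ("sequence", "INITTIME",
     ("sequence", "PLACE", ("if", "imbalanced", "LOAD", "COMM")))
b = ("sequence", "DEP", "FUNC")
child = crossover(a, b, random.Random(0))
```

Because crossover points can land inside an `if`, this operator can both create and destroy conditional subexpressions, which matters for the "Disappointing Result" discussed later.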
General Flow
Create initial population (randomly generated initial solutions) → Evaluation → Selection → Create variants → repeat until done.
Evaluation: the compiler is modified to use the given expression as the phase ordering, and each expression is evaluated by compiling and running the benchmark(s). Fitness is the relative speedup over our original phase ordering on the benchmark(s).
Selection: just as with natural selection, the fittest individuals are more likely to survive.
Create variants: use crossover and mutation to generate new expressions, and thus new and hopefully improved phase orderings.
Experimental Setup
We use an in-house VLIW compiler (SUIF, MachSUIF) and simulator. Compiler and simulator are parameterized so we can easily change VLIW configurations. Experiments presented here are for clustered architectures; details of the architectures are in the paper.
Convergent Scheduling Heuristics
Noise Introduction, Initial Time Assignment, Preplacement, Critical Path Strengthening, Communication Minimization, Parallelism Distribution, Load Balance, Dependence Enforcement, Assignment Strengthening, Functional Unit Distribution, Push to First Cluster, Critical Path Distance, Cluster Creation, Register Pressure Reduction in Time, Register Pressure Reduction in Space
Hand-Tuned Results 4-cluster VLIW, Rich Interconnect
Results 4-cluster VLIW, Limited Interconnect
Training an Improved Sequence
Goal: find a sequence that works well for all the benchmarks in the last graph (vmul, rbsorf, yuv, etc.). We train a sequence using these benchmarks: for each expression in the population, compile and run all the benchmarks, and take the average speedup as the fitness.
The Schedule
The evolved sequence is much more conservative in communication: inittime func dep func load func dep func comm dep func comm place
func reduces the weights of instructions on overloaded clusters; dep increases the probability that dependent instructions are scheduled “nearby”; comm tries to keep neighboring instructions in the same cluster.
Results 4-cluster VLIW, Limited Interconnect
Results Leave-One-Out Cross Validation
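The leave-one-out protocol behind that graph can be sketched as follows: for each benchmark, evolve a sequence using only the other benchmarks, then measure it on the held-out one. The `train` and `evaluate` callables below are placeholders for the full evolve-compile-run machinery, and the toy demonstration only checks the held-out benchmark never leaks into training.

```python
# Sketch of leave-one-out cross validation over the benchmark suite.
def leave_one_out(benchmarks, train, evaluate):
    results = {}
    for held_out in benchmarks:
        training_set = [b for b in benchmarks if b != held_out]
        sequence = train(training_set)      # e.g., evolve a phase ordering
        results[held_out] = evaluate(sequence, held_out)
    return results

# Toy demonstration: "training" just records the training set, and
# "evaluation" checks the held-out benchmark was never trained on.
names = ["vmul", "rbsorf", "yuv"]
report = leave_one_out(names, train=list,
                       evaluate=lambda seq, b: b not in seq)
```

A high held-out score for every benchmark is what supports the claim that the evolved sequence generalizes rather than overfitting its training set.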
Summary of Results
When we changed the architecture, the hand-tuned sequence failed: UAS and PCC outperformed convergent scheduling. Our GP system found a sequence that usually outperforms UAS and PCC. Cross validation suggests that it is possible to find a “general-purpose” sequence.
Running Time
Using about 20 machines in a small cluster of workstations, it takes about 2 days to evolve a sequence. This is a one-time process, performed by the compiler vendor!
Disappointing Result
Unfortunately, sequences with conditionals are weeded out of the GP selection process. Our system rewards parsimony, yet convergent scheduling passes make soft decisions, so running an extra pass may not be detrimental. We'd like to get to the bottom of this unexpected result.
Conclusions
Using GP, we're able to find architecture-specific, application-independent sequences. We can quickly retune the compiler when the architecture changes or when the compiler itself changes.