Models for runtime optimization Free Breakout Session Jens, Thomas, Alex, Christoph.

Goal
Models for runtime optimization
Idea: Utilize a model at startup to limit the search space

Model
What is the model?
 – A prediction function like f(P1, P2, …, Pn) = ExecutionTime
 – P1 could be the number of cores
 – P2 could be the memory footprint
Application-specific performance curves for each parameter
SoPeCo to derive these performance curves
 – New strategy?
Identify the most relevant parameter
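A minimal sketch of such a prediction function, reduced to a single parameter (the number of cores). The model form t(cores) = serial + parallel / cores and the measurements are assumptions chosen for illustration; they are not taken from the slides or from SoPeCo.

```python
# Hypothetical measurements: (number of cores, execution time in seconds).
# The data is synthetic, generated from t = 2 + 8 / cores.
samples = [(1, 10.0), (2, 6.0), (4, 4.0), (8, 3.0)]


def fit_serial_parallel(samples):
    """Least-squares fit of t(cores) = serial + parallel / cores."""
    xs = [1.0 / cores for cores, _ in samples]
    ts = [t for _, t in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    parallel = sxt / sxx
    serial = mean_t - parallel * mean_x
    return serial, parallel


def predict_time(cores, serial, parallel):
    """Prediction function f(P1) = ExecutionTime for P1 = number of cores."""
    return serial + parallel / cores


serial, parallel = fit_serial_parallel(samples)
predicted_16 = predict_time(16, serial, parallel)
```

With more parameters (for example memory footprint), the same idea would need a multivariate fit; the single-parameter form is only meant to show what "performance curve" could mean in practice.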

Imaginary Idea
Professorial anti-pattern: Alice in Wonderland
Divide the task into atomic blocks
Thread pool
 – Optimization through number of cores: rule of thumb: 2 * #threads
Ease the task for the auto-tuner with a profile, e.g. application is data bound
Only resource parameters?
Limit degrees of freedom (through a special programming language and compiler)
IMAGINATION ;)
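The block-plus-thread-pool idea could look roughly like the sketch below. The helper names are hypothetical, and sizing the pool at two workers per core is only one possible reading of the rule of thumb above, so treat it as an assumption.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(items, block_size):
    """Divide the task into independent ("atomic") blocks."""
    return [items[i:i + block_size] for i in range(0, len(items), block_size)]


def run_blocks(blocks, work, workers):
    """Process every block on a thread pool with the given number of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(work, blocks))


cores = os.cpu_count() or 1
workers = 2 * cores  # assumption: two workers per core as the tunable starting point

blocks = split_into_blocks(list(range(10)), 4)
partial_sums = run_blocks(blocks, sum, workers)
total = sum(partial_sums)
```

Because the worker count is a single argument, an auto-tuner only has to vary one number to explore this part of the parameter space.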

Predict Concurrency
Which point in each parameter space is optimal w.r.t. the performance of both applications?
Use only half of each parameter space
 – E.g. just half the number of cores
Palladio approach: model for the whole system, not just the single application
 – Predict concurrency through simulation (but not feasible on the fly)

Parameter Variation
Variation of the parameter with the biggest gradient
 – E.g. number of threads
Performance depends on utilization of hardware
 – Idea: Each parameter is more or less important for performance
 – But: Conflicts by influencing other parameters
 – Solution: Analysis of the sensitivity of each parameter
 – Which parameter influences the utilization of a resource most?

Sensitivity/Relevance Analysis
Define e.g. min and max of possible values
Use methods (e.g. Plackett-Burman, Nelder-Mead) to vary values intelligently between the bounds (parameter space) to derive an order
Result: Most important parameters identified
Refine the model to decrease the error with specific parameter values
Goal: Avoid evaluating the whole parameter space
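As a rough illustration of deriving an order of parameters from min/max bounds, the sketch below uses a plain one-at-a-time screen: each parameter is switched between its bounds while the others stay at their midpoints. This is a simpler stand-in, not a Plackett-Burman design, and the cost function and parameter names are hypothetical.

```python
def rank_parameters(measure, bounds):
    """Rank parameters by |measure(max) - measure(min)|, others held at midpoint."""
    midpoint = {name: (lo + hi) / 2 for name, (lo, hi) in bounds.items()}
    effects = {}
    for name, (lo, hi) in bounds.items():
        low_cfg = {**midpoint, name: lo}
        high_cfg = {**midpoint, name: hi}
        effects[name] = abs(measure(high_cfg) - measure(low_cfg))
    ranking = sorted(effects, key=effects.get, reverse=True)
    return ranking, effects


def synthetic_time(cfg):
    """Hypothetical execution-time function standing in for real measurements."""
    return 100.0 / cfg["threads"] + 0.01 * cfg["buffer_kb"] + 0.0 * cfg["batch"]


bounds = {"threads": (1, 16), "buffer_kb": (64, 1024), "batch": (1, 100)}
ranking, effects = rank_parameters(synthetic_time, bounds)
```

The screen needs only two measurements per parameter, which matches the stated goal of not evaluating the whole parameter space, but it cannot see interactions between parameters.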

Does the prediction pay off?
Roofline model? [Williams and Patterson]
Problem with abstract measurements
Split the computation task into subtasks
 – This splits the resource demand, which in turn increases parallelism and leads to speedup (hopefully)
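For reference, the roofline bound is commonly written as attainable performance = min(peak compute, memory bandwidth × arithmetic intensity). The sketch below evaluates that bound; the hardware numbers are hypothetical and only serve as an example.

```python
def roofline_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Upper bound on attainable performance under the roofline model."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


# Hypothetical machine: 100 GFLOP/s peak compute, 20 GB/s memory bandwidth.
peak, bandwidth = 100.0, 20.0
ridge_point = peak / bandwidth  # intensity at which compute becomes the limit

memory_bound = roofline_gflops(peak, bandwidth, 2.0)    # below the ridge point
compute_bound = roofline_gflops(peak, bandwidth, 10.0)  # above the ridge point
```

A kernel below the ridge point is limited by memory traffic, so adding cores alone is unlikely to help; that is the kind of coarse hint a model could give an auto-tuner before any search starts.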

Random measurement
Problem:
 – Non-steady functions?
Simplex
 – Problem: local maxima
Global search methods
 – Not feasible
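The local-optimum problem can be shown with a small sketch. A step-halving local search stands in here for the simplex method, and the objective is a hypothetical cost curve that is minimized, so the trap is a local minimum rather than a local maximum. Depending on the starting point, the search ends in a different basin.

```python
def cost(x):
    """Hypothetical cost curve with a global minimum near -1.04 and a local one near 0.96."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def local_search(f, x, step=0.5, tol=1e-6):
    """Move to the better neighbour; halve the step when no neighbour improves."""
    while step > tol:
        best = min((x - step, x, x + step), key=f)
        if best == x:
            step /= 2
        else:
            x = best
    return x


from_right = local_search(cost, 2.0)   # ends in the local minimum
from_left = local_search(cost, -2.0)   # ends in the global minimum
```

Starting from 2.0 the search never crosses the barrier around x = 0, so it reports the worse of the two optima; a method without some form of global exploration has no way to notice this.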