University of Maryland Towards Automated Tuning of Parallel Programs Jeffrey K. Hollingsworth Department of Computer Science University.

Slides:



Advertisements
Similar presentations
A Workflow Engine with Multi-Level Parallelism Supports Qifeng Huang and Yan Huang School of Computer Science Cardiff University
Advertisements

1 Automating Auto Tuning Jeffrey K. Hollingsworth University of Maryland
Managing Web server performance with AutoTune agents by Y. Diao, J. L. Hellerstein, S. Parekh, J. P. Bigu Jangwon Han Seongwon Park
Introduction to Maven 2.0 An open source build tool for Enterprise Java projects Mahen Goonewardene.
OOPSLA 2005 Workshop on Library-Centric Software Design The Diary of a Datum: An Approach to Modeling Runtime Complexity in Framework-Based Applications.
Christian Delbe1 Christian Delbé OASIS Team INRIA -- CNRS - I3S -- Univ. of Nice Sophia-Antipolis November Automatic Fault Tolerance in ProActive.
15 Copyright © 2004, Oracle. All rights reserved. Monitoring and Managing Memory.
Online Performance Auditing Using Hot Optimizations Without Getting Burned Jeremy Lau (UCSD, IBM) Matthew Arnold (IBM) Michael Hind (IBM) Brad Calder (UCSD)
DEVS-Based Simulation Web Services for Net-Centric T&E Saurabh Mittal, Ph.D. Jose L. Risco-Martin*, Ph.D. Bernard P. Zeigler, Ph.D. Arizona Center for.
May 25, 2010 Mary Hall May 25, 2010 Advancing the Compiler Community’s Research Agenda with Archiving and Repeatability * This work has been partially.
Introduction : ‘Skoll: Distributed Continuous Quality Assurance’ Morimichi Nishigaki.
Presented by Deepak Srinivasan Alaa Aladmeldeen, Milo Martin, Carl Mauer, Kevin Moore, Min Xu, Daniel Sorin, Mark Hill and David Wood Computer Sciences.
Accelerating Product and Service Innovation © 2013 IBM Corporation IBM Integrated Solution for System z Development (ISDz) Henk van der Wijk 23 Januari.
Gain Performance & Scalability With RightNow Analytics
An Automated Component-Based Performance Experiment and Modeling Environment Van Bui, Boyana Norris, Lois Curfman McInnes, and Li Li Argonne National Laboratory,
A Genetic Algorithms Approach to Feature Subset Selection Problem by Hasan Doğu TAŞKIRAN CS 550 – Machine Learning Workshop Department of Computer Engineering.
4.x Performance Technology drivers – Exascale systems will consist of complex configurations with a huge number of potentially heterogeneous components.
Integrating PDC Topics with Software Engineering and Computer Network Courses at Undergraduate Level An Early Adopter Experience EduHPC-2014, New Orleans,
Introduction to the Atlas Platform Mobile & Pervasive Computing Laboratory Department of Computer and Information Sciences and Engineering University of.
Beyond Automatic Performance Analysis Prof. Dr. Michael Gerndt Technische Univeristät München
Automatic Identification of Concurrency in Handel-C Joseph C Libby, Kenneth B Kent, Farnaz Gharibian Faculty of Computer Science University of New Brunswick.
Introduction and Overview Questions answered in this lecture: What is an operating system? How have operating systems evolved? Why study operating systems?
QCD Project Overview Ying Zhang September 26, 2005.
Cluster Reliability Project ISIS Vanderbilt University.
November 13, 2006 Performance Engineering Research Institute 1 Scientific Discovery through Advanced Computation Performance Engineering.
Transparent Grid Enablement Using Transparent Shaping and GRID superscalar I. Description and Motivation II. Background Information: Transparent Shaping.
A Framework for Elastic Execution of Existing MPI Programs Aarthi Raveendran Tekin Bicer Gagan Agrawal 1.
A Framework for Elastic Execution of Existing MPI Programs Aarthi Raveendran Graduate Student Department Of CSE 1.
1 Wenguang WangRichard B. Bunt Department of Computer Science University of Saskatchewan November 14, 2000 Simulating DB2 Buffer Pool Management.
4.2.1 Programming Models Technology drivers – Node count, scale of parallelism within the node – Heterogeneity – Complex memory hierarchies – Failure rates.
System Software for Parallel Computing. Two System Software Components Hard to do the innovation Replacement for Tradition Optimizing Compilers Replacement.
A Framework to Test Autonomic Containers Brittany Parsons and Ronald Stevens July 6, 2006 REU Sponsored by NSF.
MAPLD Reconfigurable Computing Birds-of-a-Feather Programming Tools Jeffrey S. Vetter M. C. Smith, P. C. Roth O. O. Storaasli, S. R. Alam
ARGONNE NATIONAL LABORATORY Climate Modeling on the Jazz Linux Cluster at ANL John Taylor Mathematics and Computer Science & Environmental Research Divisions.
Statistical Analysis of Inlining Heuristics in Jikes RVM Jing Yang Department of Computer Science, University of Virginia.
1CPSD Software Infrastructure for Application Development Laxmikant Kale David Padua Computer Science Department.
Investigating Adaptive Compilation using the MIPSpro Compiler Keith D. Cooper Todd Waterman Department of Computer Science Rice University Houston, TX.
George Tsouloupas University of Cyprus Task 2.3 GridBench ● 1 st Year Targets ● Background ● Prototype ● Problems and Issues ● What's Next.
Making Watson Fast Daniel Brown HON111. Need for Watson to be fast to play Jeopardy successfully – All computations have to be done in a few seconds –
A Software Framework for Distributed Services Michael M. McKerns and Michael A.G. Aivazis California Institute of Technology, Pasadena, CA Introduction.
Scientific Programmes Committee Centre for Aerospace Systems Design & Engineering Amitay Isaacs Department of Aerospace Engineering Indian Institute of.
Coevolutionary Automated Software Correction Josh Wilkerson PhD Candidate in Computer Science Missouri S&T.
Creating SmartArt 1.Create a slide and select Insert > SmartArt. 2.Choose a SmartArt design and type your text. (Choose any format to start. You can change.
Infrastructure as code. “Enable the reconstruction of the business from nothing but a source code repository, an application data backup, and bare metal.
Managing Web Server Performance with AutoTune Agents by Y. Diao, J. L. Hellerstein, S. Parekh, J. P. Bigus Presented by Changha Lee.
1 University of Maryland Runtime Program Evolution Jeff Hollingsworth © Copyright 2000, Jeffrey K. Hollingsworth, All Rights Reserved. University of Maryland.
Vertical Profiling : Understanding the Behavior of Object-Oriented Applications Sookmyung Women’s Univ. PsLab Sewon,Moon.
A WEB-ENABLED APPROACH FOR GENERATING DATA PROCESSORS University of Nevada Reno Department of Computer Science & Engineering Jigar Patel Sergiu M. Dascalu.
E-COMMERCE & MOBILE COMPUTING. On Technicals… Considerations for evaluating platform Ecommerce Applications Development Process Integration Options Middlewares.
Benchmarking and Applications. Purpose of Our Benchmarking Effort Reveal compiler (and run-time systems) weak points and lack of adequate automatic optimizations.
Euro-Par, HASTE: An Adaptive Middleware for Supporting Time-Critical Event Handling in Distributed Environments ICAC 2008 Conference June 2 nd,
Run-time Adaptation of Grid Data Placement Jobs George Kola, Tevfik Kosar and Miron Livny Condor Project, University of Wisconsin.
Learning A Better Compiler Predicting Unroll Factors using Supervised Classification And Integrating CPU and L2 Cache Voltage Scaling using Machine Learning.
1/30/2003 Los Alamos National Laboratory1 A Migration Framework for Legacy Scientific Applications  Current tendency: monolithic architectures large,
HPHC - PERFORMANCE TESTING Dec 15, 2015 Natarajan Mahalingam.
System Software Laboratory Databases and the Grid by Paul Watson University of Newcastle Grid Computing: Making the Global Infrastructure a Reality June.
A few words on locality and arrays
Simon Rabone IBM Automation with Watson for period end closure in a global electronics company Global Oracle AD&M Offering Leader IBM Global Business Services.
A Web-enabled Approach for generating data processors
CE-105 Spring 2007 Engr. Faisal ur Rehman
EIN 6133 Enterprise Engineering
Three Questions To Ask About Clusters
Presented by: Sameer Kulkarni
ICS-2018 June 12-15, Beijing Zwift : A Programming Framework for High Performance Text Analytics on Compressed Data Feng Zhang †⋄, Jidong Zhai ⋄, Xipeng.
Research Challenges of Autonomic Computing
Introduction to Apache
TensorFlow: A System for Large-Scale Machine Learning
Convergence of Big Data and Extreme Computing
Derivatives and Gradients
Presentation transcript:

University of Maryland Towards Automated Tuning of Parallel Programs Jeffrey K. Hollingsworth Department of Computer Science University of Maryland, College Park, MD 20742

University of Maryland Why Automated Performance Tuning? Software commonly have parameters that impact their performance. Optimal parameter values are usually variable and un-predictable. Automated Parameter tuning can be used for adaptive parameter tuning in complex software.

University of Maryland Automated Performance Tuning Goal: Maximize achieved performance Problems: –Large number of parameters to tune –Shape of objective function unknown –Multiple libraries and coupled applications –Analytical model may not be available Requirements: –Runtime tuning for long running programs –Don ’ t try too many configurations –Avoid gradients

University of Maryland Active Harmony Runtime performance optimization –Can also support training runs Automatic library selection (code) –Monitor library performance –Switch library if necessary Automatic performance tuning (parameter) –Monitor system performance –Adjust runtime parameters Hooks for Compiler Frameworks –Working to integrate USC/ISI Chill –Looking at others too

University of Maryland Example: Cluster Based Web Server 3-tier system Harmony Provides –Parameter updates for DB, and App Severs TPC-W Benchmark –Transactional web benchmark –Mimic operations of an e-commerce site –Uses Java implementation from Univ. of Wisconsin –Performance metrics Web Interaction Per Second (WIPS)

University of Maryland Cluster-Based Web Service Tuning Best configuration after 200 iterations BrowsingShoppingOrdering Improvements compared to the default configuration 15% 16% 5%

University of Maryland Importance of various parameters

University of Maryland A Bit More About Harmony Search Pre-execution –Sensitivity Discovery Phase –Used to Order not Eliminate search dimensions Online –Use Parallel Rank Order Search Different configurations on different nodes

University of Maryland Benefits of Searching in Parallel Performance for Two Programs –High Performance Linpack –POP Ocean Code

University of Maryland 10 Integrating Compiler Transformations

University of Maryland 11 Performance of Matrix Multiply ActiveHarmony + CHiLL vs ATLAS and MKL

University of Maryland Must Coordinate Auto Tuners Problem: Warring auto tuning systems Multiple components “ auto tuning ” at once Tuning based on multiple changes at once Solution: –Need some level of coordination –Possible Answer: Exposing different tuning systems –Part of PERI Auto-tuning Effort

University of Maryland Conclusion Active Harmony –An infrastructure for runtime tuning –Automatic tuning and environment adaptation –It works ! Auto Tuning Integration –Need to integrate multiple tools Compilers, runtime, application

University of Maryland Acknowledgements Coding and Experiments –I-Hsin Chung (IBM Watson) –Vahid Tabatabaee –Ananta Tiwari Chill Integration Effort –Marry Hall (Utah) –Chun Chen (USC/ISI) –Jacqueline Chame (USC/ISI) Funding –DOE – PERC/PERI –NSF –LTS (DoD)