Active Sampling for Accelerated Learning of Performance Models Piyush Shivam, Shivnath Babu, Jeff Chase Duke University.

Networked Computing Utility
[Diagram: a task scheduler maps a task workflow onto clusters C1, C2, C3 at Sites A, B, and C]
– A network of clusters or grid sites. Each site is a pool of heterogeneous resources (e.g., CPU, memory, storage, network) managed as a shared utility.
– Jobs are task/data workflows.
– Challenge: choose the 'best' resource mapping/schedule for the job mix. An instance of "utility resource planning".
– Solution under construction: NIMO

Subproblem: Predict Job Completion Time

Sample  CPU speed  Memory size  Network latency  Disk spindles  Execution time
s1      2.4 GHz    2 GB         1 ms             10             2 hours

Premises (Limitations)
– Important batch applications are run repeatedly: most resources are consumed by applications we have seen in the past.
– Behavior is predictable across data sets, given some attributes associated with the data set: behavior is stable per unit of data processed (D), and D is predictable from data set attributes.
– Behavior depends only on resource attributes (CPU type and clock, seek time, spindle count).
– The utility controls the resources assigned to each job; virtualization enables precise control.
Your mileage may vary.

NIMO: NonInvasive Modeling for Optimization
– NIMO learns end-to-end performance models: models predict performance as a function of (a) the application profile, (b) the data set profile, and (c) the resource profile of a candidate resource assignment.
– NIMO is active: it collects training data for learning models by conducting proactive experiments on a 'workbench'.
– NIMO is noninvasive.
[Diagram: the model takes app/data profiles and candidate resource profiles and answers "what if…" queries about (target) performance]

The Big Picture
[Diagram: an application profiler and a resource profiler feed a training set database that drives active learning; a scheduler dispatches jobs and benchmarks to clusters C1, C2, C3 at Sites A, B, and C; pervasive instrumentation correlates metrics with job logs]

Generic End-to-End Model
– Execution alternates between compute phases (compute resource busy) and stall phases (compute resource stalled on I/O).
– T = D * (O_a + O_s): total execution time is total data D times the sum of compute occupancy O_a and stall occupancy O_s, where O_s combines network occupancy O_n and storage occupancy O_d.
– Occupancy: the average time consumed per unit of data; directly observable.
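The occupancy model lends itself to a one-line sketch; the variable names here are illustrative, not from NIMO's code:

```python
def predicted_time(D, o_a, o_n, o_d):
    """End-to-end model: T = D * (O_a + O_s).

    D is the total data processed; each occupancy is the average
    time consumed per unit of data, measured from profiles.
    """
    o_s = o_n + o_d          # stall occupancy = network + storage occupancy
    return D * (o_a + o_s)   # total time = data * (compute + stall occupancy)
```

For example, a job that processes 100 units of data with compute occupancy 0.5 and network/storage occupancies 0.2 and 0.3 per unit is predicted to run for 100 time units.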

Statistical Learning
– Independent variables: the resource profile and the data profile. Dependent variables: the occupancies.
– Complexity (e.g., latency hiding, concurrency, arm contention) is captured implicitly in the training data rather than in the structure of the model.
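As one concrete illustration of this style of learning, the sketch below fits a linear occupancy predictor from (resource-profile, occupancy) training samples by least squares; NIMO's actual learner and features may differ:

```python
import numpy as np

# Each row is a resource-profile sample, e.g., (1/CPU_speed, network_latency);
# y holds the occupancy observed when the application ran on that assignment.
X = np.array([[1.0, 0.4],
              [0.5, 0.6],
              [0.8, 0.2],
              [0.3, 0.9]])
y = X @ np.array([2.0, 1.5])   # synthetic training data with known weights

# Fit the predictor: occupancy as a linear function of the resource profile.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict_occupancy(profile):
    return float(np.dot(coef, profile))
```

Any interaction effects (latency hiding, contention) would show up only through the sampled occupancies, consistent with the slide's point that the model structure stays simple.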

Sampling Challenges
– Full system operating range: samples must cover the space of candidate resource assignments.
– Cost of sample acquisition: acquiring a sample has a non-negligible cost, e.g., the time to acquire it, or the opportunity cost for the application.
– Curse of dimensionality: too many parameters! E.g., 10 dimensions X 10 values per dimension; at 5 minutes per sample, covering even 1% of the space takes 951 years!
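The slide's back-of-the-envelope number checks out:

```python
configs = 10 ** 10            # 10 dimensions, 10 values per dimension
samples = configs // 100      # even just 1% of the space...
minutes = samples * 5         # ...at 5 minutes per sample...
years = minutes / (60 * 24 * 365)
print(round(years))           # ...takes about 951 years
```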

Active Learning in NIMO
How to learn accurate models quickly?
[Plot: accuracy of the current model vs. number of training samples; active sampling approaches 100% accuracy with far fewer samples than passive sampling]
– Passive sampling might not expose the system operating range.
– Active sampling using "design of experiments" collects the most relevant training data: automatic and quick.

Sample Carefully
[Plot: accuracy of the current model vs. number of training samples for passive sampling, active sampling without acceleration, and active sampling with acceleration; accelerated active sampling converges to 100% fastest]

Active Sampling Challenges
– How to expose the main factors and interactions in the shortest time? Which dimensions/attributes to perturb? What values to choose for the attributes?
– Where to conduct the experiment? On a separate system ("workbench") or "live"?

Planning 'active' experiments
1. Choose a predictor function to refine. Focus on the most significant/relevant predictors, or the least accurate. Example: a CPU-intensive app needs an accurate compute time predictor.
2. Choose an attribute (if any) to add to the predictor. Example: CPU speed.
3. Choose the values of the attributes.
4. Conduct the experiment.
5. Compute the current prediction error; go to Step 1.
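The five steps above can be sketched as a loop. The helper behavior here (a single `run_experiment` callback, simple error bookkeeping) is invented for illustration and is not NIMO's actual implementation:

```python
def plan_experiments(errors, run_experiment, threshold=0.1, max_rounds=20):
    """Refine predictors until the worst one meets the error threshold.

    errors: dict mapping predictor name -> current prediction error.
    run_experiment(name): conducts one workbench experiment (choosing an
    attribute and values internally) and returns the predictor's new error.
    """
    history = []
    for _ in range(max_rounds):
        target = max(errors, key=errors.get)     # Step 1: least accurate first
        if errors[target] < threshold:
            break                                # all predictors accurate enough
        errors[target] = run_experiment(target)  # Steps 2-4: run one experiment
        history.append(target)                   # Step 5: loop with updated error
    return history
```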

Choosing the Next Predictor
Learn the most significant/relevant predictors first.
– Static vs. dynamic ordering.
– Static: define a total order, e.g., a priori or by pre-estimates of influence (Plackett-Burman). Cycle through the order: round-robin vs. improvement threshold.
– Dynamic: choose the predictor with the maximum current error.
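The dynamic strategy is the simplest to state in code (a sketch; the function name is assumed):

```python
def choose_next_predictor(errors):
    """Dynamic ordering: refine the predictor whose current error is largest.

    errors maps predictor name -> current prediction error on the test set.
    """
    return max(errors, key=errors.get)
```

For example, with errors {'compute': 0.05, 'network': 0.30, 'storage': 0.10}, the next experiment targets the 'network' predictor.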

Choosing New Attributes
Include the most significant/relevant attributes.
– Choose attributes to expose main factors and interactions.
– Add an attribute when the error reduction from further training with the current set falls below a threshold.
– Choose the attribute with the maximum potential improvement in accuracy: establish a total order using a pre-estimate of relevance (Plackett-Burman).

Choosing New Values
Select a new value sample to train the selected predictor function with the chosen set of attributes. A range of approaches balances coverage vs. interactions:
– Binary search / bracketing.
– Plackett-Burman to identify interactions: L_a-I_b designs, where a = number of levels per value and b = degree of interactions.
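As one hedged illustration of the binary-search/bracketing option, recursive midpoint bracketing covers an attribute's value range with few samples; the recursion depth and sampling order here are assumptions, not details from the slides:

```python
def bracket_values(lo, hi, depth=3):
    """Pick attribute values by recursive midpoint bracketing.

    Returns midpoints in the order they would be sampled: the full-range
    midpoint first, then the midpoints of each half, and so on.
    """
    if depth == 0 or hi - lo <= 1:
        return []
    mid = (lo + hi) / 2
    return ([mid]
            + bracket_values(lo, mid, depth - 1)
            + bracket_values(mid, hi, depth - 1))
```

For a CPU-speed range of 0-8 (in some units) and depth 2, this samples 4.0 first, then 2.0 and 6.0.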

Experimental Results
– Biomedical applications: BLAST, fMRI, NAMD, CardioWave.
– Resources: 5 CPU speeds X 6 network latencies X 5 memory sizes = 150 resource assignments.
– Goal: learn an execution time model with the fewest training assignments.
– A separate test set evaluates the accuracy of the current model.

BLAST Application
– Total time for all 150 assignments: 130 hrs.
– Active sampling: 5 hrs, i.e., 2% of the sample space.
– With an incorrect order of predictor refinement: 12 hrs, i.e., 10% of the sample space.

BLAST Application
– Total time for all 150 assignments: 130 hrs.
– Active sampling: 5 hrs, i.e., 2% of the sample space.
– With an incorrect order of attribute refinement: 12 hrs, i.e., 10% of the sample space.

Summary/Conclusions
– Current statistical learning techniques (SLT): given the right data, learn the right model.
– Use active sampling to acquire the right data.
– Ongoing experiments demonstrate the importance/potential of guided active sampling: 2% of the sample space yields >= 90% model accuracy.
– Upcoming VLDB paper…